LSHTM_analysis/scripts/ml/log_rpob_7030.txt
2022-06-20 21:55:47 +01:00

/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data_7030.py:548: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
from pandas import MultiIndex, Int64Index
1.22.4
1.4.1
aaindex_df contains non-numerical data
Total no. of non-numerical columns: 2
Selecting numerical data only
PASS: successfully selected numerical columns only for aaindex_df
Now checking for NA in the remaining aaindex_cols
Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127
Revised df ncols: 123
Checking NA in revised df...
PASS: cols with NA successfully dropped from aaindex_df
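The NA filtering reported above (127 columns reduced to 123 after dropping the 4 that contain NA) can be sketched without pandas. This is a minimal stand-in, not the script's actual code: the table is a plain dict of lists, and drop_columns_with_na and the toy values are hypothetical. In pandas the same step would typically be written as df.dropna(axis=1).

```python
import math


def drop_columns_with_na(table):
    """Return a copy of `table` (column name -> list of values)
    without any column that holds None or NaN."""
    def is_na(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    return {
        name: values
        for name, values in table.items()
        if not any(is_na(v) for v in values)
    }


# Toy data (made up): columns B and C each contain one missing value.
toy = {
    "A": [1.0, 2.0],
    "B": [0.5, None],
    "C": [float("nan"), 3.0],
    "D": [4.0, 5.0],
}
clean = drop_columns_with_na(toy)
print("Original ncols:", len(toy))
print("Revised ncols:", len(clean))
```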
Proceeding with combining aa_df with other features_df
PASS: ncols match
Expected ncols: 123
Got: 123
Total no. of columns in clean aa_df: 123
Proceeding to merge, expected nrows in merged_df: 1133
PASS: my_features_df and aa_df successfully combined
nrows: 1133
ncols: 274
count of NULL values before imputation
or_mychisq 339
log10_or_mychisq 339
dtype: int64
count of NULL values AFTER imputation
mutationinformation 0
or_rawI 0
logorI 0
dtype: int64
PASS: OR values imputed, data ready for ML
Total no. of features for aaindex: 123
No. of numerical features: 169
No. of categorical features: 7
PASS: x_features has no target variable
No. of columns for x_features: 176
-------------------------------------------------------------
Successfully split data with stratification: 70/30
Input features data size: (557, 176)
Train data size: (373, 176)
Test data size: (184, 176)
y_train numbers: Counter({0: 189, 1: 184})
y_train ratio: 1.0271739130434783
y_test_numbers: Counter({0: 93, 1: 91})
y_test ratio: 1.021978021978022
-------------------------------------------------------------
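The split sizes and class ratios above can be reproduced with a small stratified split. This is a sketch under assumptions, not the script's code: stratified_split and class_ratio are hypothetical helpers, the labels are synthetic, and the ratio is taken as count(0) / count(1) because that reproduces the logged values. The reported sizes (373 and 184 of 557) put about 33% of rows in the test set, so the sketch uses a test fraction of 0.33 to match them.

```python
import random
from collections import Counter


def stratified_split(labels, test_fraction, seed=42):
    """Return (train_idx, test_idx), sampling each class separately so
    both parts keep roughly the same class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def class_ratio(labels):
    """Ratio of class 0 to class 1 counts."""
    counts = Counter(labels)
    return counts[0] / counts[1]


# Synthetic labels with the totals implied by the log:
# class 0: 189 + 93 = 282, class 1: 184 + 91 = 275.
labels = [0] * 282 + [1] * 275
train_idx, test_idx = stratified_split(labels, test_fraction=0.33)
y_train = [labels[i] for i in train_idx]
y_test = [labels[i] for i in test_idx]

print("y_train numbers:", Counter(y_train), "ratio:", class_ratio(y_train))
print("y_test numbers:", Counter(y_test), "ratio:", class_ratio(y_test))
```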
index: 0
ind: 1
Mask count check: True
index: 1
ind: 2
Mask count check: True
index: 2
ind: 3
Mask count check: True
Original Data
Counter({0: 189, 1: 184}) Data dim: (373, 176)
Simple Random OverSampling
Counter({1: 189, 0: 189})
(378, 176)
Simple Random UnderSampling
Counter({0: 184, 1: 184})
(368, 176)
Simple Combined Over and UnderSampling
Counter({0: 189, 1: 189})
(378, 176)
SMOTE_NC OverSampling
Counter({1: 189, 0: 189})
(378, 176)
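The class counts reported for simple random over- and undersampling follow directly from the training counts (189 vs 184). The sketch below shows the idea on row indices only. It is an illustration under assumptions: the log does not name the resampling library, and random_oversample and random_undersample are hypothetical helpers. SMOTE-NC differs in that it synthesises new minority rows rather than duplicating existing ones, so it is not covered here.

```python
import random
from collections import Counter


def _indices_by_class(labels):
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    return by_class


def random_oversample(labels, seed=42):
    """Duplicate randomly chosen minority rows (with replacement)
    until every class matches the majority count."""
    rng = random.Random(seed)
    by_class = _indices_by_class(labels)
    target = max(len(idx) for idx in by_class.values())
    out = []
    for idx in by_class.values():
        out.extend(idx)
        out.extend(rng.choices(idx, k=target - len(idx)))
    return out


def random_undersample(labels, seed=42):
    """Keep a random subset of each class (without replacement)
    sized to the minority count."""
    rng = random.Random(seed)
    by_class = _indices_by_class(labels)
    target = min(len(idx) for idx in by_class.values())
    out = []
    for idx in by_class.values():
        out.extend(rng.sample(idx, target))
    return out


# Synthetic training labels with the logged counts.
y_train = [0] * 189 + [1] * 184
over = [y_train[i] for i in random_oversample(y_train)]
under = [y_train[i] for i in random_undersample(y_train)]
print("OverSampling:", Counter(over), len(over))
print("UnderSampling:", Counter(under), len(under))
```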
#####################################################################
Running ML analysis: 70/30 split
Gene name: rpoB
Drug name: rifampicin
Output directory: /home/tanu/git/Data/rifampicin/output/ml/tts_7030/
Sanity checks:
Total input features: 176
Training data size: (373, 176)
Test data size: (184, 176)
Target feature numbers (training data): Counter({0: 189, 1: 184})
Target features ratio (training data): 1.0271739130434783
Target feature numbers (test data): Counter({0: 93, 1: 91})
Target features ratio (test data): 1.021978021978022
#####################################################################
================================================================
Structural features (n): 37
These are:
Common stability features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================
AAindex features (n): 123
These are:
['ALTS910101', 'AZAE970101', 'AZAE970102', 'BASU010101', 'BENS940101', 'BENS940102', 'BENS940103', 'BENS940104', 'BETM990101', 'BLAJ010101', 'BONM030101', 'BONM030102', 'BONM030103', 'BONM030104', 'BONM030105', 'BONM030106', 'BRYS930101', 'CROG050101', 'CSEM940101', 'DAYM780301', 'DAYM780302', 'DOSZ010101', 'DOSZ010102', 'DOSZ010103', 'DOSZ010104', 'FEND850101', 'FITW660101', 'GEOD900101', 'GIAG010101', 'GONG920101', 'GRAR740104', 'HENS920101', 'HENS920102', 'HENS920103', 'HENS920104', 'JOHM930101', 'JOND920103', 'JOND940101', 'KANM000101', 'KAPO950101', 'KESO980101', 'KESO980102', 'KOLA920101', 'KOLA930101', 'KOSJ950100_RSA_SST', 'KOSJ950100_SST', 'KOSJ950110_RSA', 'KOSJ950115', 'LEVJ860101', 'LINK010101', 'LIWA970101', 'LUTR910101', 'LUTR910102', 'LUTR910103', 'LUTR910104', 'LUTR910105', 'LUTR910106', 'LUTR910107', 'LUTR910108', 'LUTR910109', 'MCLA710101', 'MCLA720101', 'MEHP950102', 'MICC010101', 'MIRL960101', 'MIYS850102', 'MIYS850103', 'MIYS930101', 'MIYS960101', 'MIYS960102', 'MIYS960103', 'MIYS990106', 'MIYS990107', 'MIYT790101', 'MOHR870101', 'MOOG990101', 'MUET010101', 'MUET020101', 'MUET020102', 'NAOD960101', 'NGPC000101', 'NIEK910101', 'NIEK910102', 'OGAK980101', 'OVEJ920100_RSA', 'OVEJ920101', 'OVEJ920102', 'OVEJ920103', 'PRLA000101', 'PRLA000102', 'QUIB020101', 'QU_C930101', 'QU_C930102', 'QU_C930103', 'RIER950101', 'RISJ880101', 'RUSR970101', 'RUSR970102', 'RUSR970103', 'SIMK990101', 'SIMK990102', 'SIMK990103', 'SIMK990104', 'SIMK990105', 'SKOJ000101', 'SKOJ000102', 'SKOJ970101', 'TANS760101', 'TANS760102', 'THOP960101', 'TOBD000101', 'TOBD000102', 'TUDE900101', 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106']
================================================================
Evolutionary features (n): 3
These are:
['consurf_score', 'snap2_score', 'provean_score']
================================================================
Genomic features (n): 6
These are:
['maf', 'logorI']
['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================
Categorical features (n): 7
These are:
['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================
Pass: No. of features match
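The feature-count check above is plain arithmetic over the group sizes printed in this section. A minimal sketch of such a check follows; the dictionary keys and variable names are illustrative, not taken from the script.

```python
# Group sizes as printed in the log.
feature_groups = {
    "structural": 11 + 23 + 3,  # common stability + FoldX + other struc columns
    "aaindex": 123,
    "evolutionary": 3,
    "genomic": 2 + 4,           # ['maf', 'logorI'] + four lineage columns
    "categorical": 7,
}

total_features = sum(feature_groups.values())
numerical_features = total_features - feature_groups["categorical"]

print("Total input features:", total_features)
print("No. of numerical features:", numerical_features)
```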
#####################################################################
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
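The pipeline above rescales the 169 numerical columns to the [0, 1] range and one-hot encodes the 7 categorical columns before fitting the model. The pure-Python sketch below illustrates only those two transformations on made-up values. min_max_scale and one_hot are hypothetical stand-ins for what MinMaxScaler and OneHotEncoder do, not their implementations.

```python
def min_max_scale(values):
    """Map values linearly onto [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def one_hot(values):
    """Return (sorted categories, one indicator row per input value)."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Made-up example values for one numerical and one categorical column.
ligand_distance = [2.0, 6.0, 10.0]
ss_class = ["helix", "loop", "helix"]

scaled = min_max_scale(ligand_distance)
categories, encoded = one_hot(ss_class)

# Concatenate the transformed blocks column-wise, one row per sample.
rows = [[s] + e for s, e in zip(scaled, encoded)]
print(categories)
print(rows)
```

Inside a cross-validated pipeline the scaling bounds are learned from the training portion of each fold only, which this sketch does not model.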
key: fit_time
value: [0.06178975 0.03049135 0.03225613 0.03168249 0.03428817 0.03472066
0.0346334 0.02963018 0.0661819 0.06939864]
mean value: 0.042507266998291014
key: score_time
value: [0.02421689 0.01209807 0.01218534 0.01204538 0.01496863 0.01508284
0.01212668 0.01215744 0.01237869 0.01236558]
mean value: 0.013962554931640624
key: test_mcc
value: [0.89973541 0.57894737 0.68803296 0.73099415 0.83918129 0.68035483
0.83918129 0.89181287 0.94736842 0.84834956]
mean value: 0.7943958140147007
key: train_mcc
value: [0.86265911 0.8687128 0.88086411 0.87498893 0.87500665 0.88101481
0.87500665 0.87500665 0.86910921 0.86324256]
mean value: 0.8725611472917782
key: test_accuracy
value: [0.94736842 0.78947368 0.84210526 0.86486486 0.91891892 0.83783784
0.91891892 0.94594595 0.97297297 0.91891892]
mean value: 0.8957325746799432
key: train_accuracy
value: [0.93134328 0.93432836 0.94029851 0.9375 0.9375 0.94047619
0.9375 0.9375 0.93452381 0.93154762]
mean value: 0.936251776830135
key: test_fscore
value: [0.95 0.78947368 0.83333333 0.86486486 0.91891892 0.84210526
0.91891892 0.94444444 0.97297297 0.90909091]
mean value: 0.8944123309912784
key: train_fscore
value: [0.93009119 0.93373494 0.94011976 0.93655589 0.93693694 0.94011976
0.93693694 0.93693694 0.93413174 0.93134328]
mean value: 0.9356907368285973
key: test_precision
value: [0.9047619 0.78947368 0.88235294 0.88888889 0.89473684 0.8
0.89473684 0.94444444 0.94736842 1. ]
mean value: 0.8946763968745393
key: train_precision
value: [0.93292683 0.92814371 0.92899408 0.93373494 0.93413174 0.93452381
0.93413174 0.93413174 0.92857143 0.92307692]
mean value: 0.9312366935195415
key: test_recall
value: [1. 0.78947368 0.78947368 0.84210526 0.94444444 0.88888889
0.94444444 0.94444444 1. 0.83333333]
mean value: 0.8976608187134503
key: train_recall
value: [0.92727273 0.93939394 0.95151515 0.93939394 0.93975904 0.94578313
0.93975904 0.93975904 0.93975904 0.93975904]
mean value: 0.940215407082877
key: test_roc_auc
value: [0.94736842 0.78947368 0.84210526 0.86549708 0.91959064 0.83918129
0.91959064 0.94590643 0.97368421 0.91666667]
mean value: 0.895906432748538
key: train_roc_auc
value: [0.93128342 0.93440285 0.94046346 0.93753323 0.93752658 0.94053863
0.93752658 0.93752658 0.9345854 0.93164422]
mean value: 0.936303093978315
key: test_jcc
value: [0.9047619 0.65217391 0.71428571 0.76190476 0.85 0.72727273
0.85 0.89473684 0.94736842 0.83333333]
mean value: 0.8135837617759815
key: train_jcc
value: [0.86931818 0.87570621 0.88700565 0.88068182 0.88135593 0.88700565
0.88135593 0.88135593 0.87640449 0.87150838]
mean value: 0.8791698185004754
MCC on Blind test: 0.84
Accuracy on Blind test: 0.92
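The per-fold scores above can all be derived from a 2x2 confusion matrix. The sketch below is a worked check, not the script's code. The counts TP=19, FP=2, FN=0, TN=17 are inferred (they are not printed in the log) because they reproduce every first-fold test value. The roc_auc entry assumes the score was computed from hard class labels, in which case it reduces to the mean of recall and specificity.

```python
import math
import statistics


def binary_metrics(tp, fp, fn, tn):
    """Classification metrics from confusion-matrix counts."""
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "fscore": 2 * tp / (2 * tp + fp + fn),
        "precision": tp / (tp + fp),
        "recall": recall,
        "roc_auc": (recall + specificity) / 2,  # label-based AUC (assumption)
        "jcc": tp / (tp + fp + fn),             # Jaccard index
    }


# Inferred counts for the first CV fold (38 samples).
fold1 = binary_metrics(tp=19, fp=2, fn=0, tn=17)
for key, value in fold1.items():
    print(f"test_{key}: {value:.8f}")

# "mean value" is the arithmetic mean over the 10 folds.
test_mcc = [0.89973541, 0.57894737, 0.68803296, 0.73099415, 0.83918129,
            0.68035483, 0.83918129, 0.89181287, 0.94736842, 0.84834956]
mean_mcc = statistics.mean(test_mcc)
print("mean test_mcc:", mean_mcc)
```

For a single confusion matrix the Jaccard index and F-score are tied by J = F / (2 - F), which is why test_jcc tracks test_fscore fold by fold.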
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [1.24725127 1.25279379 1.20516229 1.37949181 1.13146496 1.56774902
1.27143836 1.19698691 1.58842921 1.32122326]
mean value: 1.3161990880966186
key: score_time
value: [0.0150485 0.01297903 0.01233244 0.0124383 0.01545596 0.01286793
0.01544952 0.01620698 0.01299453 0.0154202 ]
mean value: 0.014119338989257813
key: test_mcc
value: [0.89973541 0.68803296 0.68803296 0.68035483 0.83918129 0.68035483
0.83918129 0.78764146 0.94736842 0.73020842]
mean value: 0.7780091871199141
key: train_mcc
value: [0.88065448 1. 0.83879937 0.90473153 0.89284196 0.83334517
0.88691246 1. 1. 0.98809355]
mean value: 0.9225378515989221
key: test_accuracy
value: [0.94736842 0.84210526 0.84210526 0.83783784 0.91891892 0.83783784
0.91891892 0.89189189 0.97297297 0.86486486]
mean value: 0.8874822190611664
key: train_accuracy
value: [0.94029851 1. 0.91940299 0.95238095 0.94642857 0.91666667
0.94345238 1. 1. 0.99404762]
mean value: 0.9612677683013504
key: test_fscore
value: [0.95 0.85 0.83333333 0.83333333 0.91891892 0.84210526
0.91891892 0.88235294 0.97297297 0.85714286]
mean value: 0.88590785389547
key: train_fscore
value: [0.93975904 1. 0.918429 0.95151515 0.94578313 0.91515152
0.94294294 1. 1. 0.9939759 ]
mean value: 0.9607556684919915
key: test_precision
value: [0.9047619 0.80952381 0.88235294 0.88235294 0.89473684 0.8
0.89473684 0.9375 0.94736842 0.88235294]
mean value: 0.8835686643078284
key: train_precision
value: [0.93413174 1. 0.91566265 0.95151515 0.94578313 0.92073171
0.94011976 1. 1. 0.9939759 ]
mean value: 0.96019200425852
key: test_recall
value: [1. 0.89473684 0.78947368 0.78947368 0.94444444 0.88888889
0.94444444 0.83333333 1. 0.83333333]
mean value: 0.8918128654970761
key: train_recall
value: [0.94545455 1. 0.92121212 0.95151515 0.94578313 0.90963855
0.94578313 1. 1. 0.9939759 ]
mean value: 0.9613362541073385
key: test_roc_auc
value: [0.94736842 0.84210526 0.84210526 0.83918129 0.91959064 0.83918129
0.91959064 0.89035088 0.97368421 0.86403509]
mean value: 0.887719298245614
key: train_roc_auc
value: [0.94037433 1. 0.91942959 0.95236576 0.94642098 0.91658398
0.9434798 1. 1. 0.99404678]
mean value: 0.9612701222377078
key: test_jcc
value: [0.9047619 0.73913043 0.71428571 0.71428571 0.85 0.72727273
0.85 0.78947368 0.94736842 0.75 ]
mean value: 0.7986578600651827
key: train_jcc
value: [0.88636364 1. 0.84916201 0.90751445 0.89714286 0.84357542
0.89204545 1. 1. 0.98802395]
mean value: 0.9263827781182407
MCC on Blind test: 0.76
Accuracy on Blind test: 0.88
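
Note on the key/value blocks above: the names (fit_time, score_time, test_*, train_*) match the dictionary returned by scikit-learn's `cross_validate` when `return_train_score=True`. The sketch below shows how output of this shape could be produced. It assumes 10-fold CV and a reduced scoring dictionary on synthetic data; the original script's actual scorers are not shown in this log.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data (assumption: not the data used in this log).
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Hypothetical scoring dictionary; the dict keys become the test_*/train_* suffixes.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "jcc": "jaccard",
}

scores = cross_validate(
    GaussianNB(), X, y, cv=10, scoring=scoring, return_train_score=True
)

# Print each per-fold array and its mean, mirroring the layout of the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", value.mean())
```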
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01527596 0.01070619 0.01155663 0.01507545 0.0173049 0.00961065
0.00960159 0.01027203 0.01683927 0.00966716]
mean value: 0.012590980529785157
key: score_time
value: [0.01228333 0.00963926 0.01033974 0.01431751 0.01128507 0.00904703
0.00902438 0.01120901 0.01366067 0.00890756]
mean value: 0.010971355438232421
key: test_mcc
value: [0.74620251 0.42640143 0.61017022 0.47328975 0.57857577 0.53638795
0.83918129 0.73821295 0.73821295 0.51319869]
mean value: 0.6199833494826468
key: train_mcc
value: [0.63213973 0.67077671 0.66818514 0.67469654 0.63279874 0.65000993
0.65436967 0.65987564 0.64691443 0.63336739]
mean value: 0.6523133932014681
key: test_accuracy
value: [0.86842105 0.71052632 0.78947368 0.72972973 0.78378378 0.75675676
0.91891892 0.86486486 0.86486486 0.75675676]
mean value: 0.8044096728307255
key: train_accuracy
value: [0.81492537 0.83283582 0.83283582 0.83630952 0.80952381 0.82142857
0.82440476 0.82738095 0.82142857 0.81547619]
mean value: 0.8236549395877754
key: test_fscore
value: [0.85714286 0.68571429 0.75 0.70588235 0.75 0.7804878
0.91891892 0.84848485 0.84848485 0.74285714]
mean value: 0.7887973059422126
key: train_fscore
value: [0.80254777 0.81818182 0.82165605 0.82539683 0.78378378 0.80392157
0.80906149 0.81290323 0.80769231 0.80379747]
mean value: 0.8088942308172258
key: test_precision
value: [0.9375 0.75 0.92307692 0.8 0.85714286 0.69565217
0.89473684 0.93333333 0.93333333 0.76470588]
mean value: 0.8489481345257694
key: train_precision
value: [0.84563758 0.88111888 0.86577181 0.86666667 0.89230769 0.87857143
0.87412587 0.875 0.8630137 0.84666667]
mean value: 0.8688880304060501
key: test_recall
value: [0.78947368 0.63157895 0.63157895 0.63157895 0.66666667 0.88888889
0.94444444 0.77777778 0.77777778 0.72222222]
mean value: 0.7461988304093568
key: train_recall
value: [0.76363636 0.76363636 0.78181818 0.78787879 0.69879518 0.74096386
0.75301205 0.75903614 0.75903614 0.76506024]
mean value: 0.7572873311427528
key: test_roc_auc
value: [0.86842105 0.71052632 0.78947368 0.73245614 0.78070175 0.76023392
0.91959064 0.8625731 0.8625731 0.75584795]
mean value: 0.8042397660818713
key: train_roc_auc
value: [0.81417112 0.83181818 0.83208556 0.83545986 0.80822112 0.82048193
0.82356485 0.8265769 0.82069454 0.81488306]
mean value: 0.8227957123550022
key: test_jcc
value: [0.75 0.52173913 0.6 0.54545455 0.6 0.64
0.85 0.73684211 0.73684211 0.59090909]
mean value: 0.6571786977324735
key: train_jcc
value: [0.67021277 0.69230769 0.6972973 0.7027027 0.64444444 0.67213115
0.67934783 0.68478261 0.67741935 0.67195767]
mean value: 0.6792603511829558
MCC on Blind test: 0.7
Accuracy on Blind test: 0.85
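
Note on the blind-test lines above: the log header reports a stratified 70/30 split, so the blind-test scores are presumably computed on the held-out 30%. A minimal sketch of that evaluation follows, on synthetic data. The variable names and the rounding to two decimals are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data (assumption: not the data used in this log).
X, y = make_classification(n_samples=400, n_features=10, random_state=42)

# Stratified 70/30 split: the 30% part plays the role of the blind test set.
X_train, X_blind, y_train, y_blind = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Fit on the training part only, then score on the unseen part.
model = GaussianNB().fit(X_train, y_train)
y_pred = model.predict(X_blind)

blind_mcc = round(matthews_corrcoef(y_blind, y_pred), 2)
blind_acc = round(accuracy_score(y_blind, y_pred), 2)
print("MCC on Blind test:", blind_mcc)
print("Accuracy on Blind test:", blind_acc)
```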
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01008868 0.0128963 0.02115774 0.00986743 0.00986052 0.01429033
0.01309276 0.01284504 0.00995731 0.01076698]
mean value: 0.012482309341430664
key: score_time
value: [0.00907111 0.01345086 0.00931144 0.00973558 0.00988913 0.0149653
0.01284695 0.00896931 0.0095892 0.0089035 ]
mean value: 0.010673236846923829
key: test_mcc
value: [0.74620251 0.42640143 0.65465367 0.56725146 0.68035483 0.56725146
0.56725146 0.67849265 0.7888597 0.83918129]
mean value: 0.6515900457032924
key: train_mcc
value: [0.73755882 0.7792393 0.74367201 0.73209888 0.70905196 0.7441844
0.76402212 0.73287373 0.70870914 0.71482244]
mean value: 0.7366232811135077
key: test_accuracy
value: [0.86842105 0.71052632 0.81578947 0.78378378 0.83783784 0.78378378
0.78378378 0.83783784 0.89189189 0.91891892]
mean value: 0.8232574679943101
key: train_accuracy
value: [0.86865672 0.88955224 0.87164179 0.86607143 0.85416667 0.87202381
0.88095238 0.86607143 0.85416667 0.85714286]
mean value: 0.8680445984363895
key: test_fscore
value: [0.87804878 0.73170732 0.78787879 0.78947368 0.84210526 0.77777778
0.77777778 0.82352941 0.89473684 0.91891892]
mean value: 0.8221954561152628
key: train_fscore
value: [0.86826347 0.88888889 0.87164179 0.86404834 0.85545723 0.87164179
0.88372093 0.86725664 0.85459941 0.85798817]
mean value: 0.868350664914892
key: test_precision
value: [0.81818182 0.68181818 0.92857143 0.78947368 0.8 0.77777778
0.77777778 0.875 0.85 0.89473684]
mean value: 0.8193337510442774
key: train_precision
value: [0.85798817 0.88095238 0.85882353 0.86144578 0.83815029 0.86390533
0.85393258 0.84971098 0.84210526 0.84302326]
mean value: 0.8550037559538748
key: test_recall
value: [0.94736842 0.78947368 0.68421053 0.78947368 0.88888889 0.77777778
0.77777778 0.77777778 0.94444444 0.94444444]
mean value: 0.8321637426900584
key: train_recall
value: [0.87878788 0.8969697 0.88484848 0.86666667 0.87349398 0.87951807
0.91566265 0.88554217 0.86746988 0.87349398]
mean value: 0.8822453450164294
key: test_roc_auc
value: [0.86842105 0.71052632 0.81578947 0.78362573 0.83918129 0.78362573
0.78362573 0.83625731 0.89327485 0.91959064]
mean value: 0.8233918128654971
key: train_roc_auc
value: [0.8688057 0.88966132 0.87183601 0.86608187 0.85439405 0.87211198
0.88136074 0.8663005 0.85432318 0.85733522]
mean value: 0.8682210557211489
key: test_jcc
value: [0.7826087 0.57692308 0.65 0.65217391 0.72727273 0.63636364
0.63636364 0.7 0.80952381 0.85 ]
mean value: 0.7021229495142538
key: train_jcc
value: [0.76719577 0.8 0.77248677 0.7606383 0.74742268 0.77248677
0.79166667 0.765625 0.74611399 0.75129534]
mean value: 0.7674931283545561
MCC on Blind test: 0.73
Accuracy on Blind test: 0.86
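
Note on the pipeline repr above: each run wraps the model in a Pipeline whose first step is a ColumnTransformer that applies MinMaxScaler to the numerical columns and OneHotEncoder to the categorical ones. The sketch below builds a pipeline of the same shape on a toy frame. The column names are borrowed from the log, but the values and category labels are invented for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Toy frame: column names from the log, values invented (assumption).
df = pd.DataFrame({
    "ligand_distance": [2.1, 8.4, 5.0, 12.3, 3.3, 9.9],
    "ddg_foldx": [-1.2, 0.4, 2.0, -0.3, 1.1, 0.0],
    "ss_class": ["helix", "sheet", "loop", "helix", "loop", "sheet"],
    "active_site": [1, 0, 0, 1, 0, 1],
})
y = [1, 0, 0, 1, 0, 1]

num_cols = ["ligand_distance", "ddg_foldx"]
cat_cols = ["ss_class", "active_site"]

# Scale numerical columns to [0, 1]; one-hot encode categorical columns.
prep = ColumnTransformer(
    remainder="passthrough",
    transformers=[
        ("num", MinMaxScaler(), num_cols),
        ("cat", OneHotEncoder(), cat_cols),
    ],
)
pipe = Pipeline(steps=[("prep", prep), ("model", BernoulliNB())])
pipe.fit(df, y)

# 2 scaled columns + 3 ss_class indicators + 2 active_site indicators.
transformed = pipe.named_steps["prep"].transform(df)
print(transformed.shape)
```

Fitting the scaler and encoder inside the pipeline means they are refit on each training fold during cross-validation, which keeps validation folds out of the preprocessing statistics.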
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00998735 0.01429224 0.01066971 0.01056051 0.01671553 0.00953698
0.01070189 0.0094161 0.0092299 0.01051044]
mean value: 0.011162066459655761
key: score_time
value: [0.06381631 0.04476571 0.01716495 0.01765943 0.0184772 0.0145154
0.01202798 0.0113976 0.01206398 0.01199079]
mean value: 0.022387933731079102
key: test_mcc
value: [0.42163702 0.15877684 0.63960215 0.24633537 0.7888597 0.08554907
0.4163404 0.30307132 0.45906433 0.62170355]
mean value: 0.41409397538354425
key: train_mcc
value: [0.64781471 0.70758921 0.65960709 0.67249172 0.63089248 0.68489413
0.65480084 0.69054046 0.66065385 0.65480084]
mean value: 0.6664085336282927
key: test_accuracy
value: [0.71052632 0.57894737 0.81578947 0.62162162 0.89189189 0.54054054
0.7027027 0.64864865 0.72972973 0.81081081]
mean value: 0.7051209103840683
key: train_accuracy
value: [0.8238806 0.85373134 0.82985075 0.83630952 0.81547619 0.8422619
0.82738095 0.8452381 0.83035714 0.82738095]
mean value: 0.8331867448471926
key: test_fscore
value: [0.7027027 0.6 0.8 0.61111111 0.89473684 0.56410256
0.64516129 0.58064516 0.72222222 0.8 ]
mean value: 0.6920681893856767
key: train_fscore
value: [0.81846154 0.85285285 0.82674772 0.83282675 0.81212121 0.84272997
0.82317073 0.84146341 0.82779456 0.82317073]
mean value: 0.8301339481829435
key: test_precision
value: [0.72222222 0.57142857 0.875 0.64705882 0.85 0.52380952
0.76923077 0.69230769 0.72222222 0.82352941]
mean value: 0.7196809236515119
key: train_precision
value: [0.83125 0.8452381 0.82926829 0.83536585 0.81707317 0.83040936
0.83333333 0.85185185 0.83030303 0.83333333]
mean value: 0.8337426317857961
key: test_recall
value: [0.68421053 0.63157895 0.73684211 0.57894737 0.94444444 0.61111111
0.55555556 0.5 0.72222222 0.77777778]
mean value: 0.6742690058479532
key: train_recall
value: [0.80606061 0.86060606 0.82424242 0.83030303 0.80722892 0.85542169
0.81325301 0.8313253 0.8253012 0.81325301]
mean value: 0.8266995253742242
key: test_roc_auc
value: [0.71052632 0.57894737 0.81578947 0.62280702 0.89327485 0.54239766
0.69883041 0.64473684 0.72953216 0.80994152]
mean value: 0.7046783625730995
key: train_roc_auc
value: [0.82361854 0.85383244 0.82976827 0.83620415 0.81537916 0.84241673
0.82721474 0.84507442 0.83029766 0.82721474]
mean value: 0.8331020846685362
key: test_jcc
value: [0.54166667 0.42857143 0.66666667 0.44 0.80952381 0.39285714
0.47619048 0.40909091 0.56521739 0.66666667]
mean value: 0.5396451157538114
key: train_jcc
value: [0.69270833 0.7434555 0.70466321 0.71354167 0.68367347 0.72820513
0.69948187 0.72631579 0.70618557 0.69948187]
mean value: 0.7097712394464257
MCC on Blind test: 0.49
Accuracy on Blind test: 0.74
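
The per-fold keys printed above (fit_time, score_time, test_mcc, train_mcc, ..., test_jcc, train_jcc) have the shape of the dictionary returned by scikit-learn's cross_validate when it is given a multi-metric scoring dictionary and return_train_score=True. The following is a minimal sketch of how such output could be produced, not the code that generated this log. The toy data, the three-column feature table, the scorer definitions and the fold settings are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy data standing in for the real 176-column feature table.
rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame({
    "ligand_distance": rng.normal(size=n),
    "ddg_foldx": rng.normal(size=n),
    "ss_class": rng.choice(["helix", "sheet", "loop"], size=n),
})
y = (X["ligand_distance"] + 0.5 * rng.normal(size=n) > 0).astype(int)

num_cols = ["ligand_distance", "ddg_foldx"]
cat_cols = ["ss_class"]

# Same layout as the logged pipeline: scale numeric columns, one-hot encode
# categorical ones. handle_unknown="ignore" is an addition for the toy data;
# the logged pipeline uses a plain OneHotEncoder().
prep = ColumnTransformer(
    remainder="passthrough",
    transformers=[
        ("num", MinMaxScaler(), num_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
    ],
)
pipe = Pipeline(steps=[("prep", prep), ("model", KNeighborsClassifier())])

# Assumed scoring dictionary; the key names are chosen to match the log.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": make_scorer(jaccard_score),
}

# Ten folds, matching the ten values per key in the log; shuffling is assumed.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```

Keeping the scaler and encoder inside the Pipeline means they are refit on the training part of each fold, so no information from the held-out fold reaches the preprocessing step.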
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01723957 0.01586294 0.01570201 0.01627302 0.01574993 0.0172441
0.01658297 0.01831293 0.01824284 0.01881599]
mean value: 0.017002630233764648
key: score_time
value: [0.0108459 0.01086402 0.010607 0.01053047 0.01058245 0.01136756
0.01049376 0.01160789 0.01160383 0.01154995]
mean value: 0.011005282402038574
key: test_mcc
value: [0.89473684 0.47633051 0.63960215 0.7888597 0.84959079 0.68035483
0.78362573 0.83871328 1. 0.83871328]
mean value: 0.7790527120618065
key: train_mcc
value: [0.79118098 0.83279857 0.81532977 0.80977356 0.78568391 0.80949681
0.80371348 0.8097803 0.79787385 0.77976011]
mean value: 0.8035391354542314
key: test_accuracy
value: [0.94736842 0.73684211 0.81578947 0.89189189 0.91891892 0.83783784
0.89189189 0.91891892 1. 0.91891892]
mean value: 0.8878378378378379
key: train_accuracy
value: [0.89552239 0.91641791 0.90746269 0.9047619 0.89285714 0.9047619
0.90178571 0.9047619 0.89880952 0.88988095]
mean value: 0.9017022032693675
key: test_fscore
value: [0.94736842 0.75 0.8 0.88888889 0.92307692 0.84210526
0.88888889 0.91428571 1. 0.91428571]
mean value: 0.8868899813636656
key: train_fscore
value: [0.89489489 0.91515152 0.90746269 0.90419162 0.89156627 0.90361446
0.90149254 0.9047619 0.89880952 0.88888889]
mean value: 0.9010834291045358
key: test_precision
value: [0.94736842 0.71428571 0.875 0.94117647 0.85714286 0.8
0.88888889 0.94117647 1. 0.94117647]
mean value: 0.8906215293134798
key: train_precision
value: [0.88690476 0.91515152 0.89411765 0.89349112 0.89156627 0.90361446
0.89349112 0.89411765 0.88823529 0.88622754]
mean value: 0.8946917381614027
key: test_recall
value: [0.94736842 0.78947368 0.73684211 0.84210526 1. 0.88888889
0.88888889 0.88888889 1. 0.88888889]
mean value: 0.8871345029239766
key: train_recall
value: [0.9030303 0.91515152 0.92121212 0.91515152 0.89156627 0.90361446
0.90963855 0.91566265 0.90963855 0.89156627]
mean value: 0.9076232201533406
key: test_roc_auc
value: [0.94736842 0.73684211 0.81578947 0.89327485 0.92105263 0.83918129
0.89181287 0.91812865 1. 0.91812865]
mean value: 0.8881578947368421
key: train_roc_auc
value: [0.8956328 0.91639929 0.90766488 0.90494418 0.89284196 0.90474841
0.9018781 0.90489015 0.89893692 0.88990078]
mean value: 0.9017837462995806
key: test_jcc
value: [0.9 0.6 0.66666667 0.8 0.85714286 0.72727273
0.8 0.84210526 1. 0.84210526]
mean value: 0.8035292777398041
key: train_jcc
value: [0.80978261 0.84357542 0.83060109 0.82513661 0.80434783 0.82417582
0.82065217 0.82608696 0.81621622 0.8 ]
mean value: 0.8200574729521878
MCC on Blind test: 0.78
Accuracy on Blind test: 0.89
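
The two "Blind test" lines report metrics on the held-out 30% of the stratified 70/30 split. A rough sketch of how such figures could be computed is given here, assuming the pipeline is refit on the full training portion and scored once on the held-out portion. The synthetic data and the variable names (X_bts, y_bts, bts_mcc, bts_acc) are hypothetical, and the simplified pipeline omits the categorical branch.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real features and target.
X, y = make_classification(n_samples=300, n_features=10, random_state=42)

# Stratified 70/30 split; the 30% part plays the role of the blind test set.
X_train, X_bts, y_train, y_bts = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

pipe = Pipeline([("prep", MinMaxScaler()), ("model", SVC(random_state=42))])
pipe.fit(X_train, y_train)

# Score once on data the model never saw during fitting.
y_pred = pipe.predict(X_bts)
bts_mcc = round(matthews_corrcoef(y_bts, y_pred), 2)
bts_acc = round(accuracy_score(y_bts, y_pred), 2)

print("MCC on Blind test:", bts_mcc)
print("Accuracy on Blind test:", bts_acc)
```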
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [2.2372601 2.04340816 1.6573236 1.768543 1.91152811 1.71026349
1.82718539 2.05708623 2.5108068 2.36679077]
mean value: 2.0090195655822756
key: score_time
value: [0.01652408 0.01711798 0.01492739 0.02228522 0.02000928 0.01484942
0.02607751 0.01266646 0.02013946 0.01371574]
mean value: 0.017831254005432128
key: test_mcc
value: [0.89973541 0.58218174 0.63960215 0.74044197 0.7888597 0.6754386
0.80369958 0.78362573 0.94736842 0.73020842]
mean value: 0.7591161713668702
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94736842 0.78947368 0.81578947 0.86486486 0.89189189 0.83783784
0.89189189 0.89189189 0.97297297 0.86486486]
mean value: 0.8768847795163585
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94444444 0.8 0.8 0.85714286 0.89473684 0.83333333
0.9 0.88888889 0.97297297 0.85714286]
mean value: 0.8748662196030617
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.76190476 0.875 0.9375 0.85 0.83333333
0.81818182 0.88888889 0.94736842 0.88235294]
mean value: 0.8794530164537905
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.89473684 0.84210526 0.73684211 0.78947368 0.94444444 0.83333333
1. 0.88888889 1. 0.83333333]
mean value: 0.8763157894736842
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.94736842 0.78947368 0.81578947 0.86695906 0.89327485 0.8377193
0.89473684 0.89181287 0.97368421 0.86403509]
mean value: 0.8774853801169591
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.89473684 0.66666667 0.66666667 0.75 0.80952381 0.71428571
0.81818182 0.8 0.94736842 0.75 ]
mean value: 0.781742993848257
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.74
Accuracy on Blind test: 0.87
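
For MLP every train_* metric is exactly 1.0 while the test_* means sit near 0.76 to 0.88, a pattern that usually points to overfitting on the training folds. One way to quantify this is the gap between mean train and mean test score. The short sketch below does that for MCC, using the per-fold values copied from the MLP block above; the variable names are illustrative only.

```python
import numpy as np

# Per-fold values copied from the MLP block of this log.
test_mcc = np.array([0.89973541, 0.58218174, 0.63960215, 0.74044197,
                     0.7888597, 0.6754386, 0.80369958, 0.78362573,
                     0.94736842, 0.73020842])
train_mcc = np.ones(10)  # all ten training folds scored 1.0

# Gap between mean training and mean cross-validated test performance.
gap = train_mcc.mean() - test_mcc.mean()

print(f"mean test MCC:  {test_mcc.mean():.4f}")
print(f"mean train MCC: {train_mcc.mean():.4f}")
print(f"train-test gap: {gap:.4f}")
```

The same calculation applies to any other train_/test_ key pair in the log.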
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.02188754 0.01587939 0.01625609 0.01523471 0.01606822 0.01897764
0.02061105 0.01994038 0.01674199 0.01598382]
mean value: 0.01775808334350586
key: score_time
value: [0.01280594 0.00951886 0.00900126 0.00924301 0.01010299 0.01299381
0.01148295 0.0133841 0.0088644 0.00901771]
mean value: 0.01064150333404541
key: test_mcc
value: [0.9486833 0.79388419 0.89973541 0.83918129 0.7888597 0.83918129
0.94736842 0.89181287 1. 0.83918129]
mean value: 0.8787887738162246
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.97368421 0.89473684 0.94736842 0.91891892 0.89189189 0.91891892
0.97297297 0.94594595 1. 0.91891892]
mean value: 0.9383357041251779
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.97435897 0.88888889 0.94444444 0.91891892 0.89473684 0.91891892
0.97297297 0.94444444 1. 0.91891892]
mean value: 0.9376603323971745
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.95 0.94117647 1. 0.94444444 0.85 0.89473684
0.94736842 0.94444444 1. 0.89473684]
mean value: 0.9366907464740282
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.84210526 0.89473684 0.89473684 0.94444444 0.94444444
1. 0.94444444 1. 0.94444444]
mean value: 0.9409356725146198
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97368421 0.89473684 0.94736842 0.91959064 0.89327485 0.91959064
0.97368421 0.94590643 1. 0.91959064]
mean value: 0.9387426900584795
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.95 0.8 0.89473684 0.85 0.80952381 0.85
0.94736842 0.89473684 1. 0.85 ]
mean value: 0.8846365914786968
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.89
Accuracy on Blind test: 0.95
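
For a binary target, the Jaccard index (jcc) and the F1 score (fscore) of the positive class are linked by J = F / (2 - F), so the two keys carry the same ranking information. The sketch below checks this identity against the per-fold Decision Tree values logged above. It is offered as a consistency check on the numbers, and it assumes both metrics were computed on the same predictions for the same positive class.

```python
import numpy as np

# Per-fold values copied from the Decision Tree block of this log.
test_fscore = np.array([0.97435897, 0.88888889, 0.94444444, 0.91891892,
                        0.89473684, 0.91891892, 0.97297297, 0.94444444,
                        1.0, 0.91891892])
test_jcc = np.array([0.95, 0.8, 0.89473684, 0.85, 0.80952381, 0.85,
                     0.94736842, 0.89473684, 1.0, 0.85])

# Jaccard recovered from F1 via J = F / (2 - F).
jcc_from_f = test_fscore / (2.0 - test_fscore)

print(np.round(jcc_from_f, 8))
print("matches logged test_jcc:", np.allclose(jcc_from_f, test_jcc, atol=1e-6))
```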
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.10830688 0.10529113 0.10499811 0.10781646 0.10907078 0.10594296
0.10623217 0.10774302 0.12120008 0.11080527]
mean value: 0.10874068737030029
key: score_time
value: [0.01754594 0.01745701 0.01749945 0.01769519 0.01761103 0.01773334
0.01761365 0.01932955 0.01826262 0.0196104 ]
mean value: 0.01803581714630127
key: test_mcc
value: [1. 0.52704628 0.63960215 0.7888597 0.7888597 0.56725146
0.6754386 0.83918129 1. 0.84834956]
mean value: 0.7674588721170138
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.76315789 0.81578947 0.89189189 0.89189189 0.78378378
0.83783784 0.91891892 1. 0.91891892]
mean value: 0.8822190611664296
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.76923077 0.8 0.88888889 0.89473684 0.77777778
0.83333333 0.91891892 1. 0.90909091]
mean value: 0.879197743934586
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.75 0.875 0.94117647 0.85 0.77777778
0.83333333 0.89473684 1. 1. ]
mean value: 0.892202442380461
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.78947368 0.73684211 0.84210526 0.94444444 0.77777778
0.83333333 0.94444444 1. 0.83333333]
mean value: 0.8701754385964913
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.76315789 0.81578947 0.89327485 0.89327485 0.78362573
0.8377193 0.91959064 1. 0.91666667]
mean value: 0.8823099415204678
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.625 0.66666667 0.8 0.80952381 0.63636364
0.71428571 0.85 1. 0.83333333]
mean value: 0.793517316017316
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.82
Accuracy on Blind test: 0.91
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01310754 0.01101828 0.01016736 0.01003575 0.01700163 0.01136065
0.01075435 0.01067233 0.01063561 0.0164597 ]
mean value: 0.012121319770812988
key: score_time
value: [0.00934529 0.00985003 0.00910783 0.01338744 0.01081181 0.0099082
0.00986314 0.00971031 0.01279759 0.01062942]
mean value: 0.010541105270385742
key: test_mcc
value: [0.73786479 0.32732684 0.42163702 0.56725146 0.41299552 0.24269006
0.35087719 0.73099415 0.35104619 0.52214434]
mean value: 0.466482756739817
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.86842105 0.65789474 0.71052632 0.78378378 0.7027027 0.62162162
0.67567568 0.86486486 0.67567568 0.75675676]
mean value: 0.731792318634424
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.87179487 0.69767442 0.71794872 0.78947368 0.71794872 0.61111111
0.66666667 0.86486486 0.64705882 0.76923077]
mean value: 0.7353772645910309
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.85 0.625 0.7 0.78947368 0.66666667 0.61111111
0.66666667 0.84210526 0.6875 0.71428571]
mean value: 0.715280910609858
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.89473684 0.78947368 0.73684211 0.78947368 0.77777778 0.61111111
0.66666667 0.88888889 0.61111111 0.83333333]
mean value: 0.7599415204678363
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.86842105 0.65789474 0.71052632 0.78362573 0.70467836 0.62134503
0.6754386 0.86549708 0.67397661 0.75877193]
mean value: 0.7320175438596491
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.77272727 0.53571429 0.56 0.65217391 0.56 0.44
0.5 0.76190476 0.47826087 0.625 ]
mean value: 0.5885781102955016
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.37
Accuracy on Blind test: 0.68
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.74289036 1.53044105 1.49827647 1.53210521 1.52824092 1.50557423
1.51571012 1.51636791 1.51605439 1.5078156 ]
mean value: 1.5393476247787476
key: score_time
value: [0.09816289 0.09600377 0.09313393 0.09704614 0.09628916 0.09311295
0.09352589 0.09830904 0.09392619 0.09237742]
mean value: 0.09518873691558838
key: test_mcc
value: [0.9486833 0.79388419 0.80757285 0.94736842 0.89736456 0.89181287
0.94736842 0.83918129 1. 1. ]
mean value: 0.9073235893751431
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.97368421 0.89473684 0.89473684 0.97297297 0.94594595 0.94594595
0.97297297 0.91891892 1. 1. ]
mean value: 0.95199146514936
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.97435897 0.88888889 0.88235294 0.97297297 0.94736842 0.94444444
0.97297297 0.91891892 1. 1. ]
mean value: 0.9502278534786275
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.95 0.94117647 1. 1. 0.9 0.94444444
0.94736842 0.89473684 1. 1. ]
mean value: 0.9577726178190574
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.84210526 0.78947368 0.94736842 1. 0.94444444
1. 0.94444444 1. 1. ]
mean value: 0.9467836257309942
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97368421 0.89473684 0.89473684 0.97368421 0.94736842 0.94590643
0.97368421 0.91959064 1. 1. ]
mean value: 0.9523391812865497
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.95 0.8 0.78947368 0.94736842 0.9 0.89473684
0.94736842 0.85 1. 1. ]
mean value: 0.9078947368421053
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.89
Accuracy on Blind test: 0.95
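The per-fold test_mcc values can be checked by hand from confusion-matrix counts. The sketch below uses the standard MCC formula. The counts are not logged; they are inferred for Random Forest fold 1 from test_precision 0.95 (19/20), test_recall 1.0 and test_accuracy 0.97368421 (37/38), so treat them as an assumption.

```python
import math


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when any marginal total is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Inferred (not logged) counts for Random Forest, fold 1: 38 samples,
# 19 true positives, 1 false positive, 0 false negatives, 18 true negatives.
fold1 = mcc(tp=19, fp=1, fn=0, tn=18)
print(round(fold1, 7))
```

With these counts the result agrees with the first logged test_mcc entry for this model (0.9486833).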
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...05', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
key: fit_time
value: [1.79304814 0.91387272 1.02754641 0.95623207 0.91034627 0.89128447
0.93726492 0.88634753 0.91595483 0.94292164]
mean value: 1.0174818992614747
key: score_time
value: [0.26749158 0.23623896 0.25596142 0.2053082 0.22877908 0.22374535
0.21946526 0.25390697 0.22047591 0.22399354]
mean value: 0.23353662490844726
key: test_mcc
value: [1. 0.68803296 0.76376262 0.84959079 0.89736456 0.7888597
0.89181287 0.83918129 1. 0.94721815]
mean value: 0.8665822928273674
key: train_mcc
value: [0.94639427 0.97016256 0.96423353 0.96427432 0.96434396 0.95243498
0.95243498 0.96428065 0.94656062 0.95834146]
mean value: 0.9583461336790984
key: test_accuracy
value: [1. 0.84210526 0.86842105 0.91891892 0.94594595 0.89189189
0.94594595 0.91891892 1. 0.97297297]
mean value: 0.9305120910384068
key: train_accuracy
value: [0.97313433 0.98507463 0.98208955 0.98214286 0.98214286 0.97619048
0.97619048 0.98214286 0.97321429 0.97916667]
mean value: 0.9791488983653163
key: test_fscore
value: [1. 0.83333333 0.84848485 0.91428571 0.94736842 0.89473684
0.94444444 0.91891892 1. 0.97142857]
mean value: 0.9273001094053726
key: train_fscore
value: [0.97247706 0.98489426 0.98170732 0.98181818 0.98181818 0.97575758
0.97575758 0.98192771 0.97264438 0.97885196]
mean value: 0.9787654207752894
key: test_precision
value: [1. 0.88235294 1. 1. 0.9 0.85
0.94444444 0.89473684 1. 1. ]
mean value: 0.9471534227726178
key: train_precision
value: [0.98148148 0.98192771 0.98773006 0.98181818 0.98780488 0.98170732
0.98170732 0.98192771 0.98159509 0.98181818]
mean value: 0.9829517932373947
key: test_recall
value: [1. 0.78947368 0.73684211 0.84210526 1. 0.94444444
0.94444444 0.94444444 1. 0.94444444]
mean value: 0.9146198830409357
key: train_recall
value: [0.96363636 0.98787879 0.97575758 0.98181818 0.97590361 0.96987952
0.96987952 0.98192771 0.96385542 0.97590361]
mean value: 0.974644030668127
key: test_roc_auc
value: [1. 0.84210526 0.86842105 0.92105263 0.94736842 0.89327485
0.94590643 0.91959064 1. 0.97222222]
mean value: 0.9309941520467836
key: train_roc_auc
value: [0.97299465 0.98511586 0.98199643 0.98213716 0.98206945 0.97611623
0.97611623 0.98214033 0.97310418 0.97912828]
mean value: 0.9790918811751368
key: test_jcc
value: [1. 0.71428571 0.73684211 0.84210526 0.9 0.80952381
0.89473684 0.85 1. 0.94444444]
mean value: 0.8691938178780284
key: train_jcc
value: [0.94642857 0.9702381 0.96407186 0.96428571 0.96428571 0.95266272
0.95266272 0.96449704 0.94674556 0.95857988]
mean value: 0.9584457880519603
MCC on Blind test: 0.85
Accuracy on Blind test: 0.92
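The warning text itself names the fix for the run: set `max_features='sqrt'` or drop the parameter. For logs that have already been written, identical warning lines can be collapsed afterwards. The following is a minimal post-processing sketch, not part of the original pipeline; the function name and the rule of treating a bare `warn(` line as the warning's source-context line are assumptions.

```python
from typing import Iterable, List


def collapse_repeated_warnings(lines: Iterable[str]) -> List[str]:
    """Keep the first copy of each warning line and drop later identical copies."""
    seen = set()
    kept = []
    dropping = False  # True right after a duplicate warning line was dropped
    for line in lines:
        if "Warning:" in line:
            dropping = line in seen
            seen.add(line)
            if dropping:
                continue
        elif dropping and line.strip() == "warn(":
            # Source-context line that belongs to the dropped duplicate.
            dropping = False
            continue
        else:
            dropping = False
        kept.append(line)
    return kept


sample = [
    "_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated",
    "  warn(",
    "_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated",
    "  warn(",
    "key: fit_time",
]
cleaned = collapse_repeated_warnings(sample)
print("\n".join(cleaned))
```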
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02368402 0.00988269 0.00996494 0.01014757 0.00983524 0.0103724
0.00982785 0.00976944 0.010041 0.01028967]
mean value: 0.01138148307800293
key: score_time
value: [0.01074362 0.00889611 0.06786227 0.00892687 0.00902653 0.00950575
0.00921893 0.00906777 0.00902295 0.00916648]
mean value: 0.015143728256225586
key: test_mcc
value: [0.74620251 0.42640143 0.65465367 0.56725146 0.68035483 0.56725146
0.56725146 0.67849265 0.7888597 0.83918129]
mean value: 0.6515900457032924
key: train_mcc
value: [0.73755882 0.7792393 0.74367201 0.73209888 0.70905196 0.7441844
0.76402212 0.73287373 0.70870914 0.71482244]
mean value: 0.7366232811135077
key: test_accuracy
value: [0.86842105 0.71052632 0.81578947 0.78378378 0.83783784 0.78378378
0.78378378 0.83783784 0.89189189 0.91891892]
mean value: 0.8232574679943101
key: train_accuracy
value: [0.86865672 0.88955224 0.87164179 0.86607143 0.85416667 0.87202381
0.88095238 0.86607143 0.85416667 0.85714286]
mean value: 0.8680445984363895
key: test_fscore
value: [0.87804878 0.73170732 0.78787879 0.78947368 0.84210526 0.77777778
0.77777778 0.82352941 0.89473684 0.91891892]
mean value: 0.8221954561152628
key: train_fscore
value: [0.86826347 0.88888889 0.87164179 0.86404834 0.85545723 0.87164179
0.88372093 0.86725664 0.85459941 0.85798817]
mean value: 0.868350664914892
key: test_precision
value: [0.81818182 0.68181818 0.92857143 0.78947368 0.8 0.77777778
0.77777778 0.875 0.85 0.89473684]
mean value: 0.8193337510442774
key: train_precision
value: [0.85798817 0.88095238 0.85882353 0.86144578 0.83815029 0.86390533
0.85393258 0.84971098 0.84210526 0.84302326]
mean value: 0.8550037559538748
key: test_recall
value: [0.94736842 0.78947368 0.68421053 0.78947368 0.88888889 0.77777778
0.77777778 0.77777778 0.94444444 0.94444444]
mean value: 0.8321637426900584
key: train_recall
value: [0.87878788 0.8969697 0.88484848 0.86666667 0.87349398 0.87951807
0.91566265 0.88554217 0.86746988 0.87349398]
mean value: 0.8822453450164294
key: test_roc_auc
value: [0.86842105 0.71052632 0.81578947 0.78362573 0.83918129 0.78362573
0.78362573 0.83625731 0.89327485 0.91959064]
mean value: 0.8233918128654971
key: train_roc_auc
value: [0.8688057 0.88966132 0.87183601 0.86608187 0.85439405 0.87211198
0.88136074 0.8663005 0.85432318 0.85733522]
mean value: 0.8682210557211489
key: test_jcc
value: [0.7826087 0.57692308 0.65 0.65217391 0.72727273 0.63636364
0.63636364 0.7 0.80952381 0.85 ]
mean value: 0.7021229495142538
key: train_jcc
value: [0.76719577 0.8 0.77248677 0.7606383 0.74742268 0.77248677
0.79166667 0.765625 0.74611399 0.75129534]
mean value: 0.7674931283545561
MCC on Blind test: 0.73
Accuracy on Blind test: 0.86
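Each "mean value" line is the arithmetic mean of the ten fold scores printed above it. A small sketch that recomputes it for the Naive Bayes test_mcc values follows; the sample standard deviation is an addition that the log does not report, included only to show the fold-to-fold spread.

```python
from statistics import mean, stdev

# Naive Bayes (BernoulliNB) test_mcc per fold, copied from the log.
nb_test_mcc = [
    0.74620251, 0.42640143, 0.65465367, 0.56725146, 0.68035483,
    0.56725146, 0.56725146, 0.67849265, 0.7888597, 0.83918129,
]

fold_mean = mean(nb_test_mcc)
fold_sd = stdev(nb_test_mcc)  # not in the log; shows variation across folds
print(f"mean={fold_mean:.6f} sd={fold_sd:.6f}")
```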
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.24965024 0.05168438 0.06280875 0.06240273 0.07617354 0.05706716
0.05821943 0.06454206 0.06267881 0.09799004]
mean value: 0.0843217134475708
key: score_time
value: [0.01142573 0.01089931 0.01093245 0.01113415 0.01153183 0.01123571
0.01077247 0.01092672 0.01053596 0.01156497]
mean value: 0.011095929145812988
key: test_mcc
value: [0.9486833 0.84327404 0.89973541 0.83918129 0.94736842 0.94736842
0.94736842 0.89181287 1. 0.94736842]
mean value: 0.9212160587861828
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.97368421 0.92105263 0.94736842 0.91891892 0.97297297 0.97297297
0.97297297 0.94594595 1. 0.97297297]
mean value: 0.9598862019914651
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.97435897 0.91891892 0.94444444 0.91891892 0.97297297 0.97297297
0.97297297 0.94444444 1. 0.97297297]
mean value: 0.9592977592977593
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.95 0.94444444 1. 0.94444444 0.94736842 0.94736842
0.94736842 0.94444444 1. 0.94736842]
mean value: 0.957280701754386
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.89473684 0.89473684 0.89473684 1. 1.
1. 0.94444444 1. 1. ]
mean value: 0.9628654970760234
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97368421 0.92105263 0.94736842 0.91959064 0.97368421 0.97368421
0.97368421 0.94590643 1. 0.97368421]
mean value: 0.960233918128655
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.95 0.85 0.89473684 0.85 0.94736842 0.94736842
0.94736842 0.89473684 1. 0.94736842]
mean value: 0.9228947368421052
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.92
Accuracy on Blind test: 0.96
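The test_jcc and test_fscore rows carry the same information. For a single positive class, the Jaccard index J and the F1 score F satisfy J = F / (2 - F). The sketch below checks this against the XGBoost fold values, assuming both metrics were computed for the same binary positive class, which the numbers are consistent with.

```python
def jaccard_from_f1(f1: float) -> float:
    """Binary Jaccard index implied by a binary F1 score."""
    return f1 / (2.0 - f1)


# XGBoost per-fold values copied from the log.
xgb_fscore = [
    0.97435897, 0.91891892, 0.94444444, 0.91891892, 0.97297297,
    0.97297297, 0.97297297, 0.94444444, 1.0, 0.97297297,
]
xgb_jcc = [
    0.95, 0.85, 0.89473684, 0.85, 0.94736842,
    0.94736842, 0.94736842, 0.89473684, 1.0, 0.94736842,
]

# Largest absolute disagreement between the derived and the logged Jaccard values.
max_diff = max(abs(jaccard_from_f1(f) - j) for f, j in zip(xgb_fscore, xgb_jcc))
print(f"largest difference: {max_diff:.2e}")
```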
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.03995848 0.03733993 0.03714538 0.03793836 0.08239031 0.03703666
0.03627944 0.07113647 0.07073593 0.04976606]
mean value: 0.049972701072692874
key: score_time
value: [0.01242304 0.01249933 0.01246953 0.01262975 0.01240516 0.01247954
0.01251364 0.02486706 0.02162528 0.01249456]
mean value: 0.0146406888961792
key: test_mcc
value: [0.89473684 0.53300179 0.59222009 0.56725146 0.89736456 0.68035483
0.94736842 0.89679028 0.84834956 0.62170355]
mean value: 0.7479141390576655
key: train_mcc
value: [0.95248307 0.95222816 0.95822045 0.94051126 0.940526 0.9285613
0.9285613 0.9523742 0.94643395 0.94656062]
mean value: 0.9446460333429583
key: test_accuracy
value: [0.94736842 0.76315789 0.78947368 0.78378378 0.94594595 0.83783784
0.97297297 0.94594595 0.91891892 0.81081081]
mean value: 0.8716216216216216
key: train_accuracy
value: [0.9761194 0.9761194 0.97910448 0.9702381 0.9702381 0.96428571
0.96428571 0.97619048 0.97321429 0.97321429]
mean value: 0.9723009950248757
key: test_fscore
value: [0.94736842 0.7804878 0.76470588 0.78947368 0.94736842 0.84210526
0.97297297 0.94117647 0.90909091 0.8 ]
mean value: 0.8694749829356792
key: train_fscore
value: [0.97546012 0.97575758 0.97885196 0.9695122 0.96969697 0.96385542
0.96385542 0.97590361 0.97280967 0.97264438]
mean value: 0.9718347329426844
key: test_precision
value: [0.94736842 0.72727273 0.86666667 0.78947368 0.9 0.8
0.94736842 1. 1. 0.82352941]
mean value: 0.8801679332019889
key: train_precision
value: [0.98757764 0.97575758 0.97590361 0.97546012 0.97560976 0.96385542
0.96385542 0.97590361 0.97575758 0.98159509]
mean value: 0.9751275834377349
key: test_recall
value: [0.94736842 0.84210526 0.68421053 0.78947368 1. 0.88888889
1. 0.88888889 0.83333333 0.77777778]
mean value: 0.8652046783625731
key: train_recall
value: [0.96363636 0.97575758 0.98181818 0.96363636 0.96385542 0.96385542
0.96385542 0.97590361 0.96987952 0.96385542]
mean value: 0.9686053304125594
key: test_roc_auc
value: [0.94736842 0.76315789 0.78947368 0.78362573 0.94736842 0.83918129
0.97368421 0.94444444 0.91666667 0.80994152]
mean value: 0.8714912280701754
key: train_roc_auc
value: [0.97593583 0.97611408 0.97914439 0.97012228 0.970163 0.96428065
0.96428065 0.9761871 0.97317505 0.97310418]
mean value: 0.9722507216218284
key: test_jcc
value: [0.9 0.64 0.61904762 0.65217391 0.9 0.72727273
0.94736842 0.88888889 0.83333333 0.66666667]
mean value: 0.7774751569305345
key: train_jcc
value: [0.95209581 0.95266272 0.95857988 0.9408284 0.94117647 0.93023256
0.93023256 0.95294118 0.94705882 0.94674556]
mean value: 0.9452553963297876
MCC on Blind test: 0.74
Accuracy on Blind test: 0.87
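To compare the models that are fully reported in this part of the log, the mean cross-validation test_mcc and the blind-test MCC can be tabulated and ranked. This is only a summary sketch: the numbers are copied from the log (CV means rounded to four decimals), and the choice of blind-test MCC as the sort key is an assumption, not the author's stated selection criterion.

```python
# (model name, mean CV test_mcc, MCC on blind test), copied from the log.
results = [
    ("Random Forest", 0.9073, 0.89),
    ("Random Forest2", 0.8666, 0.85),
    ("Naive Bayes", 0.6516, 0.73),
    ("XGBoost", 0.9212, 0.92),
    ("LDA", 0.7479, 0.74),
]

# Rank by blind-test MCC, highest first.
ranked = sorted(results, key=lambda row: row[2], reverse=True)

for name, cv_mcc, blind_mcc in ranked:
    diff = blind_mcc - cv_mcc
    print(f"{name:<15} cv={cv_mcc:.4f} blind={blind_mcc:.2f} diff={diff:+.4f}")
```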
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.02484918 0.00989151 0.01043844 0.01072907 0.01075554 0.01006603
0.00961328 0.0097363 0.00997925 0.00970602]
mean value: 0.011576461791992187
key: score_time
value: [0.01031947 0.00934625 0.00986648 0.00878239 0.00917149 0.00902915
0.00898862 0.00896931 0.00876212 0.0088017 ]
mean value: 0.009203696250915527
key: test_mcc
value: [0.78947368 0.31622777 0.61017022 0.63129316 0.67849265 0.63129316
0.73020842 0.73821295 0.62170355 0.78362573]
mean value: 0.6530701274630709
key: train_mcc
value: [0.67195163 0.71367434 0.64874079 0.72056751 0.63176039 0.66756867
0.75591389 0.64914987 0.66690353 0.71425535]
mean value: 0.6840485961335292
key: test_accuracy
value: [0.89473684 0.65789474 0.78947368 0.81081081 0.83783784 0.81081081
0.86486486 0.86486486 0.81081081 0.89189189]
mean value: 0.8233997155049787
key: train_accuracy
value: [0.8358209 0.85671642 0.8238806 0.86011905 0.81547619 0.83333333
0.87797619 0.82440476 0.83333333 0.85714286]
mean value: 0.8418203624733476
key: test_fscore
value: [0.89473684 0.64864865 0.75 0.8 0.82352941 0.82051282
0.85714286 0.84848485 0.8 0.88888889]
mean value: 0.8131944317548032
key: train_fscore
value: [0.82972136 0.85185185 0.81504702 0.85448916 0.80745342 0.82608696
0.87613293 0.81846154 0.82822086 0.85454545]
mean value: 0.8362010555198316
key: test_precision
value: [0.89473684 0.66666667 0.92307692 0.875 0.875 0.76190476
0.88235294 0.93333333 0.82352941 0.88888889]
mean value: 0.8524489768917013
key: train_precision
value: [0.84810127 0.86792453 0.84415584 0.87341772 0.83333333 0.8525641
0.87878788 0.83647799 0.84375 0.8597561 ]
mean value: 0.8538268759467177
key: test_recall
value: [0.89473684 0.63157895 0.63157895 0.73684211 0.77777778 0.88888889
0.83333333 0.77777778 0.77777778 0.88888889]
mean value: 0.7839181286549708
key: train_recall
value: [0.81212121 0.83636364 0.78787879 0.83636364 0.78313253 0.80120482
0.87349398 0.80120482 0.81325301 0.84939759]
mean value: 0.8194414019715224
key: test_roc_auc
value: [0.89473684 0.65789474 0.78947368 0.8128655 0.83625731 0.8128655
0.86403509 0.8625731 0.80994152 0.89181287]
mean value: 0.8232456140350878
key: train_roc_auc
value: [0.83547237 0.85641711 0.82335116 0.85970229 0.81509568 0.83295535
0.87792346 0.82413182 0.83309709 0.85705174]
mean value: 0.8415198065929164
key: test_jcc
value: [0.80952381 0.48 0.6 0.66666667 0.7 0.69565217
0.75 0.73684211 0.66666667 0.8 ]
mean value: 0.6905351422033345
key: train_jcc
value: [0.70899471 0.74193548 0.68783069 0.74594595 0.67708333 0.7037037
0.77956989 0.69270833 0.70680628 0.74603175]
mean value: 0.7190610118240058
MCC on Blind test: 0.78
Accuracy on Blind test: 0.89
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01237154 0.01653957 0.0177362 0.02279115 0.01850939 0.01929617
0.01568437 0.01943159 0.01927876 0.01795244]
mean value: 0.017959117889404297
key: score_time
value: [0.00886655 0.01132083 0.01133919 0.01192641 0.01196933 0.01196837
0.01189017 0.01204634 0.01206517 0.01196551]
mean value: 0.01153578758239746
key: test_mcc
value: [0.89473684 0.63960215 0.4330127 0.74044197 0.78362573 0.63129316
0.7888597 0.83918129 0.84959079 0.84834956]
mean value: 0.7448693884912015
key: train_mcc
value: [0.88773584 0.90033348 0.49718308 0.92909689 0.85527622 0.91673163
0.86308142 0.91673163 0.85029687 0.87836587]
mean value: 0.8494832930737864
key: test_accuracy
value: [0.94736842 0.81578947 0.65789474 0.86486486 0.89189189 0.81081081
0.89189189 0.91891892 0.91891892 0.91891892]
mean value: 0.8637268847795164
key: train_accuracy
value: [0.94328358 0.94925373 0.69552239 0.96428571 0.92261905 0.95833333
0.93154762 0.95833333 0.92261905 0.9375 ]
mean value: 0.9183297796730633
key: test_fscore
value: [0.94736842 0.8 0.74509804 0.85714286 0.88888889 0.82051282
0.89473684 0.91891892 0.92307692 0.90909091]
mean value: 0.8704834620004899
key: train_fscore
value: [0.94080997 0.94670846 0.76388889 0.96296296 0.91503268 0.95808383
0.9305136 0.95808383 0.92571429 0.93375394]
mean value: 0.9235552453156383
key: test_precision
value: [0.94736842 0.875 0.59375 0.9375 0.88888889 0.76190476
0.85 0.89473684 0.85714286 1. ]
mean value: 0.8606291771094402
key: train_precision
value: [0.96794872 0.98051948 0.61797753 0.98113208 1. 0.95238095
0.93333333 0.95238095 0.88043478 0.98013245]
mean value: 0.9246240273064844
key: test_recall
value: [0.94736842 0.73684211 1. 0.78947368 0.88888889 0.88888889
0.94444444 0.94444444 1. 0.83333333]
mean value: 0.8973684210526316
key: train_recall
value: [0.91515152 0.91515152 1. 0.94545455 0.84337349 0.96385542
0.92771084 0.96385542 0.97590361 0.89156627]
mean value: 0.9342022635998539
key: test_roc_auc
value: [0.94736842 0.81578947 0.65789474 0.86695906 0.89181287 0.8128655
0.89327485 0.91959064 0.92105263 0.91666667]
mean value: 0.8643274853801169
key: train_roc_auc
value: [0.94286988 0.94875223 0.7 0.96395534 0.92168675 0.9583983
0.93150248 0.9583983 0.92324592 0.9369596 ]
mean value: 0.9185768799939413
key: test_jcc
value: [0.9 0.66666667 0.59375 0.75 0.8 0.69565217
0.80952381 0.85 0.85714286 0.83333333]
mean value: 0.7756068840579711
key: train_jcc
value: [0.88823529 0.89880952 0.61797753 0.92857143 0.84337349 0.91954023
0.8700565 0.91954023 0.86170213 0.87573964]
mean value: 0.8623545998139636
MCC on Blind test: 0.83
Accuracy on Blind test: 0.91
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01700234 0.01797867 0.01679993 0.01786995 0.0184319 0.01814771
0.01903105 0.01888227 0.0194633 0.01629138]
mean value: 0.017989850044250487
key: score_time
value: [0.01207399 0.01205277 0.012187 0.01223421 0.01236439 0.01218557
0.01222467 0.01221824 0.01213694 0.01256537]
mean value: 0.012224316596984863
key: test_mcc
value: [0.78947368 0.16439899 0.73786479 0.7163504 0.84959079 0.68035483
0.83871328 0.67849265 0.94721815 0.40611643]
mean value: 0.6808573989700671
key: train_mcc
value: [0.89255789 0.36277429 0.87738561 0.86960067 0.89521641 0.92337258
0.87750371 0.90781863 0.79887733 0.48613777]
mean value: 0.7891244887522015
key: test_accuracy
value: [0.89473684 0.52631579 0.86842105 0.83783784 0.91891892 0.83783784
0.91891892 0.83783784 0.97297297 0.64864865]
mean value: 0.82624466571835
key: train_accuracy
value: [0.94626866 0.6119403 0.93731343 0.93154762 0.94642857 0.96130952
0.9375 0.95238095 0.88988095 0.69345238]
mean value: 0.8808022388059702
key: test_fscore
value: [0.89473684 0.67857143 0.86486486 0.8125 0.92307692 0.84210526
0.91428571 0.82352941 0.97142857 0.43478261]
mean value: 0.8159881627951018
key: train_fscore
value: [0.94512195 0.7173913 0.93877551 0.92556634 0.94767442 0.96
0.93416928 0.94968553 0.87457627 0.55021834]
mean value: 0.8743178952803997
key: test_precision
value: [0.89473684 0.51351351 0.88888889 1. 0.85714286 0.8
0.94117647 0.875 1. 1. ]
mean value: 0.8770458572238757
key: train_precision
value: [0.95092025 0.55932203 0.90449438 0.99305556 0.91573034 0.98113208
0.97385621 0.99342105 1. 1. ]
mean value: 0.9271931891207361
key: test_recall
value: [0.89473684 1. 0.84210526 0.68421053 1. 0.88888889
0.88888889 0.77777778 0.94444444 0.27777778]
mean value: 0.8198830409356725
key: train_recall
value: [0.93939394 1. 0.97575758 0.86666667 0.98192771 0.93975904
0.89759036 0.90963855 0.77710843 0.37951807]
mean value: 0.8667360350492881
key: test_roc_auc
value: [0.89473684 0.52631579 0.86842105 0.84210526 0.92105263 0.83918129
0.91812865 0.83625731 0.97222222 0.63888889]
mean value: 0.8257309941520468
key: train_roc_auc
value: [0.94616756 0.61764706 0.93787879 0.93040936 0.94684621 0.96105599
0.93703047 0.9518781 0.88855422 0.68975904]
mean value: 0.8807226786873548
key: test_jcc
value: [0.80952381 0.51351351 0.76190476 0.68421053 0.85714286 0.72727273
0.84210526 0.7 0.94444444 0.27777778]
mean value: 0.7117895681053575
key: train_jcc
value: [0.89595376 0.55932203 0.88461538 0.86144578 0.90055249 0.92307692
0.87647059 0.90419162 0.77710843 0.37951807]
mean value: 0.7962255079162279
MCC on Blind test: 0.7
Accuracy on Blind test: 0.84
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.16781831 0.1466465 0.15688443 0.15079045 0.14964795 0.15363979
0.15089941 0.15201402 0.14735866 0.15484333]
mean value: 0.15305428504943847
key: score_time
value: [0.01549721 0.01608682 0.01628304 0.01581788 0.01523232 0.01664662
0.01632547 0.01588321 0.01541471 0.01603723]
mean value: 0.01592245101928711
key: test_mcc
value: [0.9486833 0.89473684 0.85280287 0.83918129 0.89736456 0.89181287
0.94736842 0.94721815 1. 0.94736842]
mean value: 0.9166536711379966
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.97368421 0.94736842 0.92105263 0.91891892 0.94594595 0.94594595
0.97297297 0.97297297 1. 0.97297297]
mean value: 0.9571834992887625
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.97435897 0.94736842 0.91428571 0.91891892 0.94736842 0.94444444
0.97297297 0.97142857 1. 0.97297297]
mean value: 0.9564119411487833
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.95 0.94736842 1. 0.94444444 0.9 0.94444444
0.94736842 1. 1. 0.94736842]
mean value: 0.9580994152046783
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.94736842 0.84210526 0.89473684 1. 0.94444444
1. 0.94444444 1. 1. ]
mean value: 0.9573099415204678
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97368421 0.94736842 0.92105263 0.91959064 0.94736842 0.94590643
0.97368421 0.97222222 1. 0.97368421]
mean value: 0.9574561403508772
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.95 0.9 0.84210526 0.85 0.9 0.89473684
0.94736842 0.94444444 1. 0.94736842]
mean value: 0.9176023391812865
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.96
Accuracy on Blind test: 0.98
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.05195355 0.04780698 0.07163262 0.05937195 0.06207514 0.06070542
0.06475472 0.06434655 0.0617311 0.04681015]
mean value: 0.05911881923675537
key: score_time
value: [0.02426767 0.02480149 0.030478 0.02772665 0.03134751 0.02338552
0.03266764 0.02413774 0.01961756 0.02972317]
mean value: 0.026815295219421387
key: test_mcc
value: [0.9486833 0.89973541 0.76376262 0.94736842 0.89736456 0.89181287
0.94736842 0.89181287 1. 0.94736842]
mean value: 0.9135276881295149
key: train_mcc
value: [0.99404571 1. 0.99404571 1. 1. 0.99406397
0.99406397 0.98816193 0.98229327 0.98229327]
mean value: 0.9928967834016768
key: test_accuracy
value: [0.97368421 0.94736842 0.86842105 0.97297297 0.94594595 0.94594595
0.97297297 0.94594595 1. 0.97297297]
mean value: 0.9546230440967283
key: train_accuracy
value: [0.99701493 1. 0.99701493 1. 1. 0.99702381
0.99702381 0.99404762 0.99107143 0.99107143]
mean value: 0.9964267945984364
key: test_fscore
value: [0.97435897 0.94444444 0.84848485 0.97297297 0.94736842 0.94444444
0.97297297 0.94444444 1. 0.97297297]
mean value: 0.9522464496148707
key: train_fscore
value: [0.99696049 1. 0.99696049 1. 1. 0.99697885
0.99697885 0.99393939 0.99088146 0.99088146]
mean value: 0.9963580988444394
key: test_precision
value: [0.95 1. 1. 1. 0.9 0.94444444
0.94736842 0.94444444 1. 0.94736842]
mean value: 0.9633625730994152
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.89473684 0.73684211 0.94736842 1. 0.94444444
1. 0.94444444 1. 1. ]
mean value: 0.9467836257309942
key: train_recall
value: [0.99393939 1. 0.99393939 1. 1. 0.9939759
0.9939759 0.98795181 0.98192771 0.98192771]
mean value: 0.9927637824023366
key: test_roc_auc
value: [0.97368421 0.94736842 0.86842105 0.97368421 0.94736842 0.94590643
0.97368421 0.94590643 1. 0.97368421]
mean value: 0.9549707602339181
key: train_roc_auc
value: [0.9969697 1. 0.9969697 1. 1. 0.99698795
0.99698795 0.9939759 0.99096386 0.99096386]
mean value: 0.9963818912011683
key: test_jcc
value: [0.95 0.89473684 0.73684211 0.94736842 0.9 0.89473684
0.94736842 0.89473684 1. 0.94736842]
mean value: 0.9113157894736842
key: train_jcc
value: [0.99393939 1. 0.99393939 1. 1. 0.9939759
0.9939759 0.98795181 0.98192771 0.98192771]
mean value: 0.9927637824023366
MCC on Blind test: 0.9
Accuracy on Blind test: 0.95
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.07039738 0.0819912 0.04745841 0.07536864 0.0871973 0.08062267
0.09268117 0.08872008 0.07291102 0.04991651]
mean value: 0.07472643852233887
key: score_time
value: [0.02195954 0.01393104 0.01386929 0.02197075 0.02660823 0.0219326
0.02215648 0.02231789 0.01488566 0.02564073]
mean value: 0.020527219772338866
key: test_mcc
value: [0.79388419 0.31622777 0.69989647 0.42489158 0.7888597 0.29766651
0.56934383 0.56725146 0.62170355 0.4670794 ]
mean value: 0.554680445041403
key: train_mcc
value: [0.99404571 0.99404571 0.99404571 0.99406271 1. 1.
0.99406397 0.99406397 0.99406397 0.99406397]
mean value: 0.9952455741790717
key: test_accuracy
value: [0.89473684 0.65789474 0.84210526 0.7027027 0.89189189 0.64864865
0.78378378 0.78378378 0.81081081 0.72972973]
mean value: 0.7746088193456615
key: train_accuracy
value: [0.99701493 0.99701493 0.99701493 0.99702381 1. 1.
0.99702381 0.99702381 0.99702381 0.99702381]
mean value: 0.9976163823738451
key: test_fscore
value: [0.88888889 0.66666667 0.82352941 0.66666667 0.89473684 0.60606061
0.76470588 0.77777778 0.8 0.6875 ]
mean value: 0.7576532742283516
key: train_fscore
value: [0.99696049 0.99696049 0.99696049 0.99696049 1. 1.
0.99697885 0.99697885 0.99697885 0.99697885]
mean value: 0.9975757353143739
key: test_precision
value: [0.94117647 0.65 0.93333333 0.78571429 0.85 0.66666667
0.8125 0.77777778 0.82352941 0.78571429]
mean value: 0.802641223155929
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.84210526 0.68421053 0.73684211 0.57894737 0.94444444 0.55555556
0.72222222 0.77777778 0.77777778 0.61111111]
mean value: 0.7230994152046784
key: train_recall
value: [0.99393939 0.99393939 0.99393939 0.99393939 1. 1.
0.9939759 0.9939759 0.9939759 0.9939759 ]
mean value: 0.9951661190215407
key: test_roc_auc
value: [0.89473684 0.65789474 0.84210526 0.70614035 0.89327485 0.64619883
0.78216374 0.78362573 0.80994152 0.72660819]
mean value: 0.7742690058479532
key: train_roc_auc
value: [0.9969697 0.9969697 0.9969697 0.9969697 1. 1.
0.99698795 0.99698795 0.99698795 0.99698795]
mean value: 0.9975830595107703
key: test_jcc
value: [0.8 0.5 0.7 0.5 0.80952381 0.43478261
0.61904762 0.63636364 0.66666667 0.52380952]
mean value: 0.6190193864106908
key: train_jcc
value: [0.99393939 0.99393939 0.99393939 0.99393939 1. 1.
0.9939759 0.9939759 0.9939759 0.9939759 ]
mean value: 0.9951661190215407
MCC on Blind test: 0.54
Accuracy on Blind test: 0.77
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.56135869 0.54637766 0.55692697 0.5550313 0.54440451 0.53952956
0.54246831 0.550493 0.56960869 0.54170847]
mean value: 0.5507907152175904
key: score_time
value: [0.00987864 0.00982213 0.00943971 0.00938654 0.00945401 0.00924611
0.0097692 0.01039886 0.00971866 0.00918841]
mean value: 0.009630227088928222
key: test_mcc
value: [0.9486833 0.84327404 0.89973541 0.89181287 0.89736456 0.89181287
0.94736842 0.89181287 1. 0.89736456]
mean value: 0.9109228893996735
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.97368421 0.92105263 0.94736842 0.94594595 0.94594595 0.94594595
0.97297297 0.94594595 1. 0.94594595]
mean value: 0.9544807965860598
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.97435897 0.91891892 0.94444444 0.94736842 0.94736842 0.94444444
0.97297297 0.94444444 1. 0.94736842]
mean value: 0.9541689462742095
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.95 0.94444444 1. 0.94736842 0.9 0.94444444
0.94736842 0.94444444 1. 0.9 ]
mean value: 0.9478070175438597
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.89473684 0.89473684 0.94736842 1. 0.94444444
1. 0.94444444 1. 1. ]
mean value: 0.9625730994152046
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97368421 0.92105263 0.94736842 0.94590643 0.94736842 0.94590643
0.97368421 0.94590643 1. 0.94736842]
mean value: 0.9548245614035088
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.95 0.85 0.89473684 0.9 0.9 0.89473684
0.94736842 0.89473684 1. 0.9 ]
mean value: 0.9131578947368421
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.91
Accuracy on Blind test: 0.96
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.02600288 0.02697515 0.02713609 0.02771711 0.0268352 0.02741385
0.02773118 0.04674792 0.04764318 0.03253341]
mean value: 0.03167359828948975
key: score_time
value: [0.01562953 0.01249695 0.01259661 0.01253057 0.01317787 0.04645109
0.0191071 0.01679397 0.01876235 0.01601005]
mean value: 0.018355607986450195
key: test_mcc
value: [0.48454371 0.31622777 0.45291081 0.26327408 0.51319869 0.24408665
0.13259028 0.35104619 0.52960948 0.18768409]
mean value: 0.3475171758801787
key: train_mcc
value: [0.88134724 0.68582485 0.76325259 0.70543403 0.91426696 0.9285613
0.70124655 0.74453167 0.9253171 0.7690121 ]
mean value: 0.8018794389354479
key: test_accuracy
value: [0.73684211 0.65789474 0.71052632 0.62162162 0.75675676 0.62162162
0.56756757 0.67567568 0.75675676 0.59459459]
mean value: 0.6699857752489331
key: train_accuracy
value: [0.93731343 0.82089552 0.86865672 0.83333333 0.95535714 0.96428571
0.83035714 0.85714286 0.96130952 0.87202381]
mean value: 0.8900675195451315
key: test_fscore
value: [0.70588235 0.64864865 0.64516129 0.5625 0.74285714 0.5625
0.5 0.64705882 0.70967742 0.57142857]
mean value: 0.629571424908237
key: train_fscore
value: [0.93203883 0.77777778 0.84615385 0.79562044 0.95268139 0.96385542
0.79272727 0.83098592 0.95924765 0.85121107]
mean value: 0.8702299616326061
key: test_precision
value: [0.8 0.66666667 0.83333333 0.69230769 0.76470588 0.64285714
0.57142857 0.6875 0.84615385 0.58823529]
mean value: 0.709318842921784
key: train_precision
value: [1. 1. 1. 1. 1. 0.96385542
1. 1. 1. 1. ]
mean value: 0.9963855421686747
key: test_recall
value: [0.63157895 0.63157895 0.52631579 0.47368421 0.72222222 0.5
0.44444444 0.61111111 0.61111111 0.55555556]
mean value: 0.5707602339181287
key: train_recall
value: [0.87272727 0.63636364 0.73333333 0.66060606 0.90963855 0.96385542
0.65662651 0.71084337 0.92168675 0.74096386]
mean value: 0.7806644760861629
key: test_roc_auc
value: [0.73684211 0.65789474 0.71052632 0.62573099 0.75584795 0.61842105
0.56432749 0.67397661 0.75292398 0.59356725]
mean value: 0.6690058479532164
key: train_roc_auc
value: [0.93636364 0.81818182 0.86666667 0.83030303 0.95481928 0.96428065
0.82831325 0.85542169 0.96084337 0.87048193]
mean value: 0.8885675321607285
key: test_jcc
value: [0.54545455 0.48 0.47619048 0.39130435 0.59090909 0.39130435
0.33333333 0.47826087 0.55 0.4 ]
mean value: 0.46367570111048373
key: train_jcc
value: [0.87272727 0.63636364 0.73333333 0.66060606 0.90963855 0.93023256
0.65662651 0.71084337 0.92168675 0.74096386]
mean value: 0.7773021897314416
MCC on Blind test: 0.45
Accuracy on Blind test: 0.72
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02517295 0.03688502 0.04392886 0.03298044 0.03695798 0.03685045
0.03669524 0.03654885 0.03679013 0.0368619 ]
mean value: 0.03596718311309814
key: score_time
value: [0.02065396 0.02096987 0.02294397 0.02419424 0.02160645 0.01831365
0.0233593 0.02103519 0.02230525 0.02347684]
mean value: 0.02188587188720703
key: test_mcc
value: [0.89473684 0.47633051 0.68803296 0.68035483 0.89736456 0.62280702
0.89736456 0.89181287 1. 0.89679028]
mean value: 0.7945594433337401
key: train_mcc
value: [0.89251337 0.89259616 0.88671444 0.89290904 0.88691246 0.88700621
0.88101481 0.8870542 0.88691246 0.88691246]
mean value: 0.8880545619279835
key: test_accuracy
value: [0.94736842 0.73684211 0.84210526 0.83783784 0.94594595 0.81081081
0.94594595 0.94594595 1. 0.94594595]
mean value: 0.8958748221906117
key: train_accuracy
value: [0.94626866 0.94626866 0.94328358 0.94642857 0.94345238 0.94345238
0.94047619 0.94345238 0.94345238 0.94345238]
mean value: 0.9439987562189055
key: test_fscore
value: [0.94736842 0.75 0.83333333 0.83333333 0.94736842 0.81081081
0.94736842 0.94444444 1. 0.94117647]
mean value: 0.8955203655668051
key: train_fscore
value: [0.94545455 0.94578313 0.94294294 0.94578313 0.94294294 0.94224924
0.94011976 0.94328358 0.94294294 0.94294294]
mean value: 0.9434445164976734
key: test_precision
value: [0.94736842 0.71428571 0.88235294 0.88235294 0.9 0.78947368
0.9 0.94444444 1. 1. ]
mean value: 0.8960278146346258
key: train_precision
value: [0.94545455 0.94011976 0.93452381 0.94011976 0.94011976 0.95092025
0.93452381 0.93491124 0.94011976 0.94011976]
mean value: 0.9400932454899698
key: test_recall
value: [0.94736842 0.78947368 0.78947368 0.78947368 1. 0.83333333
1. 0.94444444 1. 0.88888889]
mean value: 0.8982456140350877
key: train_recall
value: [0.94545455 0.95151515 0.95151515 0.95151515 0.94578313 0.93373494
0.94578313 0.95180723 0.94578313 0.94578313]
mean value: 0.9468674698795181
key: test_roc_auc
value: [0.94736842 0.73684211 0.84210526 0.83918129 0.94736842 0.81140351
0.94736842 0.94590643 1. 0.94444444]
mean value: 0.8961988304093568
key: train_roc_auc
value: [0.94625668 0.94634581 0.94340463 0.94651781 0.9434798 0.94333806
0.94053863 0.94355067 0.9434798 0.9434798 ]
mean value: 0.9440391700962778
key: test_jcc
value: [0.9 0.6 0.71428571 0.71428571 0.9 0.68181818
0.9 0.89473684 1. 0.88888889]
mean value: 0.8194015341383762
key: train_jcc
value: [0.89655172 0.89714286 0.89204545 0.89714286 0.89204545 0.8908046
0.88700565 0.89265537 0.89204545 0.89204545]
mean value: 0.8929484871255766
MCC on Blind test: 0.79
Accuracy on Blind test: 0.9
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.25039506 0.26338816 0.26077509 0.28984904 0.34190536 0.20279026
0.25511289 0.26039338 0.27494764 0.29335022]
mean value: 0.26929070949554446
key: score_time
value: [0.02418995 0.02259231 0.02381563 0.0246079 0.02679062 0.02062678
0.02431607 0.02038789 0.02436209 0.03717709]
mean value: 0.02488663196563721
key: test_mcc
value: [0.89473684 0.47633051 0.65465367 0.68035483 0.89736456 0.62280702
0.78362573 0.89181287 1. 0.89679028]
mean value: 0.7798476311382899
key: train_mcc
value: [0.89251337 0.89259616 0.95822045 0.89290904 0.88691246 0.88700621
0.80949681 0.8870542 0.88691246 0.88691246]
mean value: 0.8880533628410842
key: test_accuracy
value: [0.94736842 0.73684211 0.81578947 0.83783784 0.94594595 0.81081081
0.89189189 0.94594595 1. 0.94594595]
mean value: 0.8878378378378379
key: train_accuracy
value: [0.94626866 0.94626866 0.97910448 0.94642857 0.94345238 0.94345238
0.9047619 0.94345238 0.94345238 0.94345238]
mean value: 0.9440094171997157
key: test_fscore
value: [0.94736842 0.75 0.78787879 0.83333333 0.94736842 0.81081081
0.88888889 0.94444444 1. 0.94117647]
mean value: 0.8851269578049764
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_7030.py:115: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_7030.py:118: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
key: train_fscore
value: [0.94545455 0.94578313 0.97885196 0.94578313 0.94294294 0.94224924
0.90361446 0.94328358 0.94294294 0.94294294]
mean value: 0.9433848883132298
key: test_precision
value: [0.94736842 0.71428571 0.92857143 0.88235294 0.9 0.78947368
0.88888889 0.94444444 1. 1. ]
mean value: 0.8995385522630105
key: train_precision
value: [0.94545455 0.94011976 0.97590361 0.94011976 0.94011976 0.95092025
0.90361446 0.93491124 0.94011976 0.94011976]
mean value: 0.9411402908141235
key: test_recall
value: [0.94736842 0.78947368 0.68421053 0.78947368 1. 0.83333333
0.88888889 0.94444444 1. 0.88888889]
mean value: 0.876608187134503
key: train_recall
value: [0.94545455 0.95151515 0.98181818 0.95151515 0.94578313 0.93373494
0.90361446 0.95180723 0.94578313 0.94578313]
mean value: 0.9456809054399415
key: test_roc_auc
value: [0.94736842 0.73684211 0.81578947 0.83918129 0.94736842 0.81140351
0.89181287 0.94590643 1. 0.94444444]
mean value: 0.8880116959064328
key: train_roc_auc
value: [0.94625668 0.94634581 0.97914439 0.94651781 0.9434798 0.94333806
0.90474841 0.94355067 0.9434798 0.9434798 ]
mean value: 0.9440341231706072
key: test_jcc
value: [0.9 0.6 0.65 0.71428571 0.9 0.68181818
0.8 0.89473684 1. 0.88888889]
mean value: 0.8029729627098048
key: train_jcc
value: [0.89655172 0.89714286 0.95857988 0.89714286 0.89204545 0.8908046
0.82417582 0.89265537 0.89204545 0.89204545]
mean value: 0.8933189472825426
MCC on Blind test: 0.79
Accuracy on Blind test: 0.9
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.03251314 0.03675842 0.04648995 0.03551435 0.03649688 0.03578806
0.03548098 0.0361886 0.02459049 0.03698349]
mean value: 0.0356804370880127
key: score_time
value: [0.01281786 0.0131588 0.01396585 0.01327658 0.01287436 0.01276183
0.0128417 0.01410079 0.01277828 0.01403594]
mean value: 0.013261198997497559
key: test_mcc
value: [0.9486833 0.9486833 0.73786479 0.84327404 0.73786479 0.63245553
0.89473684 0.79388419 0.62807634 0.78764146]
mean value: 0.7953164573929877
key: train_mcc
value: [0.85307402 0.87648575 0.87660709 0.85906136 0.86472084 0.87660709
0.88241401 0.87648575 0.88269694 0.88275364]
mean value: 0.8730906512039277
key: test_accuracy
value: [0.97368421 0.97368421 0.86842105 0.92105263 0.86842105 0.81578947
0.94736842 0.89473684 0.81081081 0.89189189]
mean value: 0.8965860597439544
key: train_accuracy
value: [0.92647059 0.93823529 0.93823529 0.92941176 0.93235294 0.93823529
0.94117647 0.93823529 0.94134897 0.94134897]
mean value: 0.9365050888390547
key: test_fscore
value: [0.97435897 0.97435897 0.86486486 0.92307692 0.86486486 0.81081081
0.94736842 0.9 0.82926829 0.88235294]
mean value: 0.8971325067247442
key: train_fscore
value: [0.9271137 0.93841642 0.93877551 0.93023256 0.93255132 0.93877551
0.94152047 0.9380531 0.94117647 0.94186047]
mean value: 0.9368475523992993
key: test_precision
value: [0.95 0.95 0.88888889 0.9 0.88888889 0.83333333
0.94736842 0.85714286 0.77272727 0.9375 ]
mean value: 0.8925849662033872
key: train_precision
value: [0.91907514 0.93567251 0.93063584 0.91954023 0.92982456 0.93063584
0.93604651 0.9408284 0.94117647 0.93641618]
mean value: 0.9319851696271803
key: test_recall
value: [1. 1. 0.84210526 0.94736842 0.84210526 0.78947368
0.94736842 0.94736842 0.89473684 0.83333333]
mean value: 0.9043859649122807
key: train_recall
value: [0.93529412 0.94117647 0.94705882 0.94117647 0.93529412 0.94705882
0.94705882 0.93529412 0.94117647 0.94736842]
mean value: 0.9417956656346749
key: test_roc_auc
value: [0.97368421 0.97368421 0.86842105 0.92105263 0.86842105 0.81578947
0.94736842 0.89473684 0.80847953 0.89035088]
mean value: 0.8961988304093567
key: train_roc_auc
value: [0.92647059 0.93823529 0.93823529 0.92941176 0.93235294 0.93823529
0.94117647 0.93823529 0.94134847 0.94133127]
mean value: 0.9365032679738562
key: test_jcc
value: [0.95 0.95 0.76190476 0.85714286 0.76190476 0.68181818
0.9 0.81818182 0.70833333 0.78947368]
mean value: 0.817875939849624
key: train_jcc
value: [0.86413043 0.8839779 0.88461538 0.86956522 0.87362637 0.88461538
0.88950276 0.88333333 0.88888889 0.89010989]
mean value: 0.8812365570346593
MCC on Blind test: 0.84
Accuracy on Blind test: 0.92
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.75315642 1.01591659 0.78028083 0.772331 0.9129324 0.76234174
0.9127357 0.87172246 0.7669878 0.96696448]
mean value: 0.8515369415283203
key: score_time
value: [0.01336694 0.01260972 0.01373863 0.01248026 0.01332521 0.01331091
0.01295376 0.01237464 0.01341295 0.01255274]
mean value: 0.013012576103210449
key: test_mcc
value: [0.9486833 0.89973541 0.73786479 0.79388419 0.78947368 0.63245553
0.84327404 0.79388419 0.68035483 0.78764146]
mean value: 0.790725142095092
key: train_mcc
value: [0.88825066 0.89417953 1. 0.8177744 0.90014017 0.91178048
0.97647059 0.8058963 1. 0.90030617]
mean value: 0.9094798300569955
key: test_accuracy
value: [0.97368421 0.94736842 0.86842105 0.89473684 0.89473684 0.81578947
0.92105263 0.89473684 0.83783784 0.89189189]
mean value: 0.8940256045519204
key: train_accuracy
value: [0.94411765 0.94705882 1. 0.90882353 0.95 0.95588235
0.98823529 0.90294118 1. 0.95014663]
mean value: 0.9547205451095394
key: test_fscore
value: [0.97435897 0.95 0.87179487 0.9 0.89473684 0.81081081
0.92307692 0.9 0.83333333 0.88235294]
mean value: 0.8940464696656647
key: train_fscore
value: [0.94428152 0.94736842 1. 0.90962099 0.95043732 0.95601173
0.98823529 0.90265487 1. 0.95043732]
mean value: 0.9549047464381037
key: test_precision
value: [0.95 0.9047619 0.85 0.85714286 0.89473684 0.83333333
0.9 0.85714286 0.88235294 0.9375 ]
mean value: 0.8866970735662686
key: train_precision
value: [0.94152047 0.94186047 1. 0.9017341 0.94219653 0.95321637
0.98823529 0.90532544 1. 0.94767442]
mean value: 0.9521763099568973
key: test_recall
value: [1. 1. 0.89473684 0.94736842 0.89473684 0.78947368
0.94736842 0.94736842 0.78947368 0.83333333]
mean value: 0.9043859649122807
key: train_recall
value: [0.94705882 0.95294118 1. 0.91764706 0.95882353 0.95882353
0.98823529 0.9 1. 0.95321637]
mean value: 0.9576745786033711
key: test_roc_auc
value: [0.97368421 0.94736842 0.86842105 0.89473684 0.89473684 0.81578947
0.92105263 0.89473684 0.83918129 0.89035088]
mean value: 0.8940058479532164
key: train_roc_auc
value: [0.94411765 0.94705882 1. 0.90882353 0.95 0.95588235
0.98823529 0.90294118 1. 0.9501376 ]
mean value: 0.9547196422428621
key: test_jcc
value: [0.95 0.9047619 0.77272727 0.81818182 0.80952381 0.68181818
0.85714286 0.81818182 0.71428571 0.78947368]
mean value: 0.8116097060833903
key: train_jcc
value: [0.89444444 0.9 1. 0.8342246 0.90555556 0.91573034
0.97674419 0.82258065 1. 0.90555556]
mean value: 0.9154835322772491
MCC on Blind test: 0.76
Accuracy on Blind test: 0.88
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01436734 0.01217151 0.01070666 0.00985885 0.00983238 0.01114702
0.01055717 0.01079178 0.0106504 0.01036644]
mean value: 0.011044955253601075
key: score_time
value: [0.01219034 0.00957274 0.00930095 0.00910378 0.00889158 0.00985646
0.00977993 0.00970864 0.00965691 0.00958467]
mean value: 0.009764599800109863
key: test_mcc
value: [0.68803296 0.69989647 0.69989647 0.79388419 0.57894737 0.63960215
0.47633051 0.73786479 0.57184997 0.69007214]
mean value: 0.6576377014238246
key: train_mcc
value: [0.66106903 0.63812671 0.63133581 0.66254793 0.63888551 0.6871247
0.63426969 0.68813955 0.67443892 0.63456594]
mean value: 0.6550503783024093
key: test_accuracy
value: [0.84210526 0.84210526 0.84210526 0.89473684 0.78947368 0.81578947
0.73684211 0.86842105 0.78378378 0.83783784]
mean value: 0.8253200568990042
key: train_accuracy
value: [0.82941176 0.81764706 0.80588235 0.82941176 0.81764706 0.84117647
0.81470588 0.84117647 0.83577713 0.81524927]
mean value: 0.8248085216491289
key: test_fscore
value: [0.83333333 0.82352941 0.82352941 0.88888889 0.78947368 0.8
0.72222222 0.87179487 0.77777778 0.8125 ]
mean value: 0.8143049601757032
key: train_fscore
value: [0.82208589 0.80864198 0.77852349 0.81987578 0.80745342 0.83125
0.80250784 0.83018868 0.82716049 0.80495356]
mean value: 0.813264111779322
key: test_precision
value: [0.88235294 0.93333333 0.93333333 0.94117647 0.78947368 0.875
0.76470588 0.85 0.82352941 0.92857143]
mean value: 0.8721476485330975
key: train_precision
value: [0.85897436 0.85064935 0.90625 0.86842105 0.85526316 0.88666667
0.8590604 0.89189189 0.87012987 0.85526316]
mean value: 0.8702569909417754
key: test_recall
value: [0.78947368 0.73684211 0.73684211 0.84210526 0.78947368 0.73684211
0.68421053 0.89473684 0.73684211 0.72222222]
mean value: 0.7669590643274854
key: train_recall
value: [0.78823529 0.77058824 0.68235294 0.77647059 0.76470588 0.78235294
0.75294118 0.77647059 0.78823529 0.76023392]
mean value: 0.7642586859305125
key: test_roc_auc
value: [0.84210526 0.84210526 0.84210526 0.89473684 0.78947368 0.81578947
0.73684211 0.86842105 0.78508772 0.83479532]
mean value: 0.8251461988304094
key: train_roc_auc
value: [0.82941176 0.81764706 0.80588235 0.82941176 0.81764706 0.84117647
0.81470588 0.84117647 0.83563811 0.81541108]
mean value: 0.8248108015135879
key: test_jcc
value: [0.71428571 0.7 0.7 0.8 0.65217391 0.66666667
0.56521739 0.77272727 0.63636364 0.68421053]
mean value: 0.6891645120706905
key: train_jcc
value: [0.69791667 0.67875648 0.63736264 0.69473684 0.67708333 0.71122995
0.67015707 0.70967742 0.70526316 0.67357513]
mean value: 0.6855758677521984
MCC on Blind test: 0.71
Accuracy on Blind test: 0.85
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01090002 0.01093769 0.0109024 0.01084542 0.010324 0.01106477
0.01059437 0.01046681 0.00997615 0.01118135]
mean value: 0.01071929931640625
key: score_time
value: [0.0093627 0.0100379 0.00968313 0.00984097 0.01014543 0.0097847
0.00955224 0.00931573 0.00902176 0.00988913]
mean value: 0.00966336727142334
key: test_mcc
value: [0.52704628 0.68421053 0.68421053 0.79388419 0.78947368 0.52704628
0.78947368 0.63960215 0.48078072 0.57857577]
mean value: 0.6494303793257087
key: train_mcc
value: [0.74738216 0.73530684 0.77092175 0.74199852 0.72941176 0.74163853
0.72354193 0.77092175 0.75366357 0.73705515]
mean value: 0.745184196259884
key: test_accuracy
value: [0.76315789 0.84210526 0.84210526 0.89473684 0.89473684 0.76315789
0.89473684 0.81578947 0.72972973 0.78378378]
mean value: 0.8224039829302987
key: train_accuracy
value: [0.87352941 0.86764706 0.88529412 0.87058824 0.86470588 0.87058824
0.86176471 0.88529412 0.87683284 0.86803519]
mean value: 0.8724279799896498
key: test_fscore
value: [0.76923077 0.84210526 0.84210526 0.9 0.89473684 0.76923077
0.89473684 0.82926829 0.77272727 0.75 ]
mean value: 0.8264141314398054
key: train_fscore
value: [0.87536232 0.86803519 0.88695652 0.87356322 0.86470588 0.87283237
0.86135693 0.88695652 0.87647059 0.87179487]
mean value: 0.8738034415804177
key: test_precision
value: [0.75 0.84210526 0.84210526 0.85714286 0.89473684 0.75
0.89473684 0.77272727 0.68 0.85714286]
mean value: 0.8140697197539303
key: train_precision
value: [0.86285714 0.86549708 0.87428571 0.85393258 0.86470588 0.85795455
0.86390533 0.87428571 0.87647059 0.85 ]
mean value: 0.8643894573208194
key: test_recall
value: [0.78947368 0.84210526 0.84210526 0.94736842 0.89473684 0.78947368
0.89473684 0.89473684 0.89473684 0.66666667]
mean value: 0.8456140350877193
key: train_recall
value: [0.88823529 0.87058824 0.9 0.89411765 0.86470588 0.88823529
0.85882353 0.9 0.87647059 0.89473684]
mean value: 0.8835913312693499
key: test_roc_auc
value: [0.76315789 0.84210526 0.84210526 0.89473684 0.89473684 0.76315789
0.89473684 0.81578947 0.7251462 0.78070175]
mean value: 0.8216374269005848
key: train_roc_auc
value: [0.87352941 0.86764706 0.88529412 0.87058824 0.86470588 0.87058824
0.86176471 0.88529412 0.87683179 0.86795666]
mean value: 0.8724200206398349
key: test_jcc
value: [0.625 0.72727273 0.72727273 0.81818182 0.80952381 0.625
0.80952381 0.70833333 0.62962963 0.6 ]
mean value: 0.7079737854737855
key: train_jcc
value: [0.77835052 0.76683938 0.796875 0.7755102 0.76165803 0.77435897
0.75647668 0.796875 0.78010471 0.77272727]
mean value: 0.7759775771937931
MCC on Blind test: 0.74
Accuracy on Blind test: 0.87
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.010535 0.01050043 0.01047206 0.01034617 0.01016498 0.01028228
0.01025391 0.01106501 0.01017475 0.00908756]
mean value: 0.010288214683532715
key: score_time
value: [0.01397395 0.01574492 0.01688623 0.01823497 0.0147984 0.01664305
0.01726961 0.01742172 0.01480031 0.01511502]
mean value: 0.01608881950378418
key: test_mcc
value: [0.57894737 0.57894737 0.31622777 0.57894737 0.43643578 0.42163702
0.42640143 0.21821789 0.40780312 0.75614764]
mean value: 0.4719712759930457
key: train_mcc
value: [0.65304287 0.6882472 0.68254191 0.65322377 0.69416569 0.69455037
0.65886913 0.67657595 0.73636217 0.607149 ]
mean value: 0.6744728059621842
key: test_accuracy
value: [0.78947368 0.78947368 0.65789474 0.78947368 0.71052632 0.71052632
0.71052632 0.60526316 0.7027027 0.86486486]
mean value: 0.7330725462304409
key: train_accuracy
value: [0.82647059 0.84411765 0.84117647 0.82647059 0.84705882 0.84705882
0.82941176 0.83823529 0.86803519 0.80351906]
mean value: 0.8371554252199414
key: test_fscore
value: [0.78947368 0.78947368 0.64864865 0.78947368 0.74418605 0.71794872
0.68571429 0.54545455 0.73170732 0.83870968]
mean value: 0.7280790291401931
key: train_fscore
value: [0.82798834 0.84457478 0.84302326 0.8238806 0.84615385 0.84971098
0.83040936 0.83965015 0.86567164 0.80235988]
mean value: 0.8373422826187441
key: test_precision
value: [0.78947368 0.78947368 0.66666667 0.78947368 0.66666667 0.7
0.75 0.64285714 0.68181818 1. ]
mean value: 0.7476429710640237
key: train_precision
value: [0.82080925 0.84210526 0.83333333 0.83636364 0.85119048 0.83522727
0.8255814 0.83236994 0.87878788 0.80952381]
mean value: 0.8365292256184584
key: test_recall
value: [0.78947368 0.78947368 0.63157895 0.78947368 0.84210526 0.73684211
0.63157895 0.47368421 0.78947368 0.72222222]
mean value: 0.7195906432748538
key: train_recall
value: [0.83529412 0.84705882 0.85294118 0.81176471 0.84117647 0.86470588
0.83529412 0.84705882 0.85294118 0.79532164]
mean value: 0.8383556931544548
key: test_roc_auc
value: [0.78947368 0.78947368 0.65789474 0.78947368 0.71052632 0.71052632
0.71052632 0.60526316 0.7002924 0.86111111]
mean value: 0.7324561403508772
key: train_roc_auc
value: [0.82647059 0.84411765 0.84117647 0.82647059 0.84705882 0.84705882
0.82941176 0.83823529 0.86799106 0.80354317]
mean value: 0.8371534227726178
key: test_jcc
value: [0.65217391 0.65217391 0.48 0.65217391 0.59259259 0.56
0.52173913 0.375 0.57692308 0.72222222]
mean value: 0.578499876130311
key: train_jcc
value: [0.70646766 0.73096447 0.72864322 0.70050761 0.73333333 0.73869347
0.71 0.72361809 0.76315789 0.66995074]
mean value: 0.7205336483765594
MCC on Blind test: 0.49
Accuracy on Blind test: 0.74
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01583338 0.0182128 0.0161047 0.01597238 0.0159862 0.01671791
0.01628447 0.01583791 0.01580572 0.01597381]
mean value: 0.016272926330566408
key: score_time
value: [0.01147938 0.01161909 0.01118112 0.01065826 0.01088691 0.01061678
0.01073647 0.01067019 0.01060247 0.01049829]
mean value: 0.01089489459991455
key: test_mcc
value: [0.89473684 0.9486833 0.78947368 0.78947368 0.78947368 0.58218174
0.89473684 0.73786479 0.51793973 0.78764146]
mean value: 0.773220574982312
key: train_mcc
value: [0.80022155 0.79424133 0.80600787 0.80600787 0.80600787 0.81182089
0.78828985 0.8 0.82410816 0.80678035]
mean value: 0.8043485716732236
key: test_accuracy
value: [0.94736842 0.97368421 0.89473684 0.89473684 0.89473684 0.78947368
0.94736842 0.86842105 0.75675676 0.89189189]
mean value: 0.8859174964438122
key: train_accuracy
value: [0.9 0.89705882 0.90294118 0.90294118 0.90294118 0.90588235
0.89411765 0.9 0.91202346 0.90322581]
mean value: 0.9021131619803346
key: test_fscore
value: [0.94736842 0.97435897 0.89473684 0.89473684 0.89473684 0.77777778
0.94736842 0.87179487 0.7804878 0.88235294]
mean value: 0.8865719738407196
key: train_fscore
value: [0.90116279 0.89795918 0.90379009 0.90379009 0.90379009 0.90643275
0.89473684 0.9 0.9122807 0.90489914]
mean value: 0.9028841664606161
key: test_precision
value: [0.94736842 0.95 0.89473684 0.89473684 0.89473684 0.82352941
0.94736842 0.85 0.72727273 0.9375 ]
mean value: 0.8867249507458486
key: train_precision
value: [0.8908046 0.89017341 0.89595376 0.89595376 0.89595376 0.90116279
0.88953488 0.9 0.90697674 0.89204545]
mean value: 0.8958559152932181
key: test_recall
value: [0.94736842 1. 0.89473684 0.89473684 0.89473684 0.73684211
0.94736842 0.89473684 0.84210526 0.83333333]
mean value: 0.8885964912280702
key: train_recall
value: [0.91176471 0.90588235 0.91176471 0.91176471 0.91176471 0.91176471
0.9 0.9 0.91764706 0.91812865]
mean value: 0.9100481596147231
key: test_roc_auc
value: [0.94736842 0.97368421 0.89473684 0.89473684 0.89473684 0.78947368
0.94736842 0.86842105 0.75438596 0.89035088]
mean value: 0.8855263157894737
key: train_roc_auc
value: [0.9 0.89705882 0.90294118 0.90294118 0.90294118 0.90588235
0.89411765 0.9 0.9120399 0.90318197]
mean value: 0.902110423116615
key: test_jcc
value: [0.9 0.95 0.80952381 0.80952381 0.80952381 0.63636364
0.9 0.77272727 0.64 0.78947368]
mean value: 0.8017136021872864
key: train_jcc
value: [0.82010582 0.81481481 0.82446809 0.82446809 0.82446809 0.82887701
0.80952381 0.81818182 0.83870968 0.82631579]
mean value: 0.8229932990186044
MCC on Blind test: 0.78
Accuracy on Blind test: 0.89
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.32342815 1.41316032 1.3804853 1.2588563 1.50634933 1.53654742
1.28589988 1.45925713 1.40028787 1.36803317]
mean value: 1.3932304859161377
key: score_time
value: [0.01891804 0.01499844 0.01266909 0.01278591 0.01561332 0.01240611
0.01493168 0.01475978 0.02181697 0.01519823]
mean value: 0.015409755706787109
key: test_mcc
value: [0.89473684 0.89973541 0.78947368 0.89973541 0.73786479 0.63245553
0.84327404 0.73786479 0.56725146 0.78764146]
mean value: 0.7790033421492283
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94736842 0.94736842 0.89473684 0.94736842 0.86842105 0.81578947
0.92105263 0.86842105 0.78378378 0.89189189]
mean value: 0.8886201991465149
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94736842 0.95 0.89473684 0.94444444 0.87179487 0.81081081
0.91891892 0.87179487 0.78947368 0.88235294]
mean value: 0.8881695806308809
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.94736842 0.9047619 0.89473684 1. 0.85 0.83333333
0.94444444 0.85 0.78947368 0.9375 ]
mean value: 0.8951618629908104
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.94736842 1. 0.89473684 0.89473684 0.89473684 0.78947368
0.89473684 0.89473684 0.78947368 0.83333333]
mean value: 0.8833333333333333
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.94736842 0.94736842 0.89473684 0.94736842 0.86842105 0.81578947
0.92105263 0.86842105 0.78362573 0.89035088]
mean value: 0.8884502923976608
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.9 0.9047619 0.80952381 0.89473684 0.77272727 0.68181818
0.85 0.77272727 0.65217391 0.78947368]
mean value: 0.8027942880917709
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.73
Accuracy on Blind test: 0.86
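Note: the per-fold test scores printed above can be cross-checked against one another. The sketch below is a minimal, standard-library-only illustration, not the script's own code. The confusion counts (TP=18, FP=1, FN=1, TN=18) are an assumption inferred from the first-fold MLP values (precision 18/19, recall 18/19, accuracy 36/38); they are not printed anywhere in this log. The helper name `binary_metrics` is hypothetical.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute the binary-classification scores named in the log
    (accuracy, precision, recall, fscore, jcc, mcc) from raw counts.
    Assumes tp + fp > 0 and tp + fn > 0 (no zero-division guard)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    fscore = 2 * precision * recall / (precision + recall)
    # Jaccard index of the positive class
    jcc = tp / (tp + fp + fn)
    # Matthews correlation coefficient
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "fscore": fscore,
        "jcc": jcc,
        "mcc": mcc,
    }


# Inferred (assumed) counts for the first MLP fold; not taken from the log.
fold1 = binary_metrics(tp=18, fp=1, fn=1, tn=18)
for name, value in fold1.items():
    print(f"{name}: {value:.8f}")
```

With these assumed counts the computed mcc, accuracy and jcc agree with the first entries of test_mcc, test_accuracy and test_jcc in the MLP block.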
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.02541423 0.01783824 0.01625419 0.01690006 0.01592183 0.01577139
0.01574349 0.01575923 0.01588821 0.01514101]
mean value: 0.017063188552856445
key: score_time
value: [0.01225829 0.00914264 0.00886273 0.00880837 0.00879955 0.00871849
0.00876212 0.0086875 0.00877857 0.00877881]
mean value: 0.009159708023071289
key: test_mcc
value: [1. 0.84327404 0.79388419 0.9486833 0.84327404 0.89973541
0.85280287 0.89973541 0.68035483 0.89181287]
mean value: 0.8653556953757348
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.92105263 0.89473684 0.97368421 0.92105263 0.94736842
0.92105263 0.94736842 0.83783784 0.94594595]
mean value: 0.9310099573257468
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.92307692 0.88888889 0.97435897 0.92307692 0.94444444
0.92682927 0.95 0.83333333 0.94444444]
mean value: 0.9308453199916614
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.9 0.94117647 0.95 0.9 1.
0.86363636 0.9047619 0.88235294 0.94444444]
mean value: 0.9286372124607418
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.94736842 0.84210526 1. 0.94736842 0.89473684
1. 1. 0.78947368 0.94444444]
mean value: 0.9365497076023391
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.92105263 0.89473684 0.97368421 0.92105263 0.94736842
0.92105263 0.94736842 0.83918129 0.94590643]
mean value: 0.9311403508771929
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.85714286 0.8 0.95 0.85714286 0.89473684
0.86363636 0.9047619 0.71428571 0.89473684]
mean value: 0.8736443381180223
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.89
Accuracy on Blind test: 0.95
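Note: each "mean value" line appears to be the arithmetic mean of the ten fold values printed on the preceding "value" line. A minimal sketch of that check, using the Decision Tree test_mcc values copied from this block (variable names are illustrative only):

```python
from statistics import fmean

# Ten per-fold test_mcc values for the Decision Tree, as printed in the log
test_mcc = [
    1.0, 0.84327404, 0.79388419, 0.9486833, 0.84327404,
    0.89973541, 0.85280287, 0.89973541, 0.68035483, 0.89181287,
]

mean_mcc = fmean(test_mcc)
print(f"mean value: {mean_mcc}")
```

The result differs from the logged mean only beyond the eighth decimal place, because the fold values are printed rounded to eight digits.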
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.10692024 0.10610008 0.10551596 0.10557318 0.1064024 0.10624623
0.10674787 0.10648775 0.10734653 0.10737967]
mean value: 0.10647199153900147
key: score_time
value: [0.01746225 0.01743841 0.01741266 0.01759815 0.01770043 0.01763558
0.01764989 0.0177474 0.01797533 0.01771569]
mean value: 0.017633581161499025
key: test_mcc
value: [0.9486833 0.89473684 0.63960215 0.79388419 0.73786479 0.73786479
0.84327404 0.79388419 0.56934383 0.83871328]
mean value: 0.7797851390935022
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.97368421 0.94736842 0.81578947 0.89473684 0.86842105 0.86842105
0.92105263 0.89473684 0.78378378 0.91891892]
mean value: 0.8886913229018493
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.97435897 0.94736842 0.8 0.9 0.86486486 0.87179487
0.92307692 0.9 0.8 0.91428571]
mean value: 0.889574976943398
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.95 0.94736842 0.875 0.85714286 0.88888889 0.85
0.9 0.85714286 0.76190476 0.94117647]
mean value: 0.8828624256720232
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.94736842 0.73684211 0.94736842 0.84210526 0.89473684
0.94736842 0.94736842 0.84210526 0.88888889]
mean value: 0.8994152046783626
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97368421 0.94736842 0.81578947 0.89473684 0.86842105 0.86842105
0.92105263 0.89473684 0.78216374 0.91812865]
mean value: 0.8884502923976607
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.95 0.9 0.66666667 0.81818182 0.76190476 0.77272727
0.85714286 0.81818182 0.66666667 0.84210526]
mean value: 0.8053577124629756
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.82
Accuracy on Blind test: 0.91
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.00968361 0.00963068 0.00972104 0.00968266 0.00966477 0.00967169
0.00958872 0.00959301 0.0098207 0.00972056]
mean value: 0.009677743911743164
key: score_time
value: [0.00886869 0.0086751 0.00869846 0.00869846 0.00874829 0.00864029
0.00864649 0.00878 0.00873041 0.00866818]
mean value: 0.008715438842773437
key: test_mcc
value: [0.37047929 0.78947368 0.42640143 0.68803296 0.47633051 0.58218174
0.42640143 0.47633051 0.29618896 0.62280702]
mean value: 0.5154627531777716
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.68421053 0.89473684 0.71052632 0.84210526 0.73684211 0.78947368
0.71052632 0.73684211 0.64864865 0.81081081]
mean value: 0.7564722617354196
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.66666667 0.89473684 0.68571429 0.83333333 0.72222222 0.8
0.68571429 0.75 0.66666667 0.81081081]
mean value: 0.7515865113233534
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.70588235 0.89473684 0.75 0.88235294 0.76470588 0.76190476
0.75 0.71428571 0.65 0.78947368]
mean value: 0.7663342178976854
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.63157895 0.89473684 0.63157895 0.78947368 0.68421053 0.84210526
0.63157895 0.78947368 0.68421053 0.83333333]
mean value: 0.7412280701754386
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.68421053 0.89473684 0.71052632 0.84210526 0.73684211 0.78947368
0.71052632 0.73684211 0.64766082 0.81140351]
mean value: 0.7564327485380117
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.5 0.80952381 0.52173913 0.71428571 0.56521739 0.66666667
0.52173913 0.6 0.5 0.68181818]
mean value: 0.6080990024468285
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.43
Accuracy on Blind test: 0.71
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.52656865 1.51507688 1.49541473 1.58690763 1.57518911 1.50592113
1.52353764 1.52296948 1.51906967 1.57698035]
mean value: 1.5347635269165039
key: score_time
value: [0.09216976 0.09152102 0.09123611 0.09885144 0.09371042 0.09946156
0.09564042 0.09658599 0.09777331 0.09974384]
mean value: 0.09566938877105713
key: test_mcc
value: [1. 0.89973541 0.78947368 0.89473684 0.84327404 0.84327404
1. 0.9486833 0.78362573 0.89181287]
mean value: 0.8894615917123104
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.94736842 0.89473684 0.94736842 0.92105263 0.92105263
1. 0.97368421 0.89189189 0.94594595]
mean value: 0.9443100995732574
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.94444444 0.89473684 0.94736842 0.92307692 0.91891892
1. 0.97435897 0.89473684 0.94444444]
mean value: 0.9442085810506863
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 0.89473684 0.94736842 0.9 0.94444444
1. 0.95 0.89473684 0.94444444]
mean value: 0.9475730994152046
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.89473684 0.89473684 0.94736842 0.94736842 0.89473684
1. 1. 0.89473684 0.94444444]
mean value: 0.941812865497076
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.94736842 0.89473684 0.94736842 0.92105263 0.92105263
1. 0.97368421 0.89181287 0.94590643]
mean value: 0.9442982456140351
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.89473684 0.80952381 0.9 0.85714286 0.85
1. 0.95 0.80952381 0.89473684]
mean value: 0.8965664160401002
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.9
Accuracy on Blind test: 0.95
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...05', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.948488 0.93976998 0.88425088 0.91976881 0.97527862 0.93516684
0.89031529 0.95844722 0.9312284 0.9260869 ]
mean value: 0.930880093574524
key: score_time
value: [0.17965937 0.26049066 0.24061608 0.1728282 0.16420078 0.18590951
0.26020312 0.22205067 0.26080751 0.14662576]
mean value: 0.20933916568756103
key: test_mcc
value: [1. 0.89973541 0.73786479 0.89473684 0.84327404 0.89973541
0.89973541 0.89973541 0.73020842 0.89181287]
mean value: 0.8696838599499482
key: train_mcc
value: [0.95300713 0.94720632 0.95884012 0.95294118 0.95897286 0.95884012
0.95300713 0.96477265 0.97653939 0.95896113]
mean value: 0.9583088027940004
key: test_accuracy
value: [1. 0.94736842 0.86842105 0.94736842 0.92105263 0.94736842
0.94736842 0.94736842 0.86486486 0.94594595]
mean value: 0.9337126600284494
key: train_accuracy
value: [0.97647059 0.97352941 0.97941176 0.97647059 0.97941176 0.97941176
0.97647059 0.98235294 0.98826979 0.97947214]
mean value: 0.9791271347248577
key: test_fscore
value: [1. 0.94444444 0.86486486 0.94736842 0.92307692 0.94444444
0.94444444 0.95 0.87179487 0.94444444]
mean value: 0.933488285856707
key: train_fscore
value: [0.97633136 0.97329377 0.97935103 0.97647059 0.97922849 0.97947214
0.97633136 0.98224852 0.98823529 0.97947214]
mean value: 0.9790434694122674
key: test_precision
value: [1. 1. 0.88888889 0.94736842 0.9 1.
1. 0.9047619 0.85 0.94444444]
mean value: 0.9435463659147869
key: train_precision
value: [0.98214286 0.98203593 0.98224852 0.97647059 0.98802395 0.97660819
0.98214286 0.98809524 0.98823529 0.98235294]
mean value: 0.9828356363994447
key: test_recall
value: [1. 0.89473684 0.84210526 0.94736842 0.94736842 0.89473684
0.89473684 1. 0.89473684 0.94444444]
mean value: 0.9260233918128655
key: train_recall
value: [0.97058824 0.96470588 0.97647059 0.97647059 0.97058824 0.98235294
0.97058824 0.97647059 0.98823529 0.97660819]
mean value: 0.9753078775369797
key: test_roc_auc
value: [1. 0.94736842 0.86842105 0.94736842 0.92105263 0.94736842
0.94736842 0.94736842 0.86403509 0.94590643]
mean value: 0.933625730994152
key: train_roc_auc
value: [0.97647059 0.97352941 0.97941176 0.97647059 0.97941176 0.97941176
0.97647059 0.98235294 0.98826969 0.97948056]
mean value: 0.9791279669762643
key: test_jcc
value: [1. 0.89473684 0.76190476 0.9 0.85714286 0.89473684
0.89473684 0.9047619 0.77272727 0.89473684]
mean value: 0.877548416495785
key: train_jcc
value: [0.95375723 0.94797688 0.95953757 0.95402299 0.95930233 0.95977011
0.95375723 0.96511628 0.97674419 0.95977011]
mean value: 0.9589754910822583
MCC on Blind test: 0.85
Accuracy on Blind test: 0.92
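Note: in most folds test_roc_auc equals test_accuracy, which is what one would expect if the AUC were computed from hard class predictions on a balanced fold; in that case it reduces to balanced accuracy, (TPR + TNR) / 2. Whether the script actually scores it this way is an assumption, since the log does not show the scorer. The sketch below checks the idea on fold 9 of Random Forest2, using counts (TP=17, FP=3, FN=2, TN=15) inferred from the printed precision (17/20), recall (17/19) and accuracy (32/37); the counts themselves do not appear in the log.

```python
def balanced_accuracy(tp, fp, fn, tn):
    """Mean of sensitivity (TPR) and specificity (TNR)."""
    tpr = tp / (tp + fn)  # recall on the positive class
    tnr = tn / (tn + fp)  # recall on the negative class
    return (tpr + tnr) / 2


# Inferred (assumed) counts for Random Forest2, fold 9
fold9_accuracy = (17 + 15) / (17 + 3 + 2 + 15)
fold9_auc = balanced_accuracy(tp=17, fp=3, fn=2, tn=15)
print(f"accuracy: {fold9_accuracy:.8f}")
print(f"balanced accuracy: {fold9_auc:.8f}")
```

Under these assumed counts the two numbers agree with the ninth entries of test_accuracy and test_roc_auc, and they differ slightly from each other because that fold has 19 positives and 18 negatives.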
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02467036 0.00961518 0.00966334 0.00970984 0.00982165 0.00965047
0.00962496 0.00963306 0.00963211 0.00960302]
mean value: 0.011162400245666504
key: score_time
value: [0.01448631 0.00879812 0.00893068 0.0087347 0.0088098 0.0087955
0.00874925 0.00880694 0.00874519 0.00879836]
mean value: 0.009365487098693847
key: test_mcc
value: [0.52704628 0.68421053 0.68421053 0.79388419 0.78947368 0.52704628
0.78947368 0.63960215 0.48078072 0.57857577]
mean value: 0.6494303793257087
key: train_mcc
value: [0.74738216 0.73530684 0.77092175 0.74199852 0.72941176 0.74163853
0.72354193 0.77092175 0.75366357 0.73705515]
mean value: 0.745184196259884
key: test_accuracy
value: [0.76315789 0.84210526 0.84210526 0.89473684 0.89473684 0.76315789
0.89473684 0.81578947 0.72972973 0.78378378]
mean value: 0.8224039829302987
key: train_accuracy
value: [0.87352941 0.86764706 0.88529412 0.87058824 0.86470588 0.87058824
0.86176471 0.88529412 0.87683284 0.86803519]
mean value: 0.8724279799896498
key: test_fscore
value: [0.76923077 0.84210526 0.84210526 0.9 0.89473684 0.76923077
0.89473684 0.82926829 0.77272727 0.75 ]
mean value: 0.8264141314398054
key: train_fscore
value: [0.87536232 0.86803519 0.88695652 0.87356322 0.86470588 0.87283237
0.86135693 0.88695652 0.87647059 0.87179487]
mean value: 0.8738034415804177
key: test_precision
value: [0.75 0.84210526 0.84210526 0.85714286 0.89473684 0.75
0.89473684 0.77272727 0.68 0.85714286]
mean value: 0.8140697197539303
key: train_precision
value: [0.86285714 0.86549708 0.87428571 0.85393258 0.86470588 0.85795455
0.86390533 0.87428571 0.87647059 0.85 ]
mean value: 0.8643894573208194
key: test_recall
value: [0.78947368 0.84210526 0.84210526 0.94736842 0.89473684 0.78947368
0.89473684 0.89473684 0.89473684 0.66666667]
mean value: 0.8456140350877193
key: train_recall
value: [0.88823529 0.87058824 0.9 0.89411765 0.86470588 0.88823529
0.85882353 0.9 0.87647059 0.89473684]
mean value: 0.8835913312693499
key: test_roc_auc
value: [0.76315789 0.84210526 0.84210526 0.89473684 0.89473684 0.76315789
0.89473684 0.81578947 0.7251462 0.78070175]
mean value: 0.8216374269005848
key: train_roc_auc
value: [0.87352941 0.86764706 0.88529412 0.87058824 0.86470588 0.87058824
0.86176471 0.88529412 0.87683179 0.86795666]
mean value: 0.8724200206398349
key: test_jcc
value: [0.625 0.72727273 0.72727273 0.81818182 0.80952381 0.625
0.80952381 0.70833333 0.62962963 0.6 ]
mean value: 0.7079737854737855
key: train_jcc
value: [0.77835052 0.76683938 0.796875 0.7755102 0.76165803 0.77435897
0.75647668 0.796875 0.78010471 0.77272727]
mean value: 0.7759775771937931
MCC on Blind test: 0.74
Accuracy on Blind test: 0.87
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.2068522 0.1708672 0.06113815 0.06603909 0.06120896 0.05981898
0.05868864 0.29174948 0.05496788 0.05988955]
mean value: 0.10912201404571534
key: score_time
value: [0.01325941 0.01139426 0.01159883 0.01139307 0.01082778 0.01058769
0.01076388 0.01153779 0.01074386 0.01062775]
mean value: 0.011273431777954101
key: test_mcc
value: [1. 1. 1. 0.9486833 0.89473684 0.9486833
0.84327404 0.9486833 0.78362573 0.89181287]
mean value: 0.92594993754596
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 1. 1. 0.97368421 0.94736842 0.97368421
0.92105263 0.97368421 0.89189189 0.94594595]
mean value: 0.9627311522048364
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 1. 1. 0.97435897 0.94736842 0.97297297
0.92307692 0.97435897 0.89473684 0.94444444]
mean value: 0.9631317552370184
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 1. 0.95 0.94736842 1.
0.9 0.95 0.89473684 0.94444444]
mean value: 0.9586549707602339
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 0.94736842 0.94736842
0.94736842 1. 0.89473684 0.94444444]
mean value: 0.9681286549707602
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 1. 1. 0.97368421 0.94736842 0.97368421
0.92105263 0.97368421 0.89181287 0.94590643]
mean value: 0.962719298245614
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 1. 1. 0.95 0.9 0.94736842
0.85714286 0.95 0.80952381 0.89473684]
mean value: 0.9308771929824561
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.94
Accuracy on Blind test: 0.97
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.04102659 0.08542848 0.07878828 0.07048774 0.07999039 0.06906962
0.06940484 0.05672765 0.0792861 0.05792046]
mean value: 0.06881301403045655
key: score_time
value: [0.0229919 0.02294731 0.02151322 0.01808548 0.02362418 0.02200913
0.01912975 0.01265907 0.01236296 0.01703215]
mean value: 0.019235515594482423
key: test_mcc
value: [0.80757285 0.9486833 0.79388419 0.84327404 0.73786479 0.48454371
0.73786479 0.73786479 0.51461988 0.78362573]
mean value: 0.7389798067892077
key: train_mcc
value: [0.92941176 0.94124161 0.95294118 0.92941176 0.95294118 0.94707521
0.94124161 0.9353103 0.94722901 0.95896113]
mean value: 0.9435764748091761
key: test_accuracy
value: [0.89473684 0.97368421 0.89473684 0.92105263 0.86842105 0.73684211
0.86842105 0.86842105 0.75675676 0.89189189]
mean value: 0.8674964438122333
key: train_accuracy
value: [0.96470588 0.97058824 0.97647059 0.96470588 0.97647059 0.97352941
0.97058824 0.96764706 0.97360704 0.97947214]
mean value: 0.9717785061238572
key: test_fscore
value: [0.9047619 0.97435897 0.88888889 0.92307692 0.86486486 0.70588235
0.87179487 0.87179487 0.75675676 0.88888889]
mean value: 0.8651069298128121
key: train_fscore
value: [0.96470588 0.9704142 0.97647059 0.96470588 0.97647059 0.97360704
0.9704142 0.96755162 0.97345133 0.97947214]
mean value: 0.9717263472281472
key: test_precision
value: [0.82608696 0.95 0.94117647 0.9 0.88888889 0.8
0.85 0.85 0.77777778 0.88888889]
mean value: 0.867281898266553
key: train_precision
value: [0.96470588 0.97619048 0.97647059 0.96470588 0.97647059 0.97076023
0.97619048 0.9704142 0.97633136 0.98235294]
mean value: 0.97345926307822
key: test_recall
value: [1. 1. 0.84210526 0.94736842 0.84210526 0.63157895
0.89473684 0.89473684 0.73684211 0.88888889]
mean value: 0.8678362573099415
key: train_recall
value: [0.96470588 0.96470588 0.97647059 0.96470588 0.97647059 0.97647059
0.96470588 0.96470588 0.97058824 0.97660819]
mean value: 0.9700137598899209
key: test_roc_auc
value: [0.89473684 0.97368421 0.89473684 0.92105263 0.86842105 0.73684211
0.86842105 0.86842105 0.75730994 0.89181287]
mean value: 0.8675438596491228
key: train_roc_auc
value: [0.96470588 0.97058824 0.97647059 0.96470588 0.97647059 0.97352941
0.97058824 0.96764706 0.97359821 0.97948056]
mean value: 0.9717784657722739
key: test_jcc
value: [0.82608696 0.95 0.8 0.85714286 0.76190476 0.54545455
0.77272727 0.77272727 0.60869565 0.8 ]
mean value: 0.7694739318652362
key: train_jcc
value: [0.93181818 0.94252874 0.95402299 0.93181818 0.95402299 0.94857143
0.94252874 0.93714286 0.94827586 0.95977011]
mean value: 0.9450500074638005
MCC on Blind test: 0.75
Accuracy on Blind test: 0.88
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01010537 0.01241589 0.01016307 0.00971317 0.01059246 0.01026273
0.0100174 0.00931168 0.00948501 0.00963616]
mean value: 0.010170292854309083
key: score_time
value: [0.00914979 0.00916171 0.00918078 0.00903225 0.00939584 0.00868344
0.009166 0.00891471 0.00897503 0.00866079]
mean value: 0.009032034873962402
key: test_mcc
value: [0.59222009 0.63960215 0.63245553 0.63960215 0.68421053 0.63245553
0.59222009 0.73786479 0.4670794 0.69007214]
mean value: 0.6307782394614956
key: train_mcc
value: [0.66563935 0.61817134 0.70059418 0.65349541 0.66610178 0.72946225
0.60715472 0.75894171 0.71355814 0.67276567]
mean value: 0.6785884546131096
key: test_accuracy
value: [0.78947368 0.81578947 0.81578947 0.81578947 0.84210526 0.81578947
0.78947368 0.86842105 0.72972973 0.83783784]
mean value: 0.8120199146514936
key: train_accuracy
value: [0.83235294 0.80882353 0.85 0.82647059 0.83235294 0.86470588
0.80294118 0.87941176 0.85630499 0.83577713]
mean value: 0.8389140934966361
key: test_fscore
value: [0.76470588 0.8 0.81081081 0.8 0.84210526 0.81081081
0.76470588 0.87179487 0.76190476 0.8125 ]
mean value: 0.8039338283185032
key: train_fscore
value: [0.82779456 0.8048048 0.84684685 0.82282282 0.82674772 0.86390533
0.79635258 0.87833828 0.85196375 0.8313253 ]
mean value: 0.8350901992163299
key: test_precision
value: [0.86666667 0.875 0.83333333 0.875 0.84210526 0.83333333
0.86666667 0.85 0.69565217 0.92857143]
mean value: 0.8466328865642367
key: train_precision
value: [0.85093168 0.82208589 0.86503067 0.8404908 0.85534591 0.86904762
0.82389937 0.88622754 0.8757764 0.85714286]
mean value: 0.8545978740616875
key: test_recall
value: [0.68421053 0.73684211 0.78947368 0.73684211 0.84210526 0.78947368
0.68421053 0.89473684 0.84210526 0.72222222]
mean value: 0.7722222222222223
key: train_recall
value: [0.80588235 0.78823529 0.82941176 0.80588235 0.8 0.85882353
0.77058824 0.87058824 0.82941176 0.80701754]
mean value: 0.8165841073271414
key: test_roc_auc
value: [0.78947368 0.81578947 0.81578947 0.81578947 0.84210526 0.81578947
0.78947368 0.86842105 0.72660819 0.83479532]
mean value: 0.8114035087719298
key: train_roc_auc
value: [0.83235294 0.80882353 0.85 0.82647059 0.83235294 0.86470588
0.80294118 0.87941176 0.85622635 0.83586171]
mean value: 0.8389146886824905
key: test_jcc
value: [0.61904762 0.66666667 0.68181818 0.66666667 0.72727273 0.68181818
0.61904762 0.77272727 0.61538462 0.68421053]
mean value: 0.673466007676534
key: train_jcc
value: [0.70618557 0.67336683 0.734375 0.69897959 0.70466321 0.76041667
0.66161616 0.78306878 0.74210526 0.71134021]
mean value: 0.7176117286148205
MCC on Blind test: 0.77
Accuracy on Blind test: 0.89
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01501966 0.02042222 0.01662898 0.02123141 0.01934409 0.0200212
0.01753616 0.01804042 0.01837301 0.01642108]
mean value: 0.018303823471069337
key: score_time
value: [0.0092628 0.0112505 0.01110458 0.01176238 0.01192117 0.01195502
0.01184368 0.01198912 0.01202178 0.01190639]
mean value: 0.011501741409301759
key: test_mcc
value: [0.9486833 0.89973541 0.38729833 0.84327404 0.67936622 0.56613852
0.29277002 0.79388419 0.56934383 0.62525715]
mean value: 0.6605751008798381
key: train_mcc
value: [0.87660709 0.8617507 0.57735027 0.91190671 0.81150267 0.73854895
0.42491829 0.83159022 0.93562485 0.84815135]
mean value: 0.7817951100491364
key: test_accuracy
value: [0.97368421 0.94736842 0.65789474 0.92105263 0.81578947 0.76315789
0.57894737 0.89473684 0.78378378 0.78378378]
mean value: 0.8120199146514936
key: train_accuracy
value: [0.93823529 0.92941176 0.75 0.95588235 0.89705882 0.85294118
0.65294118 0.90882353 0.96774194 0.92082111]
mean value: 0.8773857167500432
key: test_fscore
value: [0.97435897 0.95 0.73469388 0.92307692 0.84444444 0.70967742
0.7037037 0.9 0.8 0.71428571]
mean value: 0.825424105677562
key: train_fscore
value: [0.93877551 0.93220339 0.8 0.95626822 0.90666667 0.82758621
0.74235808 0.89967638 0.96735905 0.91588785]
mean value: 0.8886781350091697
key: test_precision
value: [0.95 0.9047619 0.6 0.9 0.73076923 0.91666667
0.54285714 0.85714286 0.76190476 1. ]
mean value: 0.8164102564102564
key: train_precision
value: [0.93063584 0.89673913 0.66666667 0.94797688 0.82926829 1.
0.59027778 1. 0.9760479 0.98 ]
mean value: 0.8817612488516776
key: test_recall
value: [1. 1. 0.94736842 0.94736842 1. 0.57894737
1. 0.94736842 0.84210526 0.55555556]
mean value: 0.8818713450292397
key: train_recall
value: [0.94705882 0.97058824 1. 0.96470588 1. 0.70588235
1. 0.81764706 0.95882353 0.85964912]
mean value: 0.9224355005159959
key: test_roc_auc
value: [0.97368421 0.94736842 0.65789474 0.92105263 0.81578947 0.76315789
0.57894737 0.89473684 0.78216374 0.77777778]
mean value: 0.8112573099415205
key: train_roc_auc
value: [0.93823529 0.92941176 0.75 0.95588235 0.89705882 0.85294118
0.65294118 0.90882353 0.96771586 0.92100103]
mean value: 0.8774011007911937
key: test_jcc
value: [0.95 0.9047619 0.58064516 0.85714286 0.73076923 0.55
0.54285714 0.81818182 0.66666667 0.55555556]
mean value: 0.7156580337225499
key: train_jcc
value: [0.88461538 0.87301587 0.66666667 0.91620112 0.82926829 0.70588235
0.59027778 0.81764706 0.93678161 0.84482759]
mean value: 0.8065183719244069
MCC on Blind test: 0.79
Accuracy on Blind test: 0.89
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01763225 0.0185833 0.01747012 0.01765466 0.01662111 0.01753116
0.01757503 0.01850629 0.01743722 0.01835084]
mean value: 0.017736196517944336
key: score_time
value: [0.01220918 0.01224709 0.0121901 0.01216316 0.01189256 0.01195359
0.01198053 0.01190829 0.01229215 0.01211333]
mean value: 0.012094998359680175
key: test_mcc
value: [0.80757285 0.89973541 0.61017022 0.84327404 0.79388419 0.63960215
0.79388419 0.78947368 0.62807634 0.78764146]
mean value: 0.7593314528036907
key: train_mcc
value: [0.8028464 0.8452381 0.67431767 0.90076395 0.87721456 0.90688708
0.91178048 0.82150888 0.87394751 0.91280274]
mean value: 0.8527307362889015
key: test_accuracy
value: [0.89473684 0.94736842 0.78947368 0.92105263 0.89473684 0.81578947
0.89473684 0.89473684 0.81081081 0.89189189]
mean value: 0.8755334281650071
key: train_accuracy
value: [0.89705882 0.91764706 0.81470588 0.95 0.93823529 0.95294118
0.95588235 0.90294118 0.93548387 0.95601173]
mean value: 0.9220907365878903
key: test_fscore
value: [0.9047619 0.94444444 0.81818182 0.92307692 0.88888889 0.8
0.88888889 0.89473684 0.82926829 0.88235294]
mean value: 0.8774600944207529
key: train_fscore
value: [0.90410959 0.91082803 0.84289277 0.94894895 0.93693694 0.95180723
0.95575221 0.89250814 0.93785311 0.95522388]
mean value: 0.9236860841053656
key: test_precision
value: [0.82608696 1. 0.72 0.9 0.94117647 0.875
0.94117647 0.89473684 0.77272727 0.9375 ]
mean value: 0.8808404012530746
key: train_precision
value: [0.84615385 0.99305556 0.73160173 0.96932515 0.95705521 0.97530864
0.95857988 1. 0.90217391 0.97560976]
mean value: 0.9308863694182445
key: test_recall
value: [1. 0.89473684 0.94736842 0.94736842 0.84210526 0.73684211
0.84210526 0.89473684 0.89473684 0.83333333]
mean value: 0.8833333333333333
key: train_recall
value: [0.97058824 0.84117647 0.99411765 0.92941176 0.91764706 0.92941176
0.95294118 0.80588235 0.97647059 0.93567251]
mean value: 0.9253319573443413
key: test_roc_auc
value: [0.89473684 0.94736842 0.78947368 0.92105263 0.89473684 0.81578947
0.89473684 0.89473684 0.80847953 0.89035088]
mean value: 0.8751461988304093
key: train_roc_auc
value: [0.89705882 0.91764706 0.81470588 0.95 0.93823529 0.95294118
0.95588235 0.90294118 0.93560372 0.95607155]
mean value: 0.922108703130375
key: test_jcc
value: [0.82608696 0.89473684 0.69230769 0.85714286 0.8 0.66666667
0.8 0.80952381 0.70833333 0.78947368]
mean value: 0.7844271841811887
key: train_jcc
value: [0.825 0.83625731 0.72844828 0.90285714 0.88135593 0.90804598
0.91525424 0.80588235 0.88297872 0.91428571]
mean value: 0.8600365665794898
MCC on Blind test: 0.81
Accuracy on Blind test: 0.9
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.16849089 0.14875078 0.15005994 0.15339446 0.15740991 0.15145826
0.14945006 0.15879703 0.16036916 0.15662646]
mean value: 0.155480694770813
key: score_time
value: [0.0154891 0.01526761 0.01554847 0.01697516 0.01604891 0.01547098
0.01667619 0.01671481 0.01670885 0.01569343]
mean value: 0.016059350967407227
key: test_mcc
value: [1. 1. 1. 0.9486833 0.9486833 0.9486833
0.89973541 0.89473684 0.73099415 0.89181287]
mean value: 0.9263329164643102
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 1. 1. 0.97368421 0.97368421 0.97368421
0.94736842 0.94736842 0.86486486 0.94594595]
mean value: 0.9626600284495022
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 1. 1. 0.97435897 0.97297297 0.97297297
0.95 0.94736842 0.86486486 0.94444444]
mean value: 0.9626982650666861
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 1. 0.95 1. 1.
0.9047619 0.94736842 0.88888889 0.94444444]
mean value: 0.9635463659147869
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 0.94736842 0.94736842
1. 0.94736842 0.84210526 0.94444444]
mean value: 0.9628654970760233
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 1. 1. 0.97368421 0.97368421 0.97368421
0.94736842 0.94736842 0.86549708 0.94590643]
mean value: 0.9627192982456141
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 1. 1. 0.95 0.94736842 0.94736842
0.9047619 0.9 0.76190476 0.89473684]
mean value: 0.9306140350877192
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.95
Accuracy on Blind test: 0.97
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.05852747 0.04814076 0.04541254 0.04895949 0.05033207 0.04772234
0.06385469 0.04760981 0.04737306 0.05188274]
mean value: 0.0509814977645874
key: score_time
value: [0.02652955 0.02458501 0.02486563 0.02379942 0.02537274 0.02120161
0.02789068 0.02503252 0.02294707 0.02493882]
mean value: 0.02471630573272705
key: test_mcc
value: [1. 1. 1. 0.9486833 0.89473684 0.89973541
0.9486833 0.9486833 0.73099415 0.94736842]
mean value: 0.9318884720198657
key: train_mcc
value: [1. 0.99413485 1. 1. 0.98830369 0.99413485
0.98823529 0.98830369 0.99415185 0.98833809]
mean value: 0.9935602308976983
key: test_accuracy
value: [1. 1. 1. 0.97368421 0.94736842 0.94736842
0.97368421 0.97368421 0.86486486 0.97297297]
mean value: 0.9653627311522048
key: train_accuracy
value: [1. 0.99705882 1. 1. 0.99411765 0.99705882
0.99411765 0.99411765 0.99706745 0.9941349 ]
mean value: 0.996767293427635
key: test_fscore
value: [1. 1. 1. 0.97435897 0.94736842 0.94444444
0.97435897 0.97435897 0.86486486 0.97297297]
mean value: 0.9652727626411837
key: train_fscore
value: [1. 0.99705015 1. 1. 0.99408284 0.99705015
0.99411765 0.99408284 0.99705015 0.99411765]
mean value: 0.9967551417068896
key: test_precision
value: [1. 1. 1. 0.95 0.94736842 1.
0.95 0.95 0.88888889 0.94736842]
mean value: 0.9633625730994152
key: train_precision
value: [1. 1. 1. 1. 1. 1.
0.99411765 1. 1. 1. ]
mean value: 0.9994117647058823
key: test_recall
value: [1. 1. 1. 1. 0.94736842 0.89473684
1. 1. 0.84210526 1. ]
mean value: 0.968421052631579
key: train_recall
value: [1. 0.99411765 1. 1. 0.98823529 0.99411765
0.99411765 0.98823529 0.99411765 0.98830409]
mean value: 0.994124527003784
key: test_roc_auc
value: [1. 1. 1. 0.97368421 0.94736842 0.94736842
0.97368421 0.97368421 0.86549708 0.97368421]
mean value: 0.9654970760233919
key: train_roc_auc
value: [1. 0.99705882 1. 1. 0.99411765 0.99705882
0.99411765 0.99411765 0.99705882 0.99415205]
mean value: 0.9967681458548332
key: test_jcc
value: [1. 1. 1. 0.95 0.9 0.89473684
0.95 0.95 0.76190476 0.94736842]
mean value: 0.9354010025062657
key: train_jcc
value: [1. 0.99411765 1. 1. 0.98823529 0.99411765
0.98830409 0.98823529 0.99411765 0.98830409]
mean value: 0.9935431716546268
MCC on Blind test: 0.9
Accuracy on Blind test: 0.95
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.08551073 0.10424566 0.10746527 0.11186576 0.08251929 0.10721231
0.0956862 0.1061132 0.16514015 0.17688584]
mean value: 0.11426444053649902
key: score_time
value: [0.02205372 0.02442288 0.02611351 0.02609515 0.02227306 0.02218318
0.022053 0.02277589 0.03644013 0.04053092]
mean value: 0.026494145393371582
key: test_mcc
value: [0.58218174 0.68421053 0.42640143 0.68803296 0.59222009 0.68803296
0.68803296 0.68803296 0.35558302 0.69007214]
mean value: 0.6082800795061165
key: train_mcc
value: [0.99413485 0.99413485 0.99413485 1. 1. 0.99413485
0.99413485 0.99413485 0.99415185 0.99415205]
mean value: 0.9953112973615207
key: test_accuracy
value: [0.78947368 0.84210526 0.71052632 0.84210526 0.78947368 0.84210526
0.84210526 0.84210526 0.67567568 0.83783784]
mean value: 0.8013513513513513
key: train_accuracy
value: [0.99705882 0.99705882 0.99705882 1. 1. 0.99705882
0.99705882 0.99705882 0.99706745 0.99706745]
mean value: 0.9976487838537175
key: test_fscore
value: [0.77777778 0.84210526 0.68571429 0.83333333 0.80952381 0.83333333
0.85 0.83333333 0.71428571 0.8125 ]
mean value: 0.7991906850459483
key: train_fscore
value: [0.99705015 0.99705015 0.99705015 1. 1. 0.99705015
0.99705015 0.99705015 0.99705015 0.99706745]
mean value: 0.9976418481128729
key: test_precision
value: [0.82352941 0.84210526 0.75 0.88235294 0.73913043 0.88235294
0.80952381 0.88235294 0.65217391 0.92857143]
mean value: 0.8192093084373337
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.73684211 0.84210526 0.63157895 0.78947368 0.89473684 0.78947368
0.89473684 0.78947368 0.78947368 0.72222222]
mean value: 0.7880116959064327
key: train_recall
value: [0.99411765 0.99411765 0.99411765 1. 1. 0.99411765
0.99411765 0.99411765 0.99411765 0.99415205]
mean value: 0.9952975576195391
key: test_roc_auc
value: [0.78947368 0.84210526 0.71052632 0.84210526 0.78947368 0.84210526
0.84210526 0.84210526 0.67251462 0.83479532]
mean value: 0.8007309941520468
key: train_roc_auc
value: [0.99705882 0.99705882 0.99705882 1. 1. 0.99705882
0.99705882 0.99705882 0.99705882 0.99707602]
mean value: 0.9976487788097695
key: test_jcc
value: [0.63636364 0.72727273 0.52173913 0.71428571 0.68 0.71428571
0.73913043 0.71428571 0.55555556 0.68421053]
mean value: 0.6687129153582243
key: train_jcc
value: [0.99411765 0.99411765 0.99411765 1. 1. 0.99411765
0.99411765 0.99411765 0.99411765 0.99415205]
mean value: 0.9952975576195391
MCC on Blind test: 0.57
Accuracy on Blind test: 0.78
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.57054996 0.5652597 0.55597973 0.57494354 0.56049299 0.54836679
0.56327939 0.55440116 0.55499816 0.55580854]
mean value: 0.5604079961776733
key: score_time
value: [0.00960946 0.01006937 0.0094645 0.00990939 0.00955105 0.00955939
0.00990939 0.00954437 0.0094986 0.00949216]
mean value: 0.009660768508911132
key: test_mcc
value: [1. 1. 0.9486833 0.9486833 0.89473684 0.89973541
0.89973541 0.9486833 0.78362573 0.89181287]
mean value: 0.9215696154432907
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 1. 0.97368421 0.97368421 0.94736842 0.94736842
0.94736842 0.97368421 0.89189189 0.94594595]
mean value: 0.960099573257468
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 1. 0.97435897 0.97435897 0.94736842 0.94444444
0.95 0.97435897 0.89473684 0.94444444]
mean value: 0.9604071075123707
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 0.95 0.95 0.94736842 1.
0.9047619 0.95 0.89473684 0.94444444]
mean value: 0.9541311612364244
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 0.94736842 0.89473684
1. 1. 0.89473684 0.94444444]
mean value: 0.9681286549707602
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 1. 0.97368421 0.97368421 0.94736842 0.94736842
0.94736842 0.97368421 0.89181287 0.94590643]
mean value: 0.9600877192982457
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 1. 0.95 0.95 0.9 0.89473684
0.9047619 0.95 0.80952381 0.89473684]
mean value: 0.9253759398496241
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.94
Accuracy on Blind test: 0.97
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.02796578 0.02674651 0.02773929 0.02755022 0.02779508 0.02704573
0.0277493 0.03268075 0.02809811 0.02789998]
mean value: 0.028127074241638184
key: score_time
value: [0.01260114 0.0186615 0.01520371 0.01710987 0.01517677 0.01661873
0.01516271 0.02039289 0.01575422 0.01550627]
mean value: 0.01621878147125244
key: test_mcc
value: [0.52704628 0.26462806 0.21320072 0.68803296 0.63245553 0.26462806
0.16151457 0.37686733 0.18980224 0.6754386 ]
mean value: 0.3993614351469908
key: train_mcc
value: [0.94838881 0.95884012 0.70087664 0.9653073 0.88852332 0.86751214
0.86751214 0.79170339 0.87096663 0.95366475]
mean value: 0.8813295248742146
key: test_accuracy
value: [0.76315789 0.63157895 0.60526316 0.84210526 0.81578947 0.63157895
0.57894737 0.68421053 0.59459459 0.83783784]
mean value: 0.6985064011379801
key: train_accuracy
value: [0.97352941 0.97941176 0.82941176 0.98235294 0.94117647 0.92941176
0.92941176 0.88529412 0.93548387 0.97653959]
mean value: 0.9362023460410557
key: test_fscore
value: [0.75675676 0.65 0.57142857 0.83333333 0.81081081 0.61111111
0.52941176 0.64705882 0.65116279 0.83333333]
mean value: 0.6894407295706886
key: train_fscore
value: [0.97280967 0.97935103 0.79432624 0.98203593 0.9375 0.92405063
0.92405063 0.87043189 0.93529412 0.97701149]
mean value: 0.9296861640810983
key: test_precision
value: [0.77777778 0.61904762 0.625 0.88235294 0.83333333 0.64705882
0.6 0.73333333 0.58333333 0.83333333]
mean value: 0.7134570494864613
key: train_precision
value: [1. 0.98224852 1. 1. 1. 1.
1. 1. 0.93529412 0.96045198]
mean value: 0.9877994615758248
key: test_recall
value: [0.73684211 0.68421053 0.52631579 0.78947368 0.78947368 0.57894737
0.47368421 0.57894737 0.73684211 0.83333333]
mean value: 0.6728070175438596
key: train_recall
value: [0.94705882 0.97647059 0.65882353 0.96470588 0.88235294 0.85882353
0.85882353 0.77058824 0.93529412 0.99415205]
mean value: 0.8847093223254214
key: test_roc_auc
value: [0.76315789 0.63157895 0.60526316 0.84210526 0.81578947 0.63157895
0.57894737 0.68421053 0.59064327 0.8377193 ]
mean value: 0.6980994152046783
key: train_roc_auc
value: [0.97352941 0.97941176 0.82941176 0.98235294 0.94117647 0.92941176
0.92941176 0.88529412 0.93548332 0.97648779]
mean value: 0.9361971104231166
key: test_jcc
value: [0.60869565 0.48148148 0.4 0.71428571 0.68181818 0.44
0.36 0.47826087 0.48275862 0.71428571]
mean value: 0.5361586234299878
key: train_jcc
value: [0.94705882 0.95953757 0.65882353 0.96470588 0.88235294 0.85882353
0.85882353 0.77058824 0.87845304 0.95505618]
mean value: 0.8734223261291885
MCC on Blind test: 0.41
Accuracy on Blind test: 0.71
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.01517224 0.0148356 0.01910758 0.02715397 0.03688359 0.02013373
0.03677201 0.03099537 0.03356433 0.03359127]
mean value: 0.026820969581604005
key: score_time
value: [0.01220894 0.01221657 0.01221132 0.02378225 0.02123022 0.02143621
0.02273417 0.02497792 0.022928 0.02345252]
mean value: 0.019717812538146973
key: test_mcc
value: [0.89473684 0.9486833 0.78947368 0.84327404 0.84327404 0.58218174
0.84327404 0.79388419 0.56725146 0.78764146]
mean value: 0.7893674798967042
key: train_mcc
value: [0.87064849 0.89411765 0.90588235 0.87684993 0.89417953 0.90014017
0.87660709 0.89417953 0.90043693 0.90030617]
mean value: 0.8913347841716839
key: test_accuracy
value: [0.94736842 0.97368421 0.89473684 0.92105263 0.92105263 0.78947368
0.92105263 0.89473684 0.78378378 0.89189189]
mean value: 0.8938833570412518
key: train_accuracy
value: [0.93529412 0.94705882 0.95294118 0.93823529 0.94705882 0.95
0.93823529 0.94705882 0.95014663 0.95014663]
mean value: 0.9456175608073141
key: test_fscore
value: [0.94736842 0.97435897 0.89473684 0.92307692 0.92307692 0.77777778
0.91891892 0.9 0.78947368 0.88235294]
mean value: 0.8931141405754409
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_7030.py:136: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_7030.py:139: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: train_fscore
value: [0.93567251 0.94705882 0.95294118 0.93913043 0.94736842 0.95043732
 0.93877551 0.94736842 0.95043732 0.95043732]
mean value: 0.9459627255064605
key: test_precision
value: [0.94736842 0.95 0.89473684 0.9 0.9 0.82352941
0.94444444 0.85714286 0.78947368 0.9375 ]
mean value: 0.8944195660720429
key: train_precision
value: [0.93023256 0.94705882 0.95294118 0.92571429 0.94186047 0.94219653
0.93063584 0.94186047 0.94219653 0.94767442]
mean value: 0.9402371094425134
key: test_recall
value: [0.94736842 1. 0.89473684 0.94736842 0.94736842 0.73684211
0.89473684 0.94736842 0.78947368 0.83333333]
mean value: 0.893859649122807
key: train_recall
value: [0.94117647 0.94705882 0.95294118 0.95294118 0.95294118 0.95882353
0.94705882 0.95294118 0.95882353 0.95321637]
mean value: 0.9517922256621947
key: test_roc_auc
value: [0.94736842 0.97368421 0.89473684 0.92105263 0.92105263 0.78947368
0.92105263 0.89473684 0.78362573 0.89035088]
mean value: 0.8937134502923977
key: train_roc_auc
value: [0.93529412 0.94705882 0.95294118 0.93823529 0.94705882 0.95
0.93823529 0.94705882 0.950172 0.9501376 ]
mean value: 0.9456191950464397
key: test_jcc
value: [0.9 0.95 0.80952381 0.85714286 0.85714286 0.63636364
0.85 0.81818182 0.65217391 0.78947368]
mean value: 0.8120002575608983
key: train_jcc
value: [0.87912088 0.89944134 0.91011236 0.8852459 0.9 0.90555556
0.88461538 0.9 0.90555556 0.90555556]
mean value: 0.897520253237496
MCC on Blind test: 0.79
Accuracy on Blind test: 0.9
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.1303339 0.28677583 0.31174827 0.25746155 0.25486231 0.268327
0.14300919 0.26858115 0.38862586 0.40423918]
mean value: 0.27139642238616946
key: score_time
value: [0.01270962 0.0216949 0.02374864 0.02220631 0.02106261 0.03258085
0.01293516 0.02181482 0.02210927 0.02351737]
mean value: 0.021437954902648926
key: test_mcc
value: [0.89473684 0.9486833 0.78947368 0.84327404 0.84327404 0.63245553
0.84327404 0.79388419 0.56725146 0.78764146]
mean value: 0.7943948594573259
key: train_mcc
value: [0.87064849 0.89411765 0.90588235 0.87684993 0.89417953 0.9353103
0.87660709 0.92947609 0.90043693 0.90030617]
mean value: 0.898381453059501
key: test_accuracy
value: [0.94736842 0.97368421 0.89473684 0.92105263 0.92105263 0.81578947
0.92105263 0.89473684 0.78378378 0.89189189]
mean value: 0.8965149359886202
key: train_accuracy
value: [0.93529412 0.94705882 0.95294118 0.93823529 0.94705882 0.96764706
0.93823529 0.96470588 0.95014663 0.95014663]
mean value: 0.94914697257202
key: test_fscore
value: [0.94736842 0.97435897 0.89473684 0.92307692 0.92307692 0.81081081
0.91891892 0.9 0.78947368 0.88235294]
mean value: 0.8964174438787442
key: train_fscore
value: [0.93567251 0.94705882 0.95294118 0.93913043 0.94736842 0.96755162
0.93877551 0.96449704 0.95043732 0.95043732]
mean value: 0.9493870180066716
key: test_precision
value: [0.94736842 0.95 0.89473684 0.9 0.9 0.83333333
0.94444444 0.85714286 0.78947368 0.9375 ]
mean value: 0.8953999582289056
key: train_precision
value: [0.93023256 0.94705882 0.95294118 0.92571429 0.94186047 0.9704142
0.93063584 0.9702381 0.94219653 0.94767442]
mean value: 0.9458966393938475
key: test_recall
value: [0.94736842 1. 0.89473684 0.94736842 0.94736842 0.78947368
0.89473684 0.94736842 0.78947368 0.83333333]
mean value: 0.8991228070175439
key: train_recall
value: [0.94117647 0.94705882 0.95294118 0.95294118 0.95294118 0.96470588
0.94705882 0.95882353 0.95882353 0.95321637]
mean value: 0.95296869625043
key: test_roc_auc
value: [0.94736842 0.97368421 0.89473684 0.92105263 0.92105263 0.81578947
0.92105263 0.89473684 0.78362573 0.89035088]
mean value: 0.896345029239766
key: train_roc_auc
value: [0.93529412 0.94705882 0.95294118 0.93823529 0.94705882 0.96764706
0.93823529 0.96470588 0.950172 0.9501376 ]
mean value: 0.9491486068111455
key: test_jcc
value: [0.9 0.95 0.80952381 0.85714286 0.85714286 0.68181818
0.85 0.81818182 0.65217391 0.78947368]
mean value: 0.8165457121063529
key: train_jcc
value: [0.87912088 0.89944134 0.91011236 0.8852459 0.9 0.93714286
0.88461538 0.93142857 0.90555556 0.90555556]
mean value: 0.9038218405390832
MCC on Blind test: 0.79
Accuracy on Blind test: 0.9
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.027807 0.0355041 0.06297231 0.05780268 0.05370164 0.03470182
0.03526163 0.03333521 0.0357399 0.03369379]
mean value: 0.0410520076751709
key: score_time
value: [0.01207423 0.01532435 0.01223469 0.01207042 0.0144453 0.01446915
0.0145359 0.01214051 0.01207852 0.01209593]
mean value: 0.01314690113067627
key: test_mcc
value: [0.89473684 0.9486833 0.73786479 0.84327404 0.73786479 0.48454371
0.89473684 0.79388419 0.62807634 0.78764146]
mean value: 0.77513062978033
key: train_mcc
value: [0.85295593 0.86472084 0.87660709 0.85888297 0.85882353 0.86472084
0.87058824 0.87064849 0.86511868 0.86511404]
mean value: 0.8648180657385829
key: test_accuracy
value: [0.94736842 0.97368421 0.86842105 0.92105263 0.86842105 0.73684211
0.94736842 0.89473684 0.81081081 0.89189189]
mean value: 0.8860597439544808
key: train_accuracy
value: [0.92647059 0.93235294 0.93823529 0.92941176 0.92941176 0.93235294
0.93529412 0.93529412 0.93255132 0.93255132]
mean value: 0.9323926168707952
key: test_fscore
value: [0.94736842 0.97435897 0.86486486 0.92307692 0.86486486 0.70588235
0.94736842 0.9 0.82926829 0.88235294]
mean value: 0.8839406056071464
key: train_fscore
value: [0.92668622 0.93255132 0.93877551 0.92982456 0.92941176 0.93255132
0.93529412 0.93491124 0.93255132 0.93294461]
mean value: 0.9325501978931156
key: test_precision
value: [0.94736842 0.95 0.88888889 0.9 0.88888889 0.8
0.94736842 0.85714286 0.77272727 0.9375 ]
mean value: 0.8889884749753171
key: train_precision
value: [0.92397661 0.92982456 0.93063584 0.9244186 0.92941176 0.92982456
0.93529412 0.94047619 0.92982456 0.93023256]
mean value: 0.9303919366167779
key: test_recall
value: [0.94736842 1. 0.84210526 0.94736842 0.84210526 0.63157895
0.94736842 0.94736842 0.89473684 0.83333333]
mean value: 0.8833333333333333
key: train_recall
value: [0.92941176 0.93529412 0.94705882 0.93529412 0.92941176 0.93529412
0.93529412 0.92941176 0.93529412 0.93567251]
mean value: 0.9347437220502236
key: test_roc_auc
value: [0.94736842 0.97368421 0.86842105 0.92105263 0.86842105 0.73684211
0.94736842 0.89473684 0.80847953 0.89035088]
mean value: 0.8856725146198831
key: train_roc_auc
value: [0.92647059 0.93235294 0.93823529 0.92941176 0.92941176 0.93235294
0.93529412 0.93529412 0.93255934 0.93254214]
mean value: 0.9323925008599931
key: test_jcc
value: [0.9 0.95 0.76190476 0.85714286 0.76190476 0.54545455
0.9 0.81818182 0.70833333 0.78947368]
mean value: 0.7992395762132605
key: train_jcc
value: [0.86338798 0.87362637 0.88461538 0.86885246 0.86813187 0.87362637
0.87845304 0.87777778 0.87362637 0.87431694]
mean value: 0.8736414567127365
MCC on Blind test: 0.83
Accuracy on Blind test: 0.91
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [1.12819099 0.96782017 0.88282394 1.06483197 0.87350464 1.21616435
1.23634672 1.01481581 1.04923725 1.17672253]
mean value: 1.0610458374023437
key: score_time
value: [0.01469541 0.01223922 0.01531339 0.01517987 0.01223707 0.01528811
0.01213527 0.01546454 0.0202651 0.015342 ]
mean value: 0.014815998077392579
key: test_mcc
value: [0.89973541 0.89973541 0.73786479 0.9486833 0.84327404 0.53300179
0.84327404 0.79388419 0.51461988 0.78362573]
mean value: 0.7797698583492705
key: train_mcc
value: [0.97653817 0.89417953 1. 0.976741 0.88235294 0.90588235
0.98236994 0.98250594 0.99415185 0.98833809]
mean value: 0.9583059813818373
key: test_accuracy
value: [0.94736842 0.94736842 0.86842105 0.97368421 0.92105263 0.76315789
0.92105263 0.89473684 0.75675676 0.89189189]
mean value: 0.8885490753911807
key: train_accuracy
value: [0.98823529 0.94705882 1. 0.98823529 0.94117647 0.95294118
0.99117647 0.99117647 0.99706745 0.9941349 ]
mean value: 0.9791202346041056
key: test_fscore
value: [0.95 0.95 0.87179487 0.97297297 0.92307692 0.74285714
0.92307692 0.9 0.75675676 0.88888889]
mean value: 0.887942447942448
key: train_fscore
value: [0.98816568 0.94674556 1. 0.98809524 0.94117647 0.95294118
0.99115044 0.99109792 0.99705015 0.99411765]
mean value: 0.9790540287635601
key: test_precision
value: [0.9047619 0.9047619 0.85 1. 0.9 0.8125
0.9 0.85714286 0.77777778 0.88888889]
mean value: 0.8795833333333334
key: train_precision
value: [0.99404762 0.95238095 1. 1. 0.94117647 0.95294118
0.99408284 1. 1. 1. ]
mean value: 0.9834629058724081
key: test_recall
value: [1. 1. 0.89473684 0.94736842 0.94736842 0.68421053
0.94736842 0.94736842 0.73684211 0.88888889]
mean value: 0.8994152046783626
key: train_recall
value: [0.98235294 0.94117647 1. 0.97647059 0.94117647 0.95294118
0.98823529 0.98235294 0.99411765 0.98830409]
mean value: 0.9747127622979016
key: test_roc_auc
value: [0.94736842 0.94736842 0.86842105 0.97368421 0.92105263 0.76315789
0.92105263 0.89473684 0.75730994 0.89181287]
mean value: 0.8885964912280702
key: train_roc_auc
value: [0.98823529 0.94705882 1. 0.98823529 0.94117647 0.95294118
0.99117647 0.99117647 0.99705882 0.99415205]
mean value: 0.9791210870313037
key: test_jcc
value: [0.9047619 0.9047619 0.77272727 0.94736842 0.85714286 0.59090909
0.85714286 0.81818182 0.60869565 0.8 ]
mean value: 0.806169177885425
key: train_jcc
value: [0.97660819 0.8988764 1. 0.97647059 0.88888889 0.91011236
0.98245614 0.98235294 0.99411765 0.98830409]
mean value: 0.9598187250457052
MCC on Blind test: 0.76
Accuracy on Blind test: 0.88
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01373482 0.01149917 0.01022553 0.00962138 0.00941873 0.01020408
0.00948644 0.00971484 0.0105443 0.00974584]
mean value: 0.010419511795043945
key: score_time
value: [0.01228786 0.00936007 0.00903177 0.00898194 0.00892377 0.00910449
0.00882721 0.0090754 0.00997901 0.00893044]
mean value: 0.009450197219848633
key: test_mcc
value: [0.68803296 0.69989647 0.69989647 0.79388419 0.57894737 0.59222009
0.47633051 0.73786479 0.57184997 0.64287856]
mean value: 0.6481801382126267
key: train_mcc
value: [0.64994387 0.6215412 0.63334622 0.65705784 0.64172131 0.6871247
0.65360504 0.66133552 0.66975134 0.65909576]
mean value: 0.6534522792161372
key: test_accuracy
value: [0.84210526 0.84210526 0.84210526 0.89473684 0.78947368 0.78947368
0.73684211 0.86842105 0.78378378 0.81081081]
mean value: 0.8199857752489331
key: train_accuracy
value: [0.82352941 0.80882353 0.80588235 0.82647059 0.81764706 0.84117647
0.82352941 0.82647059 0.83284457 0.82697947]
mean value: 0.8233353458685527
key: test_fscore
value: [0.83333333 0.82352941 0.82352941 0.88888889 0.78947368 0.76470588
0.72222222 0.87179487 0.77777778 0.77419355]
mean value: 0.806944903249707
key: train_fscore
value: [0.81481481 0.79750779 0.77702703 0.81619938 0.80379747 0.83125
0.81012658 0.8115016 0.82242991 0.81619938]
mean value: 0.8100853938516973
key: test_precision
value: [0.88235294 0.93333333 0.93333333 0.94117647 0.78947368 0.86666667
0.76470588 0.85 0.82352941 0.92307692]
mean value: 0.8707648646503136
key: train_precision
value: [0.85714286 0.84768212 0.91269841 0.86754967 0.86986301 0.88666667
0.87671233 0.88811189 0.87417219 0.87333333]
mean value: 0.8753932473928845
key: test_recall
value: [0.78947368 0.73684211 0.73684211 0.84210526 0.78947368 0.68421053
0.68421053 0.89473684 0.73684211 0.66666667]
mean value: 0.756140350877193
key: train_recall
value: [0.77647059 0.75294118 0.67647059 0.77058824 0.74705882 0.78235294
0.75294118 0.74705882 0.77647059 0.76608187]
mean value: 0.75484348125215
key: test_roc_auc
value: [0.84210526 0.84210526 0.84210526 0.89473684 0.78947368 0.78947368
0.73684211 0.86842105 0.78508772 0.80701754]
mean value: 0.8197368421052632
key: train_roc_auc
value: [0.82352941 0.80882353 0.80588235 0.82647059 0.81764706 0.84117647
0.82352941 0.82647059 0.83267974 0.82715858]
mean value: 0.8233367733058135
key: test_jcc
value: [0.71428571 0.7 0.7 0.8 0.65217391 0.61904762
0.56521739 0.77272727 0.63636364 0.63157895]
mean value: 0.6791394494140489
key: train_jcc
value: [0.6875 0.66321244 0.63535912 0.68947368 0.67195767 0.71122995
0.68085106 0.6827957 0.6984127 0.68947368]
mean value: 0.6810265999325266
MCC on Blind test: 0.68
Accuracy on Blind test: 0.84
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01158476 0.01068711 0.0107584 0.00968242 0.01254201 0.01016307
0.01010203 0.00989175 0.00978208 0.01023674]
mean value: 0.010543036460876464
key: score_time
value: [0.01002502 0.00914407 0.00887299 0.00956154 0.01029301 0.0097959
0.00908971 0.0090239 0.00896478 0.00972867]
mean value: 0.009449958801269531
key: test_mcc
value: [0.47368421 0.68421053 0.68421053 0.79388419 0.78947368 0.47368421
0.78947368 0.63960215 0.48078072 0.62807634]
mean value: 0.6437080232004303
key: train_mcc
value: [0.73561236 0.70588235 0.75314969 0.72986649 0.70593121 0.73561236
0.71769673 0.75314969 0.73607623 0.71966354]
mean value: 0.7292640648226028
key: test_accuracy
value: [0.73684211 0.84210526 0.84210526 0.89473684 0.89473684 0.73684211
0.89473684 0.81578947 0.72972973 0.81081081]
mean value: 0.8198435277382645
key: train_accuracy
value: [0.86764706 0.85294118 0.87647059 0.86470588 0.85294118 0.86764706
0.85882353 0.87647059 0.86803519 0.85923754]
mean value: 0.8644919786096257
key: test_fscore
value: [0.73684211 0.84210526 0.84210526 0.9 0.89473684 0.73684211
0.89473684 0.82926829 0.77272727 0.78787879]
mean value: 0.8237242774341619
key: train_fscore
value: [0.86956522 0.85294118 0.87790698 0.86705202 0.85380117 0.86956522
0.85964912 0.87790698 0.86725664 0.86363636]
mean value: 0.8659280881065122
key: test_precision
value: [0.73684211 0.84210526 0.84210526 0.85714286 0.89473684 0.73684211
0.89473684 0.77272727 0.68 0.86666667]
mean value: 0.8123905217589428
key: train_precision
value: [0.85714286 0.85294118 0.86781609 0.85227273 0.84883721 0.85714286
0.85465116 0.86781609 0.86982249 0.83977901]
mean value: 0.8568221664762061
key: test_recall
value: [0.73684211 0.84210526 0.84210526 0.94736842 0.89473684 0.73684211
0.89473684 0.89473684 0.89473684 0.72222222]
mean value: 0.8406432748538012
key: train_recall
value: [0.88235294 0.85294118 0.88823529 0.88235294 0.85882353 0.88235294
0.86470588 0.88823529 0.86470588 0.88888889]
mean value: 0.8753594771241829
key: test_roc_auc
value: [0.73684211 0.84210526 0.84210526 0.89473684 0.89473684 0.73684211
0.89473684 0.81578947 0.7251462 0.80847953]
mean value: 0.8191520467836257
key: train_roc_auc
value: [0.86764706 0.85294118 0.87647059 0.86470588 0.85294118 0.86764706
0.85882353 0.87647059 0.86802546 0.85915033]
mean value: 0.8644822841417269
key: test_jcc
value: [0.58333333 0.72727273 0.72727273 0.81818182 0.80952381 0.58333333
0.80952381 0.70833333 0.62962963 0.65 ]
mean value: 0.7046404521404521
key: train_jcc
value: [0.76923077 0.74358974 0.78238342 0.76530612 0.74489796 0.76923077
0.75384615 0.78238342 0.765625 0.76 ]
mean value: 0.7636493356908327
MCC on Blind test: 0.72
Accuracy on Blind test: 0.86
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00920463 0.01011729 0.01018023 0.01018381 0.0103085 0.01037335
0.01010799 0.01031637 0.01011348 0.01027107]
mean value: 0.010117673873901367
key: score_time
value: [0.01739001 0.01198483 0.01202416 0.01202321 0.0119853 0.01478839
0.01201344 0.01181197 0.01210189 0.01525283]
mean value: 0.013137602806091308
key: test_mcc
value: [0.52704628 0.57894737 0.31622777 0.52704628 0.43643578 0.36842105
0.37047929 0.21821789 0.40780312 0.75614764]
mean value: 0.450677246185986
key: train_mcc
value: [0.63533809 0.67063465 0.67657595 0.63547005 0.71207276 0.68853317
0.64710361 0.64723801 0.72491598 0.58359133]
mean value: 0.6621473590091169
key: test_accuracy
value: [0.76315789 0.78947368 0.65789474 0.76315789 0.71052632 0.68421053
0.68421053 0.60526316 0.7027027 0.86486486]
mean value: 0.7225462304409673
key: train_accuracy
value: [0.81764706 0.83529412 0.83823529 0.81764706 0.85588235 0.84411765
0.82352941 0.82352941 0.86217009 0.79178886]
mean value: 0.8309841297222701
key: test_fscore
value: [0.75675676 0.78947368 0.64864865 0.75675676 0.74418605 0.68421053
0.66666667 0.54545455 0.73170732 0.83870968]
mean value: 0.7162570625813843
key: train_fscore
value: [0.81871345 0.83625731 0.83965015 0.81547619 0.85373134 0.84637681
0.8245614 0.8255814 0.85885886 0.79178886]
mean value: 0.8310995765381942
key: test_precision
value: [0.77777778 0.78947368 0.66666667 0.77777778 0.66666667 0.68421053
0.70588235 0.64285714 0.68181818 1. ]
mean value: 0.7393130777031706
key: train_precision
value: [0.81395349 0.83139535 0.83236994 0.8253012 0.86666667 0.83428571
0.81976744 0.81609195 0.87730061 0.79411765]
mean value: 0.8311250021616702
key: test_recall
value: [0.73684211 0.78947368 0.63157895 0.73684211 0.84210526 0.68421053
0.63157895 0.47368421 0.78947368 0.72222222]
mean value: 0.7038011695906432
key: train_recall
value: [0.82352941 0.84117647 0.84705882 0.80588235 0.84117647 0.85882353
0.82941176 0.83529412 0.84117647 0.78947368]
mean value: 0.8313003095975232
key: test_roc_auc
value: [0.76315789 0.78947368 0.65789474 0.76315789 0.71052632 0.68421053
0.68421053 0.60526316 0.7002924 0.86111111]
mean value: 0.7219298245614035
key: train_roc_auc
value: [0.81764706 0.83529412 0.83823529 0.81764706 0.85588235 0.84411765
0.82352941 0.82352941 0.8621087 0.79179567]
mean value: 0.8309786721706227
key: test_jcc
value: [0.60869565 0.65217391 0.48 0.60869565 0.59259259 0.52
0.5 0.375 0.57692308 0.72222222]
mean value: 0.5636303109129196
key: train_jcc
value: [0.69306931 0.71859296 0.72361809 0.68844221 0.74479167 0.73366834
0.70149254 0.7029703 0.75263158 0.65533981]
mean value: 0.7114616800753307
MCC on Blind test: 0.5
Accuracy on Blind test: 0.75
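The key/value/mean triplets above have the shape of the dictionary returned by scikit-learn's `cross_validate` when it is given several scorers and `return_train_score=True`. The sketch below shows one way such output could be produced. It is an illustration under assumptions, not the code of ml_data_7030.py: the synthetic data, the column names `num_a`, `num_b` and `cat_a`, and the exact scorer dictionary are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic stand-in data (assumption): two numerical and one categorical column.
rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame({
    "num_a": rng.normal(size=n),
    "num_b": rng.normal(size=n),
    "cat_a": rng.choice(["x", "y", "z"], size=n),
})
y = (X["num_a"] + 0.5 * X["num_b"] > 0).astype(int).to_numpy()

# Same preprocessing layout as the logged pipeline: scale numerical columns,
# one-hot encode categorical columns, pass anything else through.
prep = ColumnTransformer(
    remainder="passthrough",
    transformers=[
        ("num", MinMaxScaler(), ["num_a", "num_b"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["cat_a"]),
    ],
)
pipe = Pipeline(steps=[("prep", prep), ("model", KNeighborsClassifier())])

# Scorer names chosen to reproduce the logged keys (test_mcc, train_mcc, ...).
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": "jaccard",
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

# Print each per-fold array and its mean in the same layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```

Each array holds one entry per fold, so ten folds give the ten values per key seen in the log. The "mean value" line is the plain arithmetic mean of that array.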
Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01786804 0.01848149 0.01594734 0.01577377 0.0179441 0.01872945
0.01715374 0.01642799 0.01686239 0.01881289]
mean value: 0.017400121688842772
key: score_time
value: [0.01160026 0.01157236 0.01054382 0.01053405 0.01060677 0.01129889
0.01175284 0.01054502 0.01147771 0.01128459]
mean value: 0.011121630668640137
key: test_mcc
value: [0.84327404 0.9486833 0.78947368 0.78947368 0.78947368 0.48454371
0.89473684 0.73786479 0.51793973 0.78764146]
mean value: 0.7583104925854314
key: train_mcc
value: [0.80005537 0.78823529 0.79413139 0.81766121 0.80005537 0.8058963
0.78236648 0.80005537 0.82404541 0.79483211]
mean value: 0.8007334285346163
key: test_accuracy
value: [0.92105263 0.97368421 0.89473684 0.89473684 0.89473684 0.73684211
0.94736842 0.86842105 0.75675676 0.89189189]
mean value: 0.878022759601707
key: train_accuracy
value: [0.9 0.89411765 0.89705882 0.90882353 0.9 0.90294118
0.89117647 0.9 0.91202346 0.8973607 ]
mean value: 0.9003501811281698
key: test_fscore
value: [0.91891892 0.97435897 0.89473684 0.89473684 0.89473684 0.70588235
0.94736842 0.87179487 0.7804878 0.88235294]
mean value: 0.8765374811436882
key: train_fscore
value: [0.9005848 0.89411765 0.8973607 0.90909091 0.9005848 0.90322581
0.89085546 0.89940828 0.91176471 0.89855072]
mean value: 0.9005543828827779
key: test_precision
value: [0.94444444 0.95 0.89473684 0.89473684 0.89473684 0.8
0.94736842 0.85 0.72727273 0.9375 ]
mean value: 0.8840796119085592
key: train_precision
value: [0.89534884 0.89411765 0.89473684 0.90643275 0.89534884 0.9005848
0.89349112 0.9047619 0.91176471 0.8908046 ]
mean value: 0.8987392040048102
key: test_recall
value: [0.89473684 1. 0.89473684 0.89473684 0.89473684 0.63157895
0.94736842 0.89473684 0.84210526 0.83333333]
mean value: 0.8728070175438596
key: train_recall
value: [0.90588235 0.89411765 0.9 0.91176471 0.90588235 0.90588235
0.88823529 0.89411765 0.91176471 0.90643275]
mean value: 0.9024079807361541
key: test_roc_auc
value: [0.92105263 0.97368421 0.89473684 0.89473684 0.89473684 0.73684211
0.94736842 0.86842105 0.75438596 0.89035088]
mean value: 0.8776315789473684
key: train_roc_auc
value: [0.9 0.89411765 0.89705882 0.90882353 0.9 0.90294118
0.89117647 0.9 0.9120227 0.89733402]
mean value: 0.9003474372205023
key: test_jcc
value: [0.85 0.95 0.80952381 0.80952381 0.80952381 0.54545455
0.9 0.77272727 0.64 0.78947368]
mean value: 0.7876226930963773
key: train_jcc
value: [0.81914894 0.80851064 0.81382979 0.83333333 0.81914894 0.82352941
0.80319149 0.8172043 0.83783784 0.81578947]
mean value: 0.8191524144929399
MCC on Blind test: 0.78
Accuracy on Blind test: 0.89
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.49616385 1.30514312 1.44180202 1.37484336 1.37929106 1.37126446
1.30904698 1.39595342 1.31129622 1.29941964]
mean value: 1.3684224128723144
key: score_time
value: [0.01476836 0.01247644 0.01286674 0.02777433 0.01502228 0.01509333
0.01474261 0.01487732 0.0184629 0.01495028]
mean value: 0.016103458404541016
key: test_mcc
value: [0.89473684 0.89973541 0.68421053 0.89973541 0.85280287 0.58218174
0.89473684 0.79388419 0.51319869 0.94721815]
mean value: 0.7962440656984944
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94736842 0.94736842 0.84210526 0.94736842 0.92105263 0.78947368
0.94736842 0.89473684 0.75675676 0.97297297]
mean value: 0.8966571834992887
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94736842 0.95 0.84210526 0.94444444 0.92682927 0.77777778
0.94736842 0.9 0.76923077 0.97142857]
mean value: 0.8976552936437403
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.94736842 0.9047619 0.84210526 1. 0.86363636 0.82352941
0.94736842 0.85714286 0.75 1. ]
mean value: 0.8935912642568989
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.94736842 1. 0.84210526 0.89473684 1. 0.73684211
0.94736842 0.94736842 0.78947368 0.94444444]
mean value: 0.9049707602339181
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.94736842 0.94736842 0.84210526 0.94736842 0.92105263 0.78947368
0.94736842 0.89473684 0.75584795 0.97222222]
mean value: 0.8964912280701754
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.9 0.9047619 0.72727273 0.89473684 0.86363636 0.63636364
0.9 0.81818182 0.625 0.94444444]
mean value: 0.8214397736766158
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.74
Accuracy on Blind test: 0.87
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.02944374 0.0182209 0.01535916 0.01744986 0.01732898 0.01718378
0.01695347 0.01524591 0.01629186 0.01514053]
mean value: 0.017861819267272948
key: score_time
value: [0.01158214 0.00925207 0.00901937 0.0089376 0.00885868 0.00885081
0.00914764 0.00883889 0.00884986 0.00873947]
mean value: 0.009207653999328613
key: test_mcc
value: [0.9486833 0.9486833 0.84327404 0.9486833 0.9486833 0.89973541
0.85280287 0.84327404 0.73099415 0.94736842]
mean value: 0.8912182126989485
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.97368421 0.97368421 0.92105263 0.97368421 0.97368421 0.94736842
0.92105263 0.92105263 0.86486486 0.97297297]
mean value: 0.9443100995732575
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.97435897 0.97297297 0.92307692 0.97435897 0.97435897 0.94444444
0.92682927 0.91891892 0.86486486 0.97297297]
mean value: 0.9447157288620703
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.95 1. 0.9 0.95 0.95 1.
0.86363636 0.94444444 0.88888889 0.94736842]
mean value: 0.9394338118022328
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.94736842 0.94736842 1. 1. 0.89473684
1. 0.89473684 0.84210526 1. ]
mean value: 0.9526315789473684
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97368421 0.97368421 0.92105263 0.97368421 0.97368421 0.94736842
0.92105263 0.92105263 0.86549708 0.97368421]
mean value: 0.9444444444444444
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.95 0.94736842 0.85714286 0.95 0.95 0.89473684
0.86363636 0.85 0.76190476 0.94736842]
mean value: 0.8972157666894509
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.87
Accuracy on Blind test: 0.93
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.11228299 0.11245561 0.10948372 0.11234665 0.10924101 0.1089797
0.11292195 0.11369872 0.11021519 0.10727549]
mean value: 0.1108901023864746
key: score_time
value: [0.01883864 0.01856804 0.01872277 0.01774263 0.01919746 0.01752329
0.01925826 0.01753545 0.01749849 0.0175736 ]
mean value: 0.018245863914489745
key: test_mcc
value: [0.9486833 0.89473684 0.63960215 0.84327404 0.78947368 0.73786479
0.89473684 0.79388419 0.62170355 1. ]
mean value: 0.8163959384712077
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.97368421 0.94736842 0.81578947 0.92105263 0.89473684 0.86842105
0.94736842 0.89473684 0.81081081 1. ]
mean value: 0.9073968705547653
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.97435897 0.94736842 0.8 0.92307692 0.89473684 0.87179487
0.94736842 0.9 0.82051282 1. ]
mean value: 0.9079217273954115
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.95 0.94736842 0.875 0.9 0.89473684 0.85
0.94736842 0.85714286 0.8 1. ]
mean value: 0.9021616541353383
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.94736842 0.73684211 0.94736842 0.89473684 0.89473684
0.94736842 0.94736842 0.84210526 1. ]
mean value: 0.9157894736842105
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97368421 0.94736842 0.81578947 0.92105263 0.89473684 0.86842105
0.94736842 0.89473684 0.80994152 1. ]
mean value: 0.9073099415204678
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.95 0.9 0.66666667 0.85714286 0.80952381 0.77272727
0.9 0.81818182 0.69565217 1. ]
mean value: 0.8369894598155467
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.78
Accuracy on Blind test: 0.89
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01023006 0.0107708 0.00978875 0.01090908 0.00982141 0.00973773
0.01041436 0.009727 0.01096225 0.0110054 ]
mean value: 0.010336685180664062
key: score_time
value: [0.00909853 0.00951576 0.00937533 0.00973797 0.00959563 0.0088625
0.00879812 0.0088625 0.00882602 0.00961709]
mean value: 0.009228944778442383
key: test_mcc
value: [0.57894737 0.63245553 0.21821789 0.47368421 0.58218174 0.68803296
0.47633051 0.42640143 0.19005848 0.62170355]
mean value: 0.48880136757948445
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.78947368 0.81578947 0.60526316 0.73684211 0.78947368 0.84210526
0.73684211 0.71052632 0.59459459 0.81081081]
mean value: 0.743172119487909
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.78947368 0.81081081 0.54545455 0.73684211 0.8 0.85
0.72222222 0.73170732 0.59459459 0.8 ]
mean value: 0.7381105279629028
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.78947368 0.83333333 0.64285714 0.73684211 0.76190476 0.80952381
0.76470588 0.68181818 0.61111111 0.82352941]
mean value: 0.7455099424139672
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.78947368 0.78947368 0.47368421 0.73684211 0.84210526 0.89473684
0.68421053 0.78947368 0.57894737 0.77777778]
mean value: 0.735672514619883
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.78947368 0.81578947 0.60526316 0.73684211 0.78947368 0.84210526
0.73684211 0.71052632 0.59502924 0.80994152]
mean value: 0.7431286549707602
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.65217391 0.68181818 0.375 0.58333333 0.66666667 0.73913043
0.56521739 0.57692308 0.42307692 0.66666667]
mean value: 0.5930006587615283
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.52
Accuracy on Blind test: 0.76
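The six model blocks above can be compared more easily side by side. The sketch below tabulates the mean train MCC, mean cross-validation test MCC and blind-test MCC copied from the log (rounded to four decimals) and computes the train/test gap as a rough overfitting indicator. The variable names and the choice of gap as a summary statistic are illustrative assumptions and are not part of the original run.

```python
# (model name, mean train_mcc, mean test_mcc, blind-test MCC), copied from the log above.
rows = [
    ("K-Nearest Neighbors", 0.6621, 0.4507, 0.50),
    ("SVM",                 0.8007, 0.7583, 0.78),
    ("MLP",                 1.0000, 0.7962, 0.74),
    ("Decision Tree",       1.0000, 0.8912, 0.87),
    ("Extra Trees",         1.0000, 0.8164, 0.78),
    ("Extra Tree",          1.0000, 0.4888, 0.52),
]

# Rank by blind-test MCC, highest first (sorted() is stable, so ties keep log order).
ranked = sorted(rows, key=lambda r: r[3], reverse=True)

for name, train_mcc, test_mcc, blind_mcc in ranked:
    gap = train_mcc - test_mcc  # a large gap suggests the model memorises the training folds
    print(f"{name:20s} train={train_mcc:.2f} cv_test={test_mcc:.2f} "
          f"gap={gap:.2f} blind={blind_mcc:.2f}")

best_blind = ranked[0][0]
largest_gap = max(rows, key=lambda r: r[1] - r[2])[0]
print("Best blind-test MCC:", best_blind)
print("Largest train/test gap:", largest_gap)
```

On these figures the single decision tree has the highest blind-test MCC in this part of the log, while the single extra tree shows the widest gap between a perfect training score and its cross-validation score.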
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.55051184 1.54603195 1.53799772 1.54854393 1.5016973 1.51683092
1.5347023 1.53684664 1.5649302 1.55764413]
mean value: 1.5395736932754516
key: score_time
value: [0.09356093 0.09390473 0.09295797 0.09251022 0.09202743 0.09703946
0.09734011 0.09730768 0.09886241 0.09727573]
mean value: 0.09527866840362549
key: test_mcc
value: [1. 1. 0.78947368 0.89473684 0.89973541 0.89973541
1. 0.9486833 0.78362573 0.94736842]
mean value: 0.9163358798097961
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 1. 0.89473684 0.94736842 0.94736842 0.94736842
1. 0.97368421 0.89189189 0.97297297]
mean value: 0.9575391180654338
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 1. 0.89473684 0.94736842 0.95 0.94444444
1. 0.97435897 0.89473684 0.97297297]
mean value: 0.9578618497039549
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 0.89473684 0.94736842 0.9047619 1.
1. 0.95 0.89473684 0.94736842]
mean value: 0.9538972431077695
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 0.89473684 0.94736842 1. 0.89473684
1. 1. 0.89473684 1. ]
mean value: 0.9631578947368421
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 1. 0.89473684 0.94736842 0.94736842 0.94736842
1. 0.97368421 0.89181287 0.97368421]
mean value: 0.9576023391812866
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 1. 0.80952381 0.9 0.9047619 0.89473684
1. 0.95 0.80952381 0.94736842]
mean value: 0.9215914786967419
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.89
Accuracy on Blind test: 0.95
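The per-fold scores above are consistent with a simple confusion matrix. As a sketch, the third Random Forest fold (test_mcc 0.78947368, test_jcc 0.80952381, test_accuracy 0.89473684) matches 17 true positives, 2 false positives, 2 false negatives and 17 true negatives in a 38-sample fold. Those counts are not printed in this log; they are inferred here. Reading `jcc` as the Jaccard index is also an assumption, and the helper names are hypothetical.

```python
from math import sqrt


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from binary confusion counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def jaccard(tp, fp, fn):
    """Jaccard index of the positive class: TP / (TP + FP + FN)."""
    union = tp + fp + fn
    return tp / union if union else 0.0


def accuracy(tp, fp, fn, tn):
    """Fraction of correctly classified samples."""
    return (tp + tn) / (tp + fp + fn + tn)


# Inferred counts for one fold (assumption: not present in the log).
tp, fp, fn, tn = 17, 2, 2, 17

fold_mcc = mcc(tp, fp, fn, tn)
fold_jcc = jaccard(tp, fp, fn)
fold_acc = accuracy(tp, fp, fn, tn)

print(f"mcc={fold_mcc:.8f} jcc={fold_jcc:.8f} accuracy={fold_acc:.8f}")
```

With balanced errors like these, precision, recall and F-score all equal 17/19, which is why the same value 0.89473684 appears under several keys for that fold.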
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...05', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.91693258 0.89162326 0.93540263 0.97587872 0.90822935 0.87675786
0.95299554 0.92298007 1.00643253 0.8825717 ]
mean value: 0.9269804239273072
key: score_time
value: [0.22907305 0.24806905 0.17145753 0.21838999 0.27262688 0.16964602
0.24580407 0.27468348 0.24008393 0.2604115 ]
mean value: 0.23302454948425294
key: test_mcc
value: [1. 0.89973541 0.68803296 0.89473684 0.84327404 0.85280287
0.9486833 0.89973541 0.67849265 0.94736842]
mean value: 0.8652861897439335
key: train_mcc
value: [0.95884012 0.95884012 0.95884012 0.95294118 0.95300713 0.95884012
0.95884012 0.96477265 0.97653939 0.95896113]
mean value: 0.9600422066629911
key: test_accuracy
value: [1. 0.94736842 0.84210526 0.94736842 0.92105263 0.92105263
0.97368421 0.94736842 0.83783784 0.97297297]
mean value: 0.931081081081081
key: train_accuracy
value: [0.97941176 0.97941176 0.97941176 0.97647059 0.97647059 0.97941176
0.97941176 0.98235294 0.98826979 0.97947214]
mean value: 0.9800094876660341
key: test_fscore
value: [1. 0.94444444 0.83333333 0.94736842 0.92307692 0.91428571
0.97297297 0.95 0.85 0.97297297]
mean value: 0.9308454782138993
key: train_fscore
value: [0.97935103 0.97935103 0.97935103 0.97647059 0.97633136 0.97947214
0.97935103 0.98224852 0.98823529 0.97947214]
mean value: 0.9799634175328182
key: test_precision
value: [1. 1. 0.88235294 0.94736842 0.9 1.
1. 0.9047619 0.80952381 0.94736842]
mean value: 0.9391375497567448
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
key: train_precision
value: [0.98224852 0.98224852 0.98224852 0.97647059 0.98214286 0.97660819
0.98224852 0.98809524 0.98823529 0.98235294]
mean value: 0.9822899188742247
key: test_recall
value: [1. 0.89473684 0.78947368 0.94736842 0.94736842 0.84210526
0.94736842 1. 0.89473684 1. ]
mean value: 0.9263157894736842
key: train_recall
value: [0.97647059 0.97647059 0.97647059 0.97647059 0.97058824 0.98235294
0.97647059 0.97647059 0.98823529 0.97660819]
mean value: 0.9776608187134502
key: test_roc_auc
value: [1. 0.94736842 0.84210526 0.94736842 0.92105263 0.92105263
0.97368421 0.94736842 0.83625731 0.97368421]
mean value: 0.9309941520467836
key: train_roc_auc
value: [0.97941176 0.97941176 0.97941176 0.97647059 0.97647059 0.97941176
0.97941176 0.98235294 0.98826969 0.97948056]
mean value: 0.9800103199174407
key: test_jcc
value: [1. 0.89473684 0.71428571 0.9 0.85714286 0.84210526
0.94736842 0.9047619 0.73913043 0.94736842]
mean value: 0.8746899858341506
key: train_jcc
value: [0.95953757 0.95953757 0.95953757 0.95402299 0.95375723 0.95977011
0.95953757 0.96511628 0.97674419 0.95977011]
mean value: 0.960733119795795
MCC on Blind test: 0.86
Accuracy on Blind test: 0.93
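Each "mean value" line is the arithmetic mean of the ten fold scores printed just before it. A minimal sketch that reproduces the Random Forest2 test_mcc mean from the logged values follows. The spread (sample standard deviation) is an addition for illustration and is not reported in the log; the variable names are hypothetical.

```python
from statistics import mean, stdev

# Fold scores copied from the "key: test_mcc" entry for Random Forest2.
rf2_test_mcc = [
    1.0, 0.89973541, 0.68803296, 0.89473684, 0.84327404,
    0.85280287, 0.9486833, 0.89973541, 0.67849265, 0.94736842,
]

rf2_mean = mean(rf2_test_mcc)
# Sample standard deviation across folds (not part of the original log).
rf2_sd = stdev(rf2_test_mcc)

print(f"test_mcc: {rf2_mean:.4f} +/- {rf2_sd:.4f} over {len(rf2_test_mcc)} folds")
```

Reporting the spread next to the mean makes it easier to see how much a single weak fold (here 0.678 and 0.688) pulls the average down.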
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.010741 0.00969005 0.01051641 0.00985312 0.00964284 0.010144
0.00974083 0.0098412 0.00984383 0.01043248]
mean value: 0.010044574737548828
key: score_time
value: [0.0090158 0.00899315 0.00946307 0.00896311 0.0089345 0.01059484
0.00890207 0.00879717 0.00891232 0.00898099]
mean value: 0.009155702590942384
key: test_mcc
value: [0.47368421 0.68421053 0.68421053 0.79388419 0.78947368 0.47368421
0.78947368 0.63960215 0.48078072 0.62807634]
mean value: 0.6437080232004303
key: train_mcc
value: [0.73561236 0.70588235 0.75314969 0.72986649 0.70593121 0.73561236
0.71769673 0.75314969 0.73607623 0.71966354]
mean value: 0.7292640648226028
key: test_accuracy
value: [0.73684211 0.84210526 0.84210526 0.89473684 0.89473684 0.73684211
0.89473684 0.81578947 0.72972973 0.81081081]
mean value: 0.8198435277382645
key: train_accuracy
value: [0.86764706 0.85294118 0.87647059 0.86470588 0.85294118 0.86764706
0.85882353 0.87647059 0.86803519 0.85923754]
mean value: 0.8644919786096257
key: test_fscore
value: [0.73684211 0.84210526 0.84210526 0.9 0.89473684 0.73684211
0.89473684 0.82926829 0.77272727 0.78787879]
mean value: 0.8237242774341619
key: train_fscore
value: [0.86956522 0.85294118 0.87790698 0.86705202 0.85380117 0.86956522
0.85964912 0.87790698 0.86725664 0.86363636]
mean value: 0.8659280881065122
key: test_precision
value: [0.73684211 0.84210526 0.84210526 0.85714286 0.89473684 0.73684211
0.89473684 0.77272727 0.68 0.86666667]
mean value: 0.8123905217589428
key: train_precision
value: [0.85714286 0.85294118 0.86781609 0.85227273 0.84883721 0.85714286
0.85465116 0.86781609 0.86982249 0.83977901]
mean value: 0.8568221664762061
key: test_recall
value: [0.73684211 0.84210526 0.84210526 0.94736842 0.89473684 0.73684211
0.89473684 0.89473684 0.89473684 0.72222222]
mean value: 0.8406432748538012
key: train_recall
value: [0.88235294 0.85294118 0.88823529 0.88235294 0.85882353 0.88235294
0.86470588 0.88823529 0.86470588 0.88888889]
mean value: 0.8753594771241829
key: test_roc_auc
value: [0.73684211 0.84210526 0.84210526 0.89473684 0.89473684 0.73684211
0.89473684 0.81578947 0.7251462 0.80847953]
mean value: 0.8191520467836257
key: train_roc_auc
value: [0.86764706 0.85294118 0.87647059 0.86470588 0.85294118 0.86764706
0.85882353 0.87647059 0.86802546 0.85915033]
mean value: 0.8644822841417269
key: test_jcc
value: [0.58333333 0.72727273 0.72727273 0.81818182 0.80952381 0.58333333
0.80952381 0.70833333 0.62962963 0.65 ]
mean value: 0.7046404521404521
key: train_jcc
value: [0.76923077 0.74358974 0.78238342 0.76530612 0.74489796 0.76923077
0.75384615 0.78238342 0.765625 0.76 ]
mean value: 0.7636493356908327
MCC on Blind test: 0.72
Accuracy on Blind test: 0.86
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.10662818 0.08921003 0.0561254 0.17810702 0.05234981 0.05486059
0.06721592 0.05890155 0.26667285 0.05537868]
mean value: 0.0985450029373169
key: score_time
value: [0.01744914 0.01164341 0.01065397 0.01203752 0.01097751 0.01060772
0.01059103 0.01130843 0.01121664 0.01066351]
mean value: 0.011714887619018555
key: test_mcc
value: [1. 1. 1. 0.9486833 0.9486833 0.89973541
0.84327404 0.89473684 0.78362573 0.94736842]
mean value: 0.9266107043807079
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 1. 1. 0.97368421 0.97368421 0.94736842
0.92105263 0.94736842 0.89189189 0.97297297]
mean value: 0.9628022759601707
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 1. 1. 0.97435897 0.97435897 0.94444444
0.92307692 0.94736842 0.89473684 0.97297297]
mean value: 0.9631317552370184
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 1. 0.95 0.95 1.
0.9 0.94736842 0.89473684 0.94736842]
mean value: 0.9589473684210527
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 0.89473684
0.94736842 0.94736842 0.89473684 1. ]
mean value: 0.968421052631579
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 1. 1. 0.97368421 0.97368421 0.94736842
0.92105263 0.94736842 0.89181287 0.97368421]
mean value: 0.9628654970760234
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 1. 1. 0.95 0.95 0.89473684
0.85714286 0.9 0.80952381 0.94736842]
mean value: 0.9308771929824561
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.94
Accuracy on Blind test: 0.97
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.04259491 0.05594087 0.03498626 0.03416085 0.03477049 0.06916142
0.06280589 0.03609562 0.05616331 0.07135487]
mean value: 0.04980344772338867
key: score_time
value: [0.02427268 0.01239538 0.01218939 0.01224589 0.01225781 0.02066946
0.01230192 0.01249409 0.02214265 0.01604795]
mean value: 0.015701723098754884
key: test_mcc
value: [0.74620251 1. 0.73786479 0.84327404 0.78947368 0.38829014
0.68421053 0.80757285 0.51461988 0.83918129]
mean value: 0.7350689707890694
key: train_mcc
value: [0.92354539 0.92966915 0.95884012 0.92947609 0.95294118 0.95300713
0.94124161 0.9353103 0.95314596 0.94762566]
mean value: 0.9424802586910812
key: test_accuracy
value: [0.86842105 1. 0.86842105 0.92105263 0.89473684 0.68421053
0.84210526 0.89473684 0.75675676 0.91891892]
mean value: 0.8649359886201992
key: train_accuracy
value: [0.96176471 0.96470588 0.97941176 0.96470588 0.97647059 0.97647059
0.97058824 0.96764706 0.97653959 0.97360704]
mean value: 0.9711911333448335
key: test_fscore
value: [0.87804878 1. 0.86486486 0.92307692 0.89473684 0.625
0.84210526 0.9047619 0.75675676 0.91891892]
mean value: 0.8608270254130331
key: train_fscore
value: [0.96165192 0.96428571 0.97935103 0.96449704 0.97647059 0.97660819
0.9704142 0.96755162 0.97660819 0.97329377]
mean value: 0.9710732260210945
key: test_precision
value: [0.81818182 1. 0.88888889 0.9 0.89473684 0.76923077
0.84210526 0.82608696 0.77777778 0.89473684]
mean value: 0.8611745157969415
key: train_precision
value: [0.96449704 0.97590361 0.98224852 0.9702381 0.97647059 0.97093023
0.97619048 0.9704142 0.97093023 0.98795181]
mean value: 0.9745774809780501
key: test_recall
value: [0.94736842 1. 0.84210526 0.94736842 0.89473684 0.52631579
0.84210526 1. 0.73684211 0.94444444]
mean value: 0.8681286549707602
key: train_recall
value: [0.95882353 0.95294118 0.97647059 0.95882353 0.97647059 0.98235294
0.96470588 0.96470588 0.98235294 0.95906433]
mean value: 0.9676711386308909
key: test_roc_auc
value: [0.86842105 1. 0.86842105 0.92105263 0.89473684 0.68421053
0.84210526 0.89473684 0.75730994 0.91959064]
mean value: 0.8650584795321637
key: train_roc_auc
value: [0.96176471 0.96470588 0.97941176 0.96470588 0.97647059 0.97647059
0.97058824 0.96764706 0.97655659 0.97364981]
mean value: 0.9711971104231166
key: test_jcc
value: [0.7826087 1. 0.76190476 0.85714286 0.80952381 0.45454545
0.72727273 0.82608696 0.60869565 0.85 ]
mean value: 0.7677780914737437
key: train_jcc
value: [0.92613636 0.93103448 0.95953757 0.93142857 0.95402299 0.95428571
0.94252874 0.93714286 0.95428571 0.94797688]
mean value: 0.9438379878542824
MCC on Blind test: 0.72
Accuracy on Blind test: 0.86
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.0131712 0.01293206 0.00980425 0.01048255 0.01042271 0.01054835
0.01047897 0.01062369 0.0106225 0.01053381]
mean value: 0.01096200942993164
key: score_time
value: [0.01183963 0.00915504 0.00976515 0.00946379 0.00945783 0.00950193
0.00948477 0.00947523 0.00913143 0.00954247]
mean value: 0.00968172550201416
key: test_mcc
value: [0.59222009 0.68803296 0.63245553 0.63960215 0.68421053 0.58218174
0.63960215 0.73786479 0.52960948 0.73821295]
mean value: 0.6463992364353213
key: train_mcc
value: [0.65923425 0.60639664 0.70632241 0.65322377 0.67175144 0.7236421
0.60766169 0.74707175 0.71317436 0.69522435]
mean value: 0.6783702762219075
key: test_accuracy
value: [0.78947368 0.84210526 0.81578947 0.81578947 0.84210526 0.78947368
0.81578947 0.86842105 0.75675676 0.86486486]
mean value: 0.8200568990042674
key: train_accuracy
value: [0.82941176 0.80294118 0.85294118 0.82647059 0.83529412 0.86176471
0.80294118 0.87352941 0.85630499 0.84750733]
mean value: 0.8389106434362601
key: test_fscore
value: [0.76470588 0.83333333 0.81081081 0.8 0.84210526 0.77777778
0.8 0.87179487 0.79069767 0.84848485]
mean value: 0.8139710462131082
key: train_fscore
value: [0.82634731 0.7987988 0.8502994 0.8238806 0.83030303 0.86053412
0.79510703 0.87315634 0.85285285 0.84615385]
mean value: 0.8357433332161395
key: test_precision
value: [0.86666667 0.88235294 0.83333333 0.875 0.84210526 0.82352941
0.875 0.85 0.70833333 0.93333333]
mean value: 0.8489654282765737
key: train_precision
value: [0.84146341 0.81595092 0.86585366 0.83636364 0.85625 0.86826347
0.82802548 0.87573964 0.87116564 0.85628743]
mean value: 0.8515363294832559
key: test_recall
value: [0.68421053 0.78947368 0.78947368 0.73684211 0.84210526 0.73684211
0.73684211 0.89473684 0.89473684 0.77777778]
mean value: 0.7883040935672514
key: train_recall
value: [0.81176471 0.78235294 0.83529412 0.81176471 0.80588235 0.85294118
0.76470588 0.87058824 0.83529412 0.83625731]
mean value: 0.8206845545235638
key: test_roc_auc
value: [0.78947368 0.84210526 0.81578947 0.81578947 0.84210526 0.78947368
0.81578947 0.86842105 0.75292398 0.8625731 ]
mean value: 0.8194444444444444
key: train_roc_auc
value: [0.82941176 0.80294118 0.85294118 0.82647059 0.83529412 0.86176471
0.80294118 0.87352941 0.85624355 0.84754042]
mean value: 0.8389078087375301
key: test_jcc
value: [0.61904762 0.71428571 0.68181818 0.66666667 0.72727273 0.63636364
0.66666667 0.77272727 0.65384615 0.73684211]
mean value: 0.6875536743957796
key: train_jcc
value: [0.70408163 0.665 0.73958333 0.70050761 0.70984456 0.75520833
0.65989848 0.77486911 0.7434555 0.73333333]
mean value: 0.7185781890938955
MCC on Blind test: 0.77
Accuracy on Blind test: 0.89
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01748848 0.01892996 0.01658511 0.01925921 0.01871228 0.01994371
0.01729989 0.01780486 0.0200069 0.01862764]
mean value: 0.018465805053710937
key: score_time
value: [0.0090909 0.01126218 0.01121306 0.01176643 0.01179862 0.01175475
0.01167011 0.01188803 0.01181984 0.0126853 ]
mean value: 0.011494922637939452
key: test_mcc
value: [0.76376262 0.89973541 0.38729833 0.78947368 0.84327404 0.52704628
0.29277002 0.79388419 0.62807634 0.75614764]
mean value: 0.6681468554833931
key: train_mcc
value: [0.8452381 0.79448906 0.5864073 0.88333157 0.89010061 0.90189002
0.40544243 0.84174979 0.88932517 0.89043758]
mean value: 0.7928411614629753
key: test_accuracy
value: [0.86842105 0.94736842 0.65789474 0.89473684 0.92105263 0.76315789
0.57894737 0.89473684 0.81081081 0.86486486]
mean value: 0.820199146514936
key: train_accuracy
value: [0.91764706 0.88823529 0.75588235 0.94117647 0.94411765 0.95
0.64117647 0.91470588 0.94428152 0.94428152]
mean value: 0.8841504226323961
key: test_fscore
value: [0.84848485 0.95 0.73469388 0.89473684 0.91891892 0.76923077
0.7037037 0.9 0.82926829 0.83870968]
mean value: 0.8387746930096805
key: train_fscore
value: [0.91082803 0.89893617 0.80378251 0.94252874 0.94224924 0.95156695
0.73593074 0.90675241 0.94524496 0.94259819]
mean value: 0.8980417920511166
key: test_precision
value: [1. 0.9047619 0.6 0.89473684 0.94444444 0.75
0.54285714 0.85714286 0.77272727 1. ]
mean value: 0.8266670464038886
key: train_precision
value: [0.99305556 0.82038835 0.67193676 0.92134831 0.97484277 0.92265193
0.58219178 1. 0.92655367 0.975 ]
mean value: 0.8787969132705697
key: test_recall
value: [0.73684211 1. 0.94736842 0.89473684 0.89473684 0.78947368
1. 0.94736842 0.89473684 0.72222222]
mean value: 0.882748538011696
key: train_recall
value: [0.84117647 0.99411765 1. 0.96470588 0.91176471 0.98235294
1. 0.82941176 0.96470588 0.9122807 ]
mean value: 0.9400515995872033
key: test_roc_auc
value: [0.86842105 0.94736842 0.65789474 0.89473684 0.92105263 0.76315789
0.57894737 0.89473684 0.80847953 0.86111111]
mean value: 0.8195906432748539
key: train_roc_auc
value: [0.91764706 0.88823529 0.75588235 0.94117647 0.94411765 0.95
0.64117647 0.91470588 0.94434125 0.94437564]
mean value: 0.8841658066735466
key: test_jcc
value: [0.73684211 0.9047619 0.58064516 0.80952381 0.85 0.625
0.54285714 0.81818182 0.70833333 0.72222222]
mean value: 0.7298367497433711
key: train_jcc
value: [0.83625731 0.81642512 0.67193676 0.89130435 0.8908046 0.9076087
0.58219178 0.82941176 0.89617486 0.89142857]
mean value: 0.8213543811131508
MCC on Blind test: 0.79
Accuracy on Blind test: 0.89
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01583433 0.01828694 0.01696563 0.02924252 0.03335357 0.01711559
0.01663041 0.01640534 0.01945257 0.01540542]
mean value: 0.019869232177734376
key: score_time
value: [0.01194024 0.01182508 0.02836466 0.01295829 0.02824974 0.0117662
0.01195216 0.01181483 0.01190305 0.01181436]
mean value: 0.015258860588073731
key: test_mcc
value: [0.74620251 0.85280287 0.57894737 0.84327404 0.84327404 0.69989647
0.76376262 0.63828474 0.63129316 0.69356297]
mean value: 0.7291300780936345
key: train_mcc
value: [0.78047467 0.86610667 0.91771057 0.87209836 0.90594505 0.81649658
0.67766324 0.74545617 0.88351945 0.72157164]
mean value: 0.8187042409466225
key: test_accuracy
value: [0.86842105 0.92105263 0.78947368 0.92105263 0.92105263 0.84210526
0.86842105 0.78947368 0.81081081 0.83783784]
mean value: 0.8569701280227596
key: train_accuracy
value: [0.88235294 0.92941176 0.95882353 0.93529412 0.95294118 0.9
0.81470588 0.85882353 0.93841642 0.84750733]
mean value: 0.901827669484216
key: test_fscore
value: [0.87804878 0.91428571 0.78947368 0.92307692 0.92307692 0.85714286
0.88372093 0.82608696 0.8 0.85 ]
mean value: 0.8644912769035046
key: train_fscore
value: [0.89304813 0.9245283 0.95857988 0.93714286 0.95266272 0.90909091
0.84367246 0.87564767 0.93416928 0.86597938]
mean value: 0.909452158542273
key: test_precision
value: [0.81818182 1. 0.78947368 0.9 0.9 0.7826087
0.79166667 0.7037037 0.875 0.77272727]
mean value: 0.8333361841142162
key: train_precision
value: [0.81862745 0.99324324 0.96428571 0.91111111 0.95833333 0.83333333
0.72961373 0.78240741 1. 0.77419355]
mean value: 0.8765148875987211
key: test_recall
value: [0.94736842 0.84210526 0.78947368 0.94736842 0.94736842 0.94736842
1. 1. 0.73684211 0.94444444]
mean value: 0.910233918128655
key: train_recall
value: [0.98235294 0.86470588 0.95294118 0.96470588 0.94705882 1.
1. 0.99411765 0.87647059 0.98245614]
mean value: 0.9564809081527348
key: test_roc_auc
value: [0.86842105 0.92105263 0.78947368 0.92105263 0.92105263 0.84210526
0.86842105 0.78947368 0.8128655 0.84064327]
mean value: 0.8574561403508771
key: train_roc_auc
value: [0.88235294 0.92941176 0.95882353 0.93529412 0.95294118 0.9
0.81470588 0.85882353 0.93823529 0.84711042]
mean value: 0.9017698658410733
key: test_jcc
value: [0.7826087 0.84210526 0.65217391 0.85714286 0.85714286 0.75
0.79166667 0.7037037 0.66666667 0.73913043]
mean value: 0.7642341057958907
key: train_jcc
value: [0.80676329 0.85964912 0.92045455 0.88172043 0.90960452 0.83333333
0.72961373 0.77880184 0.87647059 0.76363636]
mean value: 0.8360047765595798
MCC on Blind test: 0.75
Accuracy on Blind test: 0.88
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.17080402 0.1569531 0.15340972 0.15698004 0.15308666 0.15458584
0.14575839 0.14861679 0.15207219 0.14971685]
mean value: 0.15419836044311525
key: score_time
value: [0.01653957 0.0166285 0.01610899 0.01642632 0.0170877 0.0152936
0.01553082 0.01582837 0.01660395 0.01644444]
mean value: 0.01624922752380371
key: test_mcc
value: [1. 1. 1. 0.9486833 1. 0.89973541
0.84327404 0.89473684 0.78362573 0.94736842]
mean value: 0.9317423745756566
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 1. 1. 0.97368421 1. 0.94736842
0.92105263 0.94736842 0.89189189 0.97297297]
mean value: 0.9654338549075391
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 1. 1. 0.97435897 1. 0.94444444
0.92307692 0.94736842 0.89473684 0.97297297]
mean value: 0.965695857801121
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 1. 0.95 1. 1.
0.9 0.94736842 0.89473684 0.94736842]
mean value: 0.9639473684210527
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 0.89473684
0.94736842 0.94736842 0.89473684 1. ]
mean value: 0.968421052631579
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 1. 1. 0.97368421 1. 0.94736842
0.92105263 0.94736842 0.89181287 0.97368421]
mean value: 0.9654970760233919
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 1. 1. 0.95 1. 0.89473684
0.85714286 0.9 0.80952381 0.94736842]
mean value: 0.9358771929824561
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.92
Accuracy on Blind test: 0.96
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.05442548 0.06178308 0.06514335 0.05190825 0.05278492 0.05515146
0.04277182 0.05043507 0.06439567 0.04802108]
mean value: 0.054682016372680664
key: score_time
value: [0.03232479 0.03214812 0.02032709 0.02793241 0.03441763 0.01812482
0.02434039 0.01886606 0.02517366 0.03107882]
mean value: 0.026473379135131835
key: test_mcc
value: [1. 1. 1. 0.9486833 0.9486833 0.89973541
0.84327404 0.9486833 0.73099415 0.94736842]
mean value: 0.9267421920804961
key: train_mcc
value: [1. 0.99413485 1. 1. 0.98830369 0.99413485
0.98823529 0.99413485 1. 0.98833809]
mean value: 0.9947281618348918
key: test_accuracy
value: [1. 1. 1. 0.97368421 0.97368421 0.94736842
0.92105263 0.97368421 0.86486486 0.97297297]
mean value: 0.9627311522048364
key: train_accuracy
value: [1. 0.99705882 1. 1. 0.99411765 0.99705882
0.99411765 0.99705882 1. 0.9941349 ]
mean value: 0.9973546662066586
key: test_fscore
value: [1. 1. 1. 0.97435897 0.97435897 0.94444444
0.92307692 0.97435897 0.86486486 0.97297297]
mean value: 0.9628436128436129
key: train_fscore
value: [1. 0.99705015 1. 1. 0.99408284 0.99705015
0.99411765 0.99705015 1. 0.99411765]
mean value: 0.997346857683221
key: test_precision
value: [1. 1. 1. 0.95 0.95 1.
0.9 0.95 0.88888889 0.94736842]
mean value: 0.958625730994152
key: train_precision
value: [1. 1. 1. 1. 1. 1.
0.99411765 1. 1. 1. ]
mean value: 0.9994117647058823
key: test_recall
value: [1. 1. 1. 1. 1. 0.89473684
0.94736842 1. 0.84210526 1. ]
mean value: 0.968421052631579
key: train_recall
value: [1. 0.99411765 1. 1. 0.98823529 0.99411765
0.99411765 0.99411765 1. 0.98830409]
mean value: 0.9953009975920193
key: test_roc_auc
value: [1. 1. 1. 0.97368421 0.97368421 0.94736842
0.92105263 0.97368421 0.86549708 0.97368421]
mean value: 0.9628654970760234
key: train_roc_auc
value: [1. 0.99705882 1. 1. 0.99411765 0.99705882
0.99411765 0.99705882 1. 0.99415205]
mean value: 0.9973563811489509
key: test_jcc
value: [1. 1. 1. 0.95 0.95 0.89473684
0.85714286 0.95 0.76190476 0.94736842]
mean value: 0.9311152882205513
key: train_jcc
value: [1. 0.99411765 1. 1. 0.98823529 0.99411765
0.98830409 0.99411765 1. 0.98830409]
mean value: 0.994719642242862
MCC on Blind test: 0.91
Accuracy on Blind test: 0.96
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.08240032 0.13941526 0.11900902 0.09115481 0.0813446 0.11244035
0.08711672 0.18356252 0.14831948 0.12165046]
mean value: 0.11664135456085205
key: score_time
value: [0.02674294 0.03760934 0.0219245 0.01437283 0.02252603 0.02183795
0.02192736 0.04869914 0.02245545 0.02846289]
mean value: 0.02665584087371826
key: test_mcc
value: [0.58218174 0.68421053 0.37047929 0.68803296 0.65465367 0.63960215
0.68803296 0.68803296 0.24189738 0.73821295]
mean value: 0.5975336578127587
key: train_mcc
value: [0.99413485 0.99413485 0.99413485 1. 1. 0.99413485
0.99413485 0.99413485 0.99415185 0.99415205]
mean value: 0.9953112973615207
key: test_accuracy
value: [0.78947368 0.84210526 0.68421053 0.84210526 0.81578947 0.81578947
0.84210526 0.84210526 0.62162162 0.86486486]
mean value: 0.7960170697012802
key: train_accuracy
value: [0.99705882 0.99705882 0.99705882 1. 1. 0.99705882
0.99705882 0.99705882 0.99706745 0.99706745]
mean value: 0.9976487838537175
key: test_fscore
value: [0.77777778 0.84210526 0.66666667 0.83333333 0.8372093 0.8
0.85 0.83333333 0.65 0.84848485]
mean value: 0.7938910525079436
key: train_fscore
value: [0.99705015 0.99705015 0.99705015 1. 1. 0.99705015
0.99705015 0.99705015 0.99705015 0.99706745]
mean value: 0.9976418481128729
key: test_precision
value: [0.82352941 0.84210526 0.70588235 0.88235294 0.75 0.875
0.80952381 0.88235294 0.61904762 0.93333333]
mean value: 0.812312767212148
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.73684211 0.84210526 0.63157895 0.78947368 0.94736842 0.73684211
0.89473684 0.78947368 0.68421053 0.77777778]
mean value: 0.7830409356725146
key: train_recall
value: [0.99411765 0.99411765 0.99411765 1. 1. 0.99411765
0.99411765 0.99411765 0.99411765 0.99415205]
mean value: 0.9952975576195391
key: test_roc_auc
value: [0.78947368 0.84210526 0.68421053 0.84210526 0.81578947 0.81578947
0.84210526 0.84210526 0.61988304 0.8625731 ]
mean value: 0.7956140350877193
key: train_roc_auc
value: [0.99705882 0.99705882 0.99705882 1. 1. 0.99705882
0.99705882 0.99705882 0.99705882 0.99707602]
mean value: 0.9976487788097695
key: test_jcc
value: [0.63636364 0.72727273 0.5 0.71428571 0.72 0.66666667
0.73913043 0.71428571 0.48148148 0.73684211]
mean value: 0.6636328480401706
key: train_jcc
value: [0.99411765 0.99411765 0.99411765 1. 1. 0.99411765
0.99411765 0.99411765 0.99411765 0.99415205]
mean value: 0.9952975576195391
MCC on Blind test: 0.54
Accuracy on Blind test: 0.77
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.56569266 0.5649147 0.54991865 0.56822872 0.55563831 0.56164074
0.56107974 0.54614925 0.56099558 0.56145477]
mean value: 0.5595713138580323
key: score_time
value: [0.0103898 0.00958323 0.00938058 0.00951958 0.00956798 0.01010776
0.00926852 0.00950861 0.01079369 0.01012635]
mean value: 0.009824609756469727
key: test_mcc
value: [1. 1. 0.9486833 0.9486833 0.9486833 0.89973541
0.89973541 0.9486833 0.78362573 0.94736842]
mean value: 0.9325198165933714
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 1. 0.97368421 0.97368421 0.97368421 0.94736842
0.94736842 0.97368421 0.89189189 0.97297297]
mean value: 0.9654338549075391
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 1. 0.97435897 0.97435897 0.97435897 0.94444444
0.95 0.97435897 0.89473684 0.97297297]
mean value: 0.9659590156958578
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 0.95 0.95 0.95 1.
0.9047619 0.95 0.89473684 0.94736842]
mean value: 0.9546867167919799
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 0.89473684
1. 1. 0.89473684 1. ]
mean value: 0.9789473684210527
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 1. 0.97368421 0.97368421 0.97368421 0.94736842
0.94736842 0.97368421 0.89181287 0.97368421]
mean value: 0.9654970760233919
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 1. 0.95 0.95 0.95 0.89473684
0.9047619 0.95 0.80952381 0.94736842]
mean value: 0.9356390977443609
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.94
Accuracy on Blind test: 0.97
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.02689314 0.02751875 0.02677655 0.0268662 0.02773142 0.03597403
0.02919483 0.02741051 0.02737808 0.02891994]
mean value: 0.028466343879699707
key: score_time
value: [0.01260877 0.01430941 0.01301813 0.01564336 0.01542616 0.01571608
0.01542115 0.0155189 0.01564002 0.01539946]
mean value: 0.014870142936706543
key: test_mcc
value: [0.63245553 0.21821789 0.26315789 0.59222009 0.74620251 0.31622777
0.26462806 0.42640143 0.19504453 0.83871328]
mean value: 0.4493268986749522
key: train_mcc
value: [0.99413485 0.92077472 0.77216846 0.976741 0.98250594 0.91533482
0.9707394 0.93725826 0.89204798 0.98826969]
mean value: 0.9349975124716327
key: test_accuracy
value: [0.81578947 0.60526316 0.63157895 0.78947368 0.86842105 0.65789474
0.63157895 0.71052632 0.59459459 0.91891892]
mean value: 0.7224039829302987
key: train_accuracy
value: [0.99705882 0.95882353 0.87352941 0.98823529 0.99117647 0.95588235
0.98529412 0.96764706 0.94428152 0.9941349 ]
mean value: 0.9656063481110919
key: test_fscore
value: [0.82051282 0.65116279 0.63157895 0.76470588 0.87804878 0.64864865
0.61111111 0.68571429 0.66666667 0.91428571]
mean value: 0.7272435647846088
key: train_fscore
value: [0.99705015 0.96045198 0.85521886 0.98809524 0.99109792 0.95384615
0.9851632 0.96656535 0.94647887 0.99415205]
mean value: 0.9638119769217577
key: test_precision
value: [0.8 0.58333333 0.63157895 0.86666667 0.81818182 0.66666667
0.64705882 0.75 0.57692308 0.94117647]
mean value: 0.728158580325763
key: train_precision
value: [1. 0.92391304 1. 1. 1. 1.
0.99401198 1. 0.90810811 0.99415205]
mean value: 0.9820185174417899
key: test_recall
value: [0.84210526 0.73684211 0.63157895 0.68421053 0.94736842 0.63157895
0.57894737 0.63157895 0.78947368 0.88888889]
mean value: 0.7362573099415204
key: train_recall
value: [0.99411765 1. 0.74705882 0.97647059 0.98235294 0.91176471
0.97647059 0.93529412 0.98823529 0.99415205]
mean value: 0.9505916752665978
key: test_roc_auc
value: [0.81578947 0.60526316 0.63157895 0.78947368 0.86842105 0.65789474
0.63157895 0.71052632 0.58918129 0.91812865]
mean value: 0.7217836257309942
key: train_roc_auc
value: [0.99705882 0.95882353 0.87352941 0.98823529 0.99117647 0.95588235
0.98529412 0.96764706 0.94441004 0.99413485]
mean value: 0.9656191950464397
key: test_jcc
value: [0.69565217 0.48275862 0.46153846 0.61904762 0.7826087 0.48
0.44 0.52173913 0.5 0.84210526]
mean value: 0.5825449964433631
key: train_jcc
value: [0.99411765 0.92391304 0.74705882 0.97647059 0.98235294 0.91176471
0.97076023 0.93529412 0.89839572 0.98837209]
mean value: 0.9328499915874191
MCC on Blind test: 0.43
Accuracy on Blind test: 0.71
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.0276432 0.05927157 0.03859544 0.03848791 0.03850269 0.05718088
0.03406334 0.01551056 0.01546884 0.01610494]
mean value: 0.03408293724060059
key: score_time
value: [0.02301216 0.02319098 0.02459955 0.0241189 0.0216651 0.02239084
0.0124681 0.01247931 0.0128572 0.01285744]
mean value: 0.018963956832885744
key: test_mcc
value: [0.84327404 0.9486833 0.73786479 0.84327404 0.84327404 0.48454371
0.89473684 0.79388419 0.56725146 0.78764146]
mean value: 0.7744427877554028
key: train_mcc
value: [0.87064849 0.88241401 0.89417953 0.86484056 0.88825066 0.89417953
0.87058824 0.88235294 0.89442724 0.88269694]
mean value: 0.8824578138125448
key: test_accuracy
value: [0.92105263 0.97368421 0.86842105 0.92105263 0.92105263 0.73684211
0.94736842 0.89473684 0.78378378 0.89189189]
mean value: 0.8859886201991465
key: train_accuracy
value: [0.93529412 0.94117647 0.94705882 0.93235294 0.94411765 0.94705882
0.93529412 0.94117647 0.94721408 0.94134897]
mean value: 0.9412092461618078
key: test_fscore
value: [0.91891892 0.97435897 0.87179487 0.92307692 0.92307692 0.70588235
0.94736842 0.9 0.78947368 0.88235294]
mean value: 0.8836304010607416
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_7030.py:156: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_7030.py:159: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: train_fscore
value: [0.93567251 0.9408284 0.94674556 0.93294461 0.9439528 0.94736842
0.93529412 0.94117647 0.94705882 0.94152047]
mean value: 0.9412562188544396
key: test_precision
value: [0.94444444 0.95 0.85 0.9 0.9 0.8
0.94736842 0.85714286 0.78947368 0.9375 ]
mean value: 0.8875929406850459
key: train_precision
value: [0.93023256 0.94642857 0.95238095 0.92485549 0.94674556 0.94186047
0.93529412 0.94117647 0.94705882 0.94152047]
mean value: 0.9407553480125959
key: test_recall
value: [0.89473684 1. 0.89473684 0.94736842 0.94736842 0.63157895
0.94736842 0.94736842 0.78947368 0.83333333]
mean value: 0.8833333333333333
key: train_recall
value: [0.94117647 0.93529412 0.94117647 0.94117647 0.94117647 0.95294118
0.93529412 0.94117647 0.94705882 0.94152047]
mean value: 0.9417991056071551
key: test_roc_auc
value: [0.92105263 0.97368421 0.86842105 0.92105263 0.92105263 0.73684211
0.94736842 0.89473684 0.78362573 0.89035088]
mean value: 0.8858187134502924
key: train_roc_auc
value: [0.93529412 0.94117647 0.94705882 0.93235294 0.94411765 0.94705882
0.93529412 0.94117647 0.94721362 0.94134847]
mean value: 0.9412091503267974
key: test_jcc
value: [0.85 0.95 0.77272727 0.85714286 0.85714286 0.54545455
0.9 0.81818182 0.65217391 0.78947368]
mean value: 0.7992296947903355
key: train_jcc
value: [0.87912088 0.88826816 0.8988764 0.87431694 0.89385475 0.9
0.87845304 0.88888889 0.89944134 0.88950276]
mean value: 0.8890723159309888
MCC on Blind test: 0.8
Accuracy on Blind test: 0.9
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.27244306 0.33298659 0.28044224 0.39420319 0.54344702 0.35148716
0.35520601 0.30538011 0.34263945 0.27838564]
mean value: 0.3456620454788208
key: score_time
value: [0.02331948 0.01721334 0.02358603 0.02463841 0.02457571 0.02402425
0.02256465 0.02319074 0.02490997 0.02106881]
mean value: 0.022909140586853026
key: test_mcc
value: [0.84327404 0.9486833 0.78947368 0.79388419 0.84327404 0.48454371
0.89473684 0.79388419 0.56725146 0.78764146]
mean value: 0.7746646917717809
key: train_mcc
value: [0.87064849 0.88241401 0.95300713 0.8058963 0.88825066 0.89417953
0.87058824 0.88235294 0.89442724 0.88269694]
mean value: 0.8824461478103343
key: test_accuracy
value: [0.92105263 0.97368421 0.89473684 0.89473684 0.92105263 0.73684211
0.94736842 0.89473684 0.78378378 0.89189189]
mean value: 0.8859886201991465
key: train_accuracy
value: [0.93529412 0.94117647 0.97647059 0.90294118 0.94411765 0.94705882
0.93529412 0.94117647 0.94721408 0.94134897]
mean value: 0.9412092461618078
key: test_fscore
value: [0.91891892 0.97435897 0.89473684 0.9 0.92307692 0.70588235
0.94736842 0.9 0.78947368 0.88235294]
mean value: 0.8836169057840885
key: train_fscore
value: [0.93567251 0.9408284 0.97633136 0.90265487 0.9439528 0.94736842
0.93529412 0.94117647 0.94705882 0.94152047]
mean value: 0.9411858248203606
key: test_precision
value: [0.94444444 0.95 0.89473684 0.85714286 0.9 0.8
0.94736842 0.85714286 0.78947368 0.9375 ]
mean value: 0.887780910609858
key: train_precision
value: [0.93023256 0.94642857 0.98214286 0.90532544 0.94674556 0.94186047
0.93529412 0.94117647 0.94705882 0.94152047]
mean value: 0.9417785337345366
key: test_recall
value: [0.89473684 1. 0.89473684 0.94736842 0.94736842 0.63157895
0.94736842 0.94736842 0.78947368 0.83333333]
mean value: 0.8833333333333333
key: train_recall
value: [0.94117647 0.93529412 0.97058824 0.9 0.94117647 0.95294118
0.93529412 0.94117647 0.94705882 0.94152047]
mean value: 0.9406226350189199
key: test_roc_auc
value: [0.92105263 0.97368421 0.89473684 0.89473684 0.92105263 0.73684211
0.94736842 0.89473684 0.78362573 0.89035088]
mean value: 0.8858187134502924
key: train_roc_auc
value: [0.93529412 0.94117647 0.97647059 0.90294118 0.94411765 0.94705882
0.93529412 0.94117647 0.94721362 0.94134847]
mean value: 0.9412091503267974
key: test_jcc
value: [0.85 0.95 0.80952381 0.81818182 0.85714286 0.54545455
0.9 0.81818182 0.65217391 0.78947368]
mean value: 0.7990132445738853
key: train_jcc
value: [0.87912088 0.88826816 0.95375723 0.82258065 0.89385475 0.9
0.87845304 0.88888889 0.89944134 0.88950276]
mean value: 0.8893867685519612
MCC on Blind test: 0.8
Accuracy on Blind test: 0.9
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.03672051 0.03351378 0.03409219 0.04177117 0.0275321 0.03296661
0.07960463 0.03673577 0.03128195 0.07671475]
mean value: 0.04309334754943848
key: score_time
value: [0.01211333 0.01203656 0.01435566 0.01213312 0.01190567 0.01184034
0.0147748 0.02327275 0.01196027 0.01549888]
mean value: 0.013989138603210449
key: test_mcc
value: [0.94736842 0.83918129 0.63129316 0.78362573 0.78362573 0.73099415
0.80369958 0.83918129 0.72333935 0.83462233]
mean value: 0.7916931020080674
key: train_mcc
value: [0.85498357 0.87915298 0.89743309 0.86706827 0.88521358 0.87312888
0.86104418 0.86706827 0.85548378 0.87350983]
mean value: 0.8714086429236412
key: test_accuracy
value: [0.97297297 0.91891892 0.81081081 0.89189189 0.89189189 0.86486486
0.89189189 0.91891892 0.86111111 0.91666667]
mean value: 0.893993993993994
key: train_accuracy
value: [0.92749245 0.93957704 0.94864048 0.93353474 0.94259819 0.93655589
0.9305136 0.93353474 0.92771084 0.93674699]
mean value: 0.9356904961234667
key: test_fscore
value: [0.97297297 0.91891892 0.82051282 0.88888889 0.89473684 0.86486486
0.88235294 0.91891892 0.85714286 0.91891892]
mean value: 0.8938228944420895
key: train_fscore
value: [0.92771084 0.93975904 0.94832827 0.93373494 0.94259819 0.93655589
0.9305136 0.93333333 0.92814371 0.93655589]
mean value: 0.9357233697617179
key: test_precision
value: [0.94736842 0.89473684 0.76190476 0.88888889 0.89473684 0.88888889
1. 0.94444444 0.88235294 0.89473684]
mean value: 0.8998058872671876
key: train_precision
value: [0.92771084 0.93975904 0.95705521 0.93373494 0.93975904 0.93373494
0.92771084 0.93333333 0.92261905 0.93939394]
mean value: 0.9354811173624463
key: test_recall
value: [1. 0.94444444 0.88888889 0.88888889 0.89473684 0.84210526
0.78947368 0.89473684 0.83333333 0.94444444]
mean value: 0.8921052631578947
key: train_recall
value: [0.92771084 0.93975904 0.93975904 0.93373494 0.94545455 0.93939394
0.93333333 0.93333333 0.93373494 0.93373494]
mean value: 0.9359948886454911
key: test_roc_auc
value: [0.97368421 0.91959064 0.8128655 0.89181287 0.89181287 0.86549708
0.89473684 0.91959064 0.86111111 0.91666667]
mean value: 0.8947368421052632
key: train_roc_auc
value: [0.92749179 0.93957649 0.9486674 0.93353414 0.94260679 0.93656444
0.93052209 0.93353414 0.92771084 0.93674699]
mean value: 0.9356955093099671
key: test_jcc
value: [0.94736842 0.85 0.69565217 0.8 0.80952381 0.76190476
0.78947368 0.85 0.75 0.85 ]
mean value: 0.8103922850604772
key: train_jcc
value: [0.86516854 0.88636364 0.9017341 0.87570621 0.89142857 0.88068182
0.8700565 0.875 0.86592179 0.88068182]
mean value: 0.8792742987101834
MCC on Blind test: 0.83
Accuracy on Blind test: 0.91
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.78824234 0.96631408 0.8446548 1.35734797 1.61382103 1.39275098
1.2790935 1.1367147 1.12887931 0.96179318]
mean value: 1.1469611883163453
key: score_time
value: [0.01487708 0.01531816 0.01532412 0.01569772 0.0157392 0.01540256
0.01546884 0.01227903 0.02109766 0.01225138]
mean value: 0.015345573425292969
key: test_mcc
value: [0.94736842 0.62280702 0.57184997 0.78362573 0.83871328 0.78362573
0.94736842 0.89181287 0.83462233 0.83462233]
mean value: 0.8056416089443539
key: train_mcc
value: [0.90339187 1. 0.9939759 0.90332238 0.98203333 0.90339187
1. 0.98189054 1. 0.99399394]
mean value: 0.9661999838560218
key: test_accuracy
value: [0.97297297 0.81081081 0.78378378 0.89189189 0.91891892 0.89189189
0.97297297 0.94594595 0.91666667 0.91666667]
mean value: 0.9022522522522523
key: train_accuracy
value: [0.95166163 1. 0.99697885 0.95166163 0.99093656 0.95166163
1. 0.99093656 1. 0.99698795]
mean value: 0.9830824809813271
key: test_fscore
value: [0.97297297 0.81081081 0.78947368 0.88888889 0.92307692 0.89473684
0.97297297 0.94736842 0.91428571 0.91891892]
mean value: 0.9033506149295623
key: train_fscore
value: [0.95151515 1. 0.99697885 0.95180723 0.99082569 0.95180723
1. 0.99088146 1. 0.99697885]
mean value: 0.9830794460313929
key: test_precision
value: [0.94736842 0.78947368 0.75 0.88888889 0.9 0.89473684
1. 0.94736842 0.94117647 0.89473684]
mean value: 0.895374957000344
key: train_precision
value: [0.95731707 1. 1. 0.95180723 1. 0.94610778
1. 0.99390244 1. 1. ]
mean value: 0.9849134525541923
key: test_recall
value: [1. 0.83333333 0.83333333 0.88888889 0.94736842 0.89473684
0.94736842 0.94736842 0.88888889 0.94444444]
mean value: 0.9125730994152047
key: train_recall
value: [0.94578313 1. 0.9939759 0.95180723 0.98181818 0.95757576
1. 0.98787879 1. 0.9939759 ]
mean value: 0.9812814895947426
key: test_roc_auc
value: [0.97368421 0.81140351 0.78508772 0.89181287 0.91812865 0.89181287
0.97368421 0.94590643 0.91666667 0.91666667]
mean value: 0.902485380116959
key: train_roc_auc
value: [0.95167945 1. 0.99698795 0.95166119 0.99090909 0.95167945
1. 0.99092735 1. 0.99698795]
mean value: 0.9830832420591457
key: test_jcc
value: [0.94736842 0.68181818 0.65217391 0.8 0.85714286 0.80952381
0.94736842 0.9 0.84210526 0.85 ]
mean value: 0.8287500866791484
key: train_jcc
value: [0.90751445 1. 0.9939759 0.90804598 0.98181818 0.90804598
1. 0.98192771 1. 0.9939759 ]
mean value: 0.9675304104780511
MCC on Blind test: 0.69
Accuracy on Blind test: 0.84
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01437283 0.01098561 0.0105083 0.01576018 0.01028442 0.01016164
0.01007247 0.01817203 0.01193213 0.01369476]
mean value: 0.01259443759918213
key: score_time
value: [0.01252317 0.00955033 0.01525402 0.01239777 0.0093441 0.00946498
0.00936222 0.01498938 0.01097512 0.01218009]
mean value: 0.011604118347167968
key: test_mcc
value: [0.73099415 0.40469382 0.73099415 0.51319869 0.48981224 0.62280702
0.60308132 0.45906433 0.61977979 0.79772404]
mean value: 0.5972149539148133
key: train_mcc
value: [0.64442374 0.62134114 0.66921665 0.68999143 0.67585241 0.64541184
0.67034019 0.65820219 0.70948192 0.69643271]
mean value: 0.6680694215597842
key: test_accuracy
value: [0.86486486 0.7027027 0.86486486 0.75675676 0.72972973 0.81081081
0.78378378 0.72972973 0.80555556 0.88888889]
mean value: 0.7937687687687688
key: train_accuracy
value: [0.81873112 0.80966767 0.83383686 0.8429003 0.83081571 0.82175227
0.83383686 0.82779456 0.85240964 0.84638554]
mean value: 0.8318130528154916
key: test_fscore
value: [0.86486486 0.68571429 0.86486486 0.74285714 0.6875 0.81081081
0.75 0.73684211 0.78787879 0.875 ]
mean value: 0.7806332862253915
key: train_fscore
value: [0.80519481 0.80250784 0.82866044 0.8343949 0.81081081 0.81388013
0.82539683 0.81904762 0.84345048 0.83809524]
mean value: 0.8221439081547757
key: test_precision
value: [0.84210526 0.70588235 0.84210526 0.76470588 0.84615385 0.83333333
0.92307692 0.73684211 0.86666667 1. ]
mean value: 0.8360871636103834
key: train_precision
value: [0.87323944 0.83660131 0.85806452 0.88513514 0.91603053 0.84868421
0.86666667 0.86 0.89795918 0.88590604]
mean value: 0.8728287030559482
key: test_recall
value: [0.88888889 0.66666667 0.88888889 0.72222222 0.57894737 0.78947368
0.63157895 0.73684211 0.72222222 0.77777778]
mean value: 0.7403508771929824
key: train_recall
value: [0.74698795 0.77108434 0.80120482 0.78915663 0.72727273 0.78181818
0.78787879 0.78181818 0.79518072 0.79518072]
mean value: 0.7777583059510771
key: test_roc_auc
value: [0.86549708 0.70175439 0.86549708 0.75584795 0.73391813 0.81140351
0.7880117 0.72953216 0.80555556 0.88888889]
mean value: 0.7945906432748537
key: train_roc_auc
value: [0.81894852 0.80978459 0.83393574 0.84306316 0.83050383 0.82163198
0.83369843 0.82765608 0.85240964 0.84638554]
mean value: 0.831801752464403
key: test_jcc
value: [0.76190476 0.52173913 0.76190476 0.59090909 0.52380952 0.68181818
0.6 0.58333333 0.65 0.77777778]
mean value: 0.6453196561892214
key: train_jcc
value: [0.67391304 0.67015707 0.70744681 0.71584699 0.68181818 0.68617021
0.7027027 0.69354839 0.72928177 0.72131148]
mean value: 0.6982196642336499
MCC on Blind test: 0.67
Accuracy on Blind test: 0.84
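Each "mean value" line in this log is the arithmetic mean of the ten per-fold values printed above it. The sketch below recomputes it for the Gaussian NB test_mcc values and adds the fold-to-fold spread, which the log does not report. The variable names are illustrative and are not taken from the original script.

```python
from statistics import mean, stdev

# Per-fold test_mcc values copied from the Gaussian NB block above.
test_mcc = [0.73099415, 0.40469382, 0.73099415, 0.51319869, 0.48981224,
            0.62280702, 0.60308132, 0.45906433, 0.61977979, 0.79772404]

fold_mean = mean(test_mcc)   # should agree with the logged "mean value"
fold_sd = stdev(test_mcc)    # sample standard deviation; not reported in the log

print(f"mean={fold_mean:.4f} sd={fold_sd:.4f} "
      f"min={min(test_mcc):.4f} max={max(test_mcc):.4f}")
```

With only ten folds of about 36 to 37 samples each, the spread is worth reading alongside the mean when comparing models.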
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01818132 0.01185274 0.01171231 0.0114913 0.01783705 0.01429081
0.0138998 0.013304 0.01724982 0.01014686]
mean value: 0.013996601104736328
key: score_time
value: [0.01515007 0.00999379 0.01022625 0.01021886 0.01386929 0.01219106
0.01203775 0.01533842 0.01011515 0.0094707 ]
mean value: 0.011861133575439452
key: test_mcc
value: [0.74044197 0.62280702 0.4633451 0.57184997 0.56934383 0.62170355
0.7888597 0.6754386 0.4472136 0.78262379]
mean value: 0.628362712085414
key: train_mcc
value: [0.73459045 0.71601738 0.79022336 0.74626648 0.74713145 0.68655466
0.70405667 0.72205184 0.77108434 0.76674551]
mean value: 0.7384722135344838
key: test_accuracy
value: [0.86486486 0.81081081 0.72972973 0.78378378 0.78378378 0.81081081
0.89189189 0.83783784 0.72222222 0.88888889]
mean value: 0.8124624624624625
key: train_accuracy
value: [0.86706949 0.85800604 0.89425982 0.87311178 0.87311178 0.8429003
0.85196375 0.86102719 0.88554217 0.88253012]
mean value: 0.8689522440214028
key: test_fscore
value: [0.87179487 0.81081081 0.73684211 0.78947368 0.8 0.82051282
0.88888889 0.84210526 0.70588235 0.88235294]
mean value: 0.8148663738756617
key: train_fscore
value: [0.86982249 0.85885886 0.89795918 0.8742515 0.87573964 0.83850932
0.85285285 0.86060606 0.88554217 0.88629738]
mean value: 0.8700439444712924
key: test_precision
value: [0.80952381 0.78947368 0.7 0.75 0.76190476 0.8
0.94117647 0.84210526 0.75 0.9375 ]
mean value: 0.8081683989385228
key: train_precision
value: [0.85465116 0.85628743 0.8700565 0.86904762 0.85549133 0.85987261
0.8452381 0.86060606 0.88554217 0.85875706]
mean value: 0.8615550031773642
key: test_recall
value: [0.94444444 0.83333333 0.77777778 0.83333333 0.84210526 0.84210526
0.84210526 0.84210526 0.66666667 0.83333333]
mean value: 0.8257309941520468
key: train_recall
value: [0.88554217 0.86144578 0.92771084 0.87951807 0.8969697 0.81818182
0.86060606 0.86060606 0.88554217 0.91566265]
mean value: 0.8791785323110625
key: test_roc_auc
value: [0.86695906 0.81140351 0.73099415 0.78508772 0.78216374 0.80994152
0.89327485 0.8377193 0.72222222 0.88888889]
mean value: 0.8128654970760234
key: train_roc_auc
value: [0.86701351 0.85799562 0.89415845 0.87309237 0.87318364 0.84282585
0.85198978 0.86102592 0.88554217 0.88253012]
mean value: 0.8689357429718876
key: test_jcc
value: [0.77272727 0.68181818 0.58333333 0.65217391 0.66666667 0.69565217
0.8 0.72727273 0.54545455 0.78947368]
mean value: 0.6914572498439775
key: train_jcc
value: [0.76963351 0.75263158 0.81481481 0.77659574 0.77894737 0.72192513
0.7434555 0.75531915 0.79459459 0.79581152]
mean value: 0.77037289076449
MCC on Blind test: 0.73
Accuracy on Blind test: 0.86
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00932622 0.01532483 0.01057291 0.01194143 0.01086783 0.01041532
0.01080632 0.01105237 0.01040554 0.01048732]
mean value: 0.011120009422302245
key: score_time
value: [0.01911092 0.01833034 0.021945 0.01794815 0.01830196 0.01822591
0.01857352 0.01737976 0.01754856 0.01728535]
mean value: 0.018464946746826173
key: test_mcc
value: [0.46019501 0.63129316 0.07739329 0.51461988 0.35558302 0.51461988
0.57184997 0.25301653 0.3354102 0.50709255]
mean value: 0.42210735008522005
key: train_mcc
value: [0.69212796 0.66871448 0.71631061 0.65571257 0.67976195 0.69792238
0.64957827 0.65571257 0.67513995 0.66308388]
mean value: 0.6754064609832358
key: test_accuracy
value: [0.72972973 0.81081081 0.54054054 0.75675676 0.67567568 0.75675676
0.78378378 0.62162162 0.66666667 0.75 ]
mean value: 0.7092342342342343
key: train_accuracy
value: [0.84592145 0.83383686 0.85800604 0.82779456 0.83987915 0.8489426
0.82477341 0.82779456 0.8373494 0.8313253 ]
mean value: 0.8375623339278564
key: test_fscore
value: [0.70588235 0.82051282 0.48484848 0.75675676 0.71428571 0.75675676
0.77777778 0.58823529 0.64705882 0.76923077]
mean value: 0.7021345550757315
key: train_fscore
value: [0.84866469 0.82972136 0.86053412 0.82674772 0.83890578 0.84756098
0.82317073 0.82882883 0.84023669 0.83431953]
mean value: 0.8378690419889865
key: test_precision
value: [0.75 0.76190476 0.53333333 0.73684211 0.65217391 0.77777778
0.82352941 0.66666667 0.6875 0.71428571]
mean value: 0.7104013684039596
key: train_precision
value: [0.83625731 0.85350318 0.84795322 0.83435583 0.84146341 0.85276074
0.82822086 0.82142857 0.8255814 0.81976744]
mean value: 0.8361291957614069
key: test_recall
value: [0.66666667 0.88888889 0.44444444 0.77777778 0.78947368 0.73684211
0.73684211 0.52631579 0.61111111 0.83333333]
mean value: 0.7011695906432749
key: train_recall
value: [0.86144578 0.80722892 0.87349398 0.81927711 0.83636364 0.84242424
0.81818182 0.83636364 0.85542169 0.84939759]
mean value: 0.8399598393574297
key: test_roc_auc
value: [0.72807018 0.8128655 0.5380117 0.75730994 0.67251462 0.75730994
0.78508772 0.62426901 0.66666667 0.75 ]
mean value: 0.7092105263157895
key: train_roc_auc
value: [0.84587441 0.83391749 0.85795911 0.82782037 0.83986857 0.84892296
0.82475356 0.82782037 0.8373494 0.8313253 ]
mean value: 0.837561153705732
key: test_jcc
value: [0.54545455 0.69565217 0.32 0.60869565 0.55555556 0.60869565
0.63636364 0.41666667 0.47826087 0.625 ]
mean value: 0.5490344751866492
key: train_jcc
value: [0.7371134 0.70899471 0.75520833 0.70466321 0.72251309 0.73544974
0.69948187 0.70769231 0.7244898 0.71573604]
mean value: 0.7211342490784889
MCC on Blind test: 0.5
Accuracy on Blind test: 0.75
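For a binary target, the Jaccard score (test_jcc) and the F-score (test_fscore) of the positive class are tied by the identity J = F / (2 - F), so the two rows carry the same information. A minimal check against the K-Nearest Neighbors fold values above, assuming both metrics were computed for the same positive class; the helper name jaccard_from_f1 is hypothetical.

```python
def jaccard_from_f1(f1: float) -> float:
    """Binary-class identity between F1 and Jaccard: J = F1 / (2 - F1)."""
    return f1 / (2.0 - f1)

# Per-fold values copied from the K-Nearest Neighbors block above.
test_fscore = [0.70588235, 0.82051282, 0.48484848, 0.75675676, 0.71428571,
               0.75675676, 0.77777778, 0.58823529, 0.64705882, 0.76923077]
test_jcc = [0.54545455, 0.69565217, 0.32, 0.60869565, 0.55555556,
            0.60869565, 0.63636364, 0.41666667, 0.47826087, 0.625]

# Largest disagreement between the logged Jaccard and the value implied by F1.
max_abs_diff = max(abs(jaccard_from_f1(f) - j)
                   for f, j in zip(test_fscore, test_jcc))
print(f"max |J - F/(2-F)| over folds: {max_abs_diff:.2e}")
```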
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01921725 0.01870728 0.01728058 0.01826978 0.01832366 0.01756644
0.01772666 0.02263999 0.01768255 0.01821613]
mean value: 0.018563032150268555
key: score_time
value: [0.01221204 0.01207209 0.01167274 0.01201773 0.01157498 0.01153445
0.01147413 0.01173925 0.01177478 0.01164174]
mean value: 0.011771392822265626
key: test_mcc
value: [0.94736842 0.89181287 0.63129316 0.73099415 0.78362573 0.73099415
0.7888597 0.89181287 0.50709255 0.88888889]
mean value: 0.7792742482712791
key: train_mcc
value: [0.81269853 0.80669661 0.84296615 0.81269853 0.80062066 0.81283091
0.79462558 0.80074488 0.82543601 0.78313253]
mean value: 0.8092450407103096
key: test_accuracy
value: [0.97297297 0.94594595 0.81081081 0.86486486 0.89189189 0.86486486
0.89189189 0.94594595 0.75 0.94444444]
mean value: 0.8883633633633634
key: train_accuracy
value: [0.90634441 0.90332326 0.92145015 0.90634441 0.90030211 0.90634441
0.89728097 0.90030211 0.9126506 0.89156627]
mean value: 0.9045908710370182
key: test_fscore
value: [0.97297297 0.94444444 0.82051282 0.86486486 0.89473684 0.86486486
0.88888889 0.94736842 0.72727273 0.94444444]
mean value: 0.8870371291423923
key: train_fscore
value: [0.90690691 0.90419162 0.92121212 0.90690691 0.90030211 0.90690691
0.89759036 0.9009009 0.91343284 0.89156627]
mean value: 0.9049916936730755
key: test_precision
value: [0.94736842 0.94444444 0.76190476 0.84210526 0.89473684 0.88888889
0.94117647 0.94736842 0.8 0.94444444]
mean value: 0.8912437957639195
key: train_precision
value: [0.90419162 0.89880952 0.92682927 0.90419162 0.89759036 0.89880952
0.89221557 0.89285714 0.90532544 0.89156627]
mean value: 0.901238633145709
key: test_recall
value: [1. 0.94444444 0.88888889 0.88888889 0.89473684 0.84210526
0.84210526 0.94736842 0.66666667 0.94444444]
mean value: 0.8859649122807017
key: train_recall
value: [0.90963855 0.90963855 0.91566265 0.90963855 0.9030303 0.91515152
0.9030303 0.90909091 0.92168675 0.89156627]
mean value: 0.9088134355604235
key: test_roc_auc
value: [0.97368421 0.94590643 0.8128655 0.86549708 0.89181287 0.86549708
0.89327485 0.94590643 0.75 0.94444444]
mean value: 0.8888888888888888
key: train_roc_auc
value: [0.90633443 0.90330413 0.92146769 0.90633443 0.90031033 0.90637094
0.89729828 0.90032859 0.9126506 0.89156627]
mean value: 0.904596568090544
key: test_jcc
value: [0.94736842 0.89473684 0.69565217 0.76190476 0.80952381 0.76190476
0.8 0.9 0.57142857 0.89473684]
mean value: 0.8037256183938106
key: train_jcc
value: [0.82967033 0.82513661 0.85393258 0.82967033 0.81868132 0.82967033
0.81420765 0.81967213 0.84065934 0.80434783]
mean value: 0.8265648452150891
MCC on Blind test: 0.78
Accuracy on Blind test: 0.89
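The per-fold metrics can be cross-checked against a single confusion matrix. The sketch below uses counts that are an assumption: TP=18, FP=1, FN=0, TN=18 is one set of counts consistent with the first SVM fold above (37 samples). It is inferred from the reported values, not printed by the script.

```python
from math import sqrt

# ASSUMED counts, inferred from the first SVM fold's reported metrics.
tp, fp, fn, tn = 18, 1, 0, 18

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
specificity = tn / (tn + fp)
f1 = 2 * tp / (2 * tp + fp + fn)
jaccard = tp / (tp + fp + fn)
balanced_accuracy = (recall + specificity) / 2
mcc = (tp * tn - fp * fn) / sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

for name, value in [("accuracy", accuracy), ("precision", precision),
                    ("recall", recall), ("f1", f1), ("jaccard", jaccard),
                    ("balanced_accuracy", balanced_accuracy), ("mcc", mcc)]:
    print(f"{name}: {value:.8f}")
```

Under these counts the balanced accuracy equals the logged test_roc_auc for the same fold (0.97368421). That is what an AUC computed from hard class labels rather than decision scores would give, but the log alone does not confirm how the script computes it.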
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.30559778 1.49760962 1.48370218 1.27121735 1.36745763 1.29357028
1.46157598 1.41085505 1.2677567 1.47282529]
mean value: 1.3832167863845826
key: score_time
value: [0.01324964 0.01464891 0.01291299 0.01481247 0.01527667 0.01298475
0.01542163 0.01258349 0.01508832 0.01303482]
mean value: 0.01400136947631836
key: test_mcc
value: [0.89736456 0.73099415 0.56725146 0.83918129 0.83871328 0.7888597
0.84959079 0.83918129 0.72333935 0.9459053 ]
mean value: 0.8020381173135782
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94594595 0.86486486 0.78378378 0.91891892 0.91891892 0.89189189
0.91891892 0.91891892 0.86111111 0.97222222]
mean value: 0.8995495495495496
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94736842 0.86486486 0.77777778 0.91891892 0.92307692 0.88888889
0.91428571 0.91891892 0.85714286 0.97297297]
mean value: 0.8984216257900468
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.9 0.84210526 0.77777778 0.89473684 0.9 0.94117647
1. 0.94444444 0.88235294 0.94736842]
mean value: 0.9029962160302718
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.88888889 0.77777778 0.94444444 0.94736842 0.84210526
0.84210526 0.89473684 0.83333333 1. ]
mean value: 0.8970760233918128
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.94736842 0.86549708 0.78362573 0.91959064 0.91812865 0.89327485
0.92105263 0.91959064 0.86111111 0.97222222]
mean value: 0.9001461988304094
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.9 0.76190476 0.63636364 0.85 0.85714286 0.8
0.84210526 0.85 0.75 0.94736842]
mean value: 0.8194884939621782
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.74
Accuracy on Blind test: 0.87
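The MLP block shows a training MCC of 1.0 on every fold, a mean cross-validated test MCC of about 0.80, and a blind-test MCC of 0.74. A small sketch of how such gaps could be flagged when scanning the log; the threshold GAP_THRESHOLD is a hypothetical value chosen for illustration, not something defined by the original script.

```python
# Values copied from the MLP block above.
cv_train_mcc = 1.0                  # mean train_mcc
cv_test_mcc = 0.8020381173135782    # mean test_mcc
blind_mcc = 0.74                    # "MCC on Blind test"

GAP_THRESHOLD = 0.15  # hypothetical cut-off for calling a gap "large"

train_test_gap = cv_train_mcc - cv_test_mcc   # fit vs. held-out folds
cv_blind_gap = cv_test_mcc - blind_mcc        # held-out folds vs. blind set
flag = train_test_gap > GAP_THRESHOLD

print(f"train-test gap: {train_test_gap:.3f}")
print(f"cv-blind gap:   {cv_blind_gap:.3f}")
print("possible overfitting" if flag else "gap within threshold")
```

A perfect training score together with the ConvergenceWarning above points the same way: the network fits the training folds exactly while generalizing less well.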
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.02773237 0.02290416 0.01795602 0.01657343 0.01584625 0.01707602
0.0161159 0.01845789 0.01577401 0.01874471]
mean value: 0.0187180757522583
key: score_time
value: [0.01235223 0.01075268 0.0102129 0.00925064 0.00990987 0.00948524
0.00892353 0.00983405 0.00992322 0.00996733]
mean value: 0.010061168670654297
key: test_mcc
value: [0.78362573 0.7888597 0.84834956 0.94736842 0.74044197 0.83918129
0.94736842 0.89736456 0.72333935 1. ]
mean value: 0.8515898998369181
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.89189189 0.89189189 0.91891892 0.97297297 0.86486486 0.91891892
0.97297297 0.94594595 0.86111111 1. ]
mean value: 0.923948948948949
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.88888889 0.89473684 0.90909091 0.97297297 0.85714286 0.91891892
0.97297297 0.94444444 0.86486486 1. ]
mean value: 0.9224033671402092
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.88888889 0.85 1. 0.94736842 0.9375 0.94444444
1. 1. 0.84210526 1. ]
mean value: 0.941030701754386
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.88888889 0.94444444 0.83333333 1. 0.78947368 0.89473684
0.94736842 0.89473684 0.88888889 1. ]
mean value: 0.908187134502924
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.89181287 0.89327485 0.91666667 0.97368421 0.86695906 0.91959064
0.97368421 0.94736842 0.86111111 1. ]
mean value: 0.9244152046783626
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.8 0.80952381 0.83333333 0.94736842 0.75 0.85
0.94736842 0.89473684 0.76190476 1. ]
mean value: 0.8594235588972431
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.9
Accuracy on Blind test: 0.95
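The per-fold test_jcc and test_fscore values are linked: for a binary classifier, the Jaccard index J and the F1 score F satisfy J = F / (2 - F). A minimal sketch, assuming the logged fscore is the binary F1 (the scoring configuration is not shown in this log), checks the relation against the first three Decision Tree folds above. The function name is illustrative only.

```python
import math


def jaccard_from_f1(f1: float) -> float:
    """Convert a binary F1 score to the Jaccard index: J = F / (2 - F)."""
    return f1 / (2.0 - f1)


# First three Decision Tree folds, copied from the log above.
test_fscore = [0.88888889, 0.89473684, 0.90909091]
test_jcc = [0.8, 0.80952381, 0.83333333]

for f1, jcc in zip(test_fscore, test_jcc):
    # Each logged Jaccard value should match the conversion to about 1e-6.
    assert math.isclose(jaccard_from_f1(f1), jcc, abs_tol=1e-6)
```

Because the conversion is monotonic, ranking models by test_jcc or by test_fscore gives the same order within a fold.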
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.11874175 0.11397886 0.12404823 0.11526704 0.1260879 0.13378692
0.11424398 0.10551548 0.10430574 0.10480857]
mean value: 0.11607844829559326
key: score_time
value: [0.01930761 0.01774311 0.02675867 0.02725172 0.01928139 0.02496815
0.01777816 0.01777339 0.01758528 0.01737881]
mean value: 0.02058262825012207
key: test_mcc
value: [0.89736456 0.78362573 0.51461988 0.7888597 0.83871328 0.6754386
0.7888597 0.75938069 0.61977979 0.83462233]
mean value: 0.7501264258080947
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94594595 0.89189189 0.75675676 0.89189189 0.91891892 0.83783784
0.89189189 0.86486486 0.80555556 0.91666667]
mean value: 0.8722222222222222
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94736842 0.88888889 0.75675676 0.89473684 0.92307692 0.84210526
0.88888889 0.84848485 0.78787879 0.91891892]
mean value: 0.8697104539209802
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.9 0.88888889 0.73684211 0.85 0.9 0.84210526
0.94117647 1. 0.86666667 0.89473684]
mean value: 0.8820416236670107
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.88888889 0.77777778 0.94444444 0.94736842 0.84210526
0.84210526 0.73684211 0.72222222 0.94444444]
mean value: 0.8646198830409356
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.94736842 0.89181287 0.75730994 0.89327485 0.91812865 0.8377193
0.89327485 0.86842105 0.80555556 0.91666667]
mean value: 0.872953216374269
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.9 0.8 0.60869565 0.80952381 0.85714286 0.72727273
0.8 0.73684211 0.65 0.85 ]
mean value: 0.7739477151376465
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.8
Accuracy on Blind test: 0.9
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.00972319 0.00971651 0.00972152 0.00972605 0.00971293 0.00968099
0.00972724 0.00964713 0.00986457 0.00967932]
mean value: 0.009719944000244141
key: score_time
value: [0.00870895 0.00870967 0.00869751 0.00882626 0.00884151 0.00898957
0.0088141 0.00873256 0.00869012 0.00863934]
mean value: 0.008764958381652832
key: test_mcc
value: [0.56725146 0.62280702 0.18768409 0.74044197 0.57184997 0.30384671
0.83918129 0.56725146 0.52048344 0.63614643]
mean value: 0.555694382971283
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.78378378 0.81081081 0.59459459 0.86486486 0.78378378 0.64864865
0.91891892 0.78378378 0.75 0.80555556]
mean value: 0.7744744744744745
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.77777778 0.81081081 0.57142857 0.87179487 0.77777778 0.62857143
0.91891892 0.78947368 0.70967742 0.77419355]
mean value: 0.7630424809032619
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.77777778 0.78947368 0.58823529 0.80952381 0.82352941 0.6875
0.94444444 0.78947368 0.84615385 0.92307692]
mean value: 0.7979188875280206
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.77777778 0.83333333 0.55555556 0.94444444 0.73684211 0.57894737
0.89473684 0.78947368 0.61111111 0.66666667]
mean value: 0.7388888888888889
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.78362573 0.81140351 0.59356725 0.86695906 0.78508772 0.6505848
0.91959064 0.78362573 0.75 0.80555556]
mean value: 0.775
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.63636364 0.68181818 0.4 0.77272727 0.63636364 0.45833333
0.85 0.65217391 0.55 0.63157895]
mean value: 0.626935892101796
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.56
Accuracy on Blind test: 0.78
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.51501703 1.47666764 1.49276137 1.48433304 1.4902401 1.48782825
1.50618243 1.49330306 1.57110786 1.59888744]
mean value: 1.5116328239440917
key: score_time
value: [0.09284711 0.09556246 0.09648204 0.09837961 0.09728193 0.09940147
0.0990901 0.0914495 0.09889078 0.09455132]
mean value: 0.09639363288879395
key: test_mcc
value: [0.94736842 0.89736456 0.89181287 0.94736842 0.89181287 0.89736456
0.94736842 0.94736842 0.72333935 0.9459053 ]
mean value: 0.9037073192277006
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.97297297 0.94594595 0.94594595 0.97297297 0.94594595 0.94594595
0.97297297 0.97297297 0.86111111 0.97222222]
mean value: 0.950900900900901
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.97297297 0.94736842 0.94444444 0.97297297 0.94736842 0.94444444
0.97297297 0.97297297 0.85714286 0.97297297]
mean value: 0.9505633453001874
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.94736842 0.9 0.94444444 0.94736842 0.94736842 1.
1. 1. 0.88235294 0.94736842]
mean value: 0.9516271069831441
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 0.94444444 1. 0.94736842 0.89473684
0.94736842 0.94736842 0.83333333 1. ]
mean value: 0.9514619883040936
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97368421 0.94736842 0.94590643 0.97368421 0.94590643 0.94736842
0.97368421 0.97368421 0.86111111 0.97222222]
mean value: 0.9514619883040936
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.94736842 0.9 0.89473684 0.94736842 0.9 0.89473684
0.94736842 0.94736842 0.75 0.94736842]
mean value: 0.9076315789473683
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.91
Accuracy on Blind test: 0.96
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...05', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.89117813 0.91720891 0.93513155 0.87951016 0.92528319 0.92419934
0.97259903 1.01499391 0.9730444 0.98030806]
mean value: 0.9413456678390503
key: score_time
value: [0.21520376 0.24883294 0.16850185 0.21201348 0.27353764 0.13810372
0.15924072 0.23260546 0.27458358 0.19869089]
mean value: 0.21213140487670898
key: test_mcc
value: [0.89181287 0.83918129 0.83918129 0.89736456 0.89181287 0.84959079
0.89736456 1. 0.72333935 0.9459053 ]
mean value: 0.8775552871651642
key: train_mcc
value: [0.96381759 0.95786323 0.94563709 0.96994925 0.96381495 0.95785863
0.96381495 0.95785863 0.96385542 0.95784871]
mean value: 0.9602318452069324
key: test_accuracy
value: [0.94594595 0.91891892 0.91891892 0.94594595 0.94594595 0.91891892
0.94594595 1. 0.86111111 0.97222222]
mean value: 0.9373873873873874
key: train_accuracy
value: [0.98187311 0.97885196 0.97280967 0.98489426 0.98187311 0.97885196
0.98187311 0.97885196 0.98192771 0.97891566]
mean value: 0.9800722527572525
key: test_fscore
value: [0.94444444 0.91891892 0.91891892 0.94736842 0.94736842 0.91428571
0.94444444 1. 0.85714286 0.97297297]
mean value: 0.9365865113233535
key: train_fscore
value: [0.98181818 0.9787234 0.97280967 0.98480243 0.98170732 0.97859327
0.98170732 0.97859327 0.98192771 0.97885196]
mean value: 0.9799534538436605
key: test_precision
value: [0.94444444 0.89473684 0.89473684 0.9 0.94736842 1.
1. 1. 0.88235294 0.94736842]
mean value: 0.9411007911936704
key: train_precision
value: [0.98780488 0.98773006 0.97575758 0.99386503 0.98773006 0.98765432
0.98773006 0.98765432 0.98192771 0.98181818]
mean value: 0.9859672203167147
key: test_recall
value: [0.94444444 0.94444444 0.94444444 1. 0.94736842 0.84210526
0.89473684 1. 0.83333333 1. ]
mean value: 0.9350877192982456
key: train_recall
value: [0.97590361 0.96987952 0.96987952 0.97590361 0.97575758 0.96969697
0.97575758 0.96969697 0.98192771 0.97590361]
mean value: 0.9740306681270536
key: test_roc_auc
value: [0.94590643 0.91959064 0.91959064 0.94736842 0.94590643 0.92105263
0.94736842 1. 0.86111111 0.97222222]
mean value: 0.9380116959064327
key: train_roc_auc
value: [0.9818912 0.97887915 0.97281855 0.9849215 0.98185469 0.97882439
0.98185469 0.97882439 0.98192771 0.97891566]
mean value: 0.9800711938663746
key: test_jcc
value: [0.89473684 0.85 0.85 0.9 0.9 0.84210526
0.89473684 1. 0.75 0.94736842]
mean value: 0.8828947368421053
key: train_jcc
value: [0.96428571 0.95833333 0.94705882 0.97005988 0.96407186 0.95808383
0.96407186 0.95808383 0.96449704 0.95857988]
mean value: 0.9607126051710413
MCC on Blind test: 0.86
Accuracy on Blind test: 0.93
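The FutureWarning raised for this model states that max_features='auto' is deprecated and that 'sqrt' keeps the past behaviour for RandomForestClassifier. A hedged sketch of the adjusted keyword arguments follows, kept as a plain dict so it runs without scikit-learn installed. The name rf2_params is hypothetical; the remaining values are copied from the Model func line above.

```python
# Keyword arguments for the 'Random Forest2' model, with the deprecated
# max_features='auto' replaced by 'sqrt' as the FutureWarning suggests.
rf2_params = {
    "max_features": "sqrt",
    "min_samples_leaf": 5,
    "n_estimators": 1000,
    "n_jobs": 10,
    "oob_score": True,
    "random_state": 42,
}

# With scikit-learn available, these would be passed as:
#   from sklearn.ensemble import RandomForestClassifier
#   model = RandomForestClassifier(**rf2_params)
```

Per the warning text, dropping max_features entirely should give the same behaviour, since 'sqrt' is the default for this classifier.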
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02643442 0.01237798 0.00983572 0.01009274 0.00978518 0.0098691
0.00975704 0.00971055 0.01049852 0.00966072]
mean value: 0.011802196502685547
key: score_time
value: [0.01092529 0.00903606 0.00895619 0.00881696 0.00897527 0.008919
0.00883269 0.00885367 0.008847 0.00878906]
mean value: 0.009095120429992675
key: test_mcc
value: [0.74044197 0.62280702 0.4633451 0.57184997 0.56934383 0.62170355
0.7888597 0.6754386 0.4472136 0.78262379]
mean value: 0.628362712085414
key: train_mcc
value: [0.73459045 0.71601738 0.79022336 0.74626648 0.74713145 0.68655466
0.70405667 0.72205184 0.77108434 0.76674551]
mean value: 0.7384722135344838
key: test_accuracy
value: [0.86486486 0.81081081 0.72972973 0.78378378 0.78378378 0.81081081
0.89189189 0.83783784 0.72222222 0.88888889]
mean value: 0.8124624624624625
key: train_accuracy
value: [0.86706949 0.85800604 0.89425982 0.87311178 0.87311178 0.8429003
0.85196375 0.86102719 0.88554217 0.88253012]
mean value: 0.8689522440214028
key: test_fscore
value: [0.87179487 0.81081081 0.73684211 0.78947368 0.8 0.82051282
0.88888889 0.84210526 0.70588235 0.88235294]
mean value: 0.8148663738756617
key: train_fscore
value: [0.86982249 0.85885886 0.89795918 0.8742515 0.87573964 0.83850932
0.85285285 0.86060606 0.88554217 0.88629738]
mean value: 0.8700439444712924
key: test_precision
value: [0.80952381 0.78947368 0.7 0.75 0.76190476 0.8
0.94117647 0.84210526 0.75 0.9375 ]
mean value: 0.8081683989385228
key: train_precision
value: [0.85465116 0.85628743 0.8700565 0.86904762 0.85549133 0.85987261
0.8452381 0.86060606 0.88554217 0.85875706]
mean value: 0.8615550031773642
key: test_recall
value: [0.94444444 0.83333333 0.77777778 0.83333333 0.84210526 0.84210526
0.84210526 0.84210526 0.66666667 0.83333333]
mean value: 0.8257309941520468
key: train_recall
value: [0.88554217 0.86144578 0.92771084 0.87951807 0.8969697 0.81818182
0.86060606 0.86060606 0.88554217 0.91566265]
mean value: 0.8791785323110625
key: test_roc_auc
value: [0.86695906 0.81140351 0.73099415 0.78508772 0.78216374 0.80994152
0.89327485 0.8377193 0.72222222 0.88888889]
mean value: 0.8128654970760234
key: train_roc_auc
value: [0.86701351 0.85799562 0.89415845 0.87309237 0.87318364 0.84282585
0.85198978 0.86102592 0.88554217 0.88253012]
mean value: 0.8689357429718876
key: test_jcc
value: [0.77272727 0.68181818 0.58333333 0.65217391 0.66666667 0.69565217
0.8 0.72727273 0.54545455 0.78947368]
mean value: 0.6914572498439775
key: train_jcc
value: [0.76963351 0.75263158 0.81481481 0.77659574 0.77894737 0.72192513
0.7434555 0.75531915 0.79459459 0.79581152]
mean value: 0.77037289076449
MCC on Blind test: 0.73
Accuracy on Blind test: 0.86
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.09763837 0.05395174 0.05793595 0.10008216 0.05730486 0.0574739
0.05634022 0.06836796 0.07902384 0.05754972]
mean value: 0.06856687068939209
key: score_time
value: [0.01159644 0.01121736 0.01094055 0.01107693 0.01040983 0.01041222
0.01046443 0.01227784 0.01092386 0.0105648 ]
mean value: 0.010988426208496094
key: test_mcc
value: [0.94736842 0.89736456 0.94721815 0.94736842 0.94736842 0.89736456
0.94736842 1. 0.72333935 1. ]
mean value: 0.9254760307581459
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.97297297 0.94594595 0.97297297 0.97297297 0.97297297 0.94594595
0.97297297 1. 0.86111111 1. ]
mean value: 0.9617867867867869
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.97297297 0.94736842 0.97142857 0.97297297 0.97297297 0.94444444
0.97297297 1. 0.86486486 1. ]
mean value: 0.9619998193682404
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.94736842 0.9 1. 0.94736842 1. 1.
1. 1. 0.84210526 1. ]
mean value: 0.9636842105263158
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 0.94444444 1. 0.94736842 0.89473684
0.94736842 1. 0.88888889 1. ]
mean value: 0.962280701754386
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97368421 0.94736842 0.97222222 0.97368421 0.97368421 0.94736842
0.97368421 1. 0.86111111 1. ]
mean value: 0.962280701754386
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.94736842 0.9 0.94444444 0.94736842 0.94736842 0.89473684
0.94736842 1. 0.76190476 1. ]
mean value: 0.9290559732664996
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.92
Accuracy on Blind test: 0.96
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.03597736 0.07507443 0.06862307 0.07008171 0.03527355 0.04518008
0.06671071 0.0492239 0.0749712 0.06974387]
mean value: 0.05908598899841309
key: score_time
value: [0.02051544 0.02273822 0.02037239 0.01225233 0.01217675 0.02001953
0.01223707 0.02085972 0.02115345 0.01531434]
mean value: 0.01776392459869385
key: test_mcc
value: [0.7888597 0.78764146 0.51461988 0.56725146 0.78362573 0.62280702
0.83918129 0.73020842 0.55901699 0.89442719]
mean value: 0.7087639142980873
key: train_mcc
value: [0.9577218 0.95786323 0.9577218 0.93957649 0.95772025 0.93355239
0.94563511 0.94563511 0.95180723 0.95208368]
mean value: 0.9499317074650987
key: test_accuracy
value: [0.89189189 0.89189189 0.75675676 0.78378378 0.89189189 0.81081081
0.91891892 0.86486486 0.77777778 0.94444444]
mean value: 0.8533033033033033
key: train_accuracy
value: [0.97885196 0.97885196 0.97885196 0.96978852 0.97885196 0.96676737
0.97280967 0.97280967 0.97590361 0.97590361]
mean value: 0.974939031048666
key: test_fscore
value: [0.89473684 0.88235294 0.75675676 0.77777778 0.89473684 0.81081081
0.91891892 0.87179487 0.76470588 0.94736842]
mean value: 0.8519960064851706
key: train_fscore
value: [0.97885196 0.9787234 0.97885196 0.96987952 0.9787234 0.96676737
0.97264438 0.97264438 0.97590361 0.97560976]
mean value: 0.9748599750031368
key: test_precision
value: [0.85 0.9375 0.73684211 0.77777778 0.89473684 0.83333333
0.94444444 0.85 0.8125 0.9 ]
mean value: 0.8537134502923976
key: train_precision
value: [0.98181818 0.98773006 0.98181818 0.96987952 0.98170732 0.96385542
0.97560976 0.97560976 0.97590361 0.98765432]
mean value: 0.9781586129458871
key: test_recall
value: [0.94444444 0.83333333 0.77777778 0.77777778 0.89473684 0.78947368
0.89473684 0.89473684 0.72222222 1. ]
mean value: 0.8529239766081871
key: train_recall
value: [0.97590361 0.96987952 0.97590361 0.96987952 0.97575758 0.96969697
0.96969697 0.96969697 0.97590361 0.96385542]
mean value: 0.9716173786053304
key: test_roc_auc
value: [0.89327485 0.89035088 0.75730994 0.78362573 0.89181287 0.81140351
0.91959064 0.86403509 0.77777778 0.94444444]
mean value: 0.8533625730994152
key: train_roc_auc
value: [0.9788609 0.97887915 0.9788609 0.96978824 0.97884264 0.9667762
0.97280029 0.97280029 0.97590361 0.97590361]
mean value: 0.9749415845198979
key: test_jcc
value: [0.80952381 0.78947368 0.60869565 0.63636364 0.80952381 0.68181818
0.85 0.77272727 0.61904762 0.9 ]
mean value: 0.7477173665388769
key: train_jcc
value: [0.95857988 0.95833333 0.95857988 0.94152047 0.95833333 0.93567251
0.94674556 0.94674556 0.95294118 0.95238095]
mean value: 0.9509832665548312
MCC on Blind test: 0.73
Accuracy on Blind test: 0.86
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.02032638 0.01103067 0.01079988 0.01054168 0.01051974 0.01051974
0.0105474 0.01089859 0.0105722 0.01062846]
mean value: 0.011638474464416505
key: score_time
value: [0.01018023 0.00987649 0.00954866 0.00943232 0.00947976 0.00946927
0.00955534 0.00968814 0.00956678 0.00959611]
mean value: 0.009639310836791991
key: test_mcc
value: [0.73020842 0.29824561 0.57184997 0.73099415 0.62280702 0.40469382
0.7888597 0.57184997 0.3721042 0.89442719]
mean value: 0.5986040048640627
key: train_mcc
value: [0.65133406 0.56567532 0.71002957 0.70459299 0.69830851 0.60223279
0.68655466 0.63906236 0.67014765 0.73493976]
mean value: 0.6662877681119577
key: test_accuracy
value: [0.86486486 0.64864865 0.78378378 0.86486486 0.81081081 0.7027027
0.89189189 0.78378378 0.66666667 0.94444444]
mean value: 0.7962462462462463
key: train_accuracy
value: [0.82477341 0.78247734 0.85498489 0.85196375 0.8489426 0.80060423
0.8429003 0.81873112 0.83433735 0.86746988]
mean value: 0.8327184872420195
key: test_fscore
value: [0.85714286 0.64864865 0.78947368 0.86486486 0.81081081 0.71794872
0.88888889 0.77777778 0.57142857 0.94117647]
mean value: 0.7868161292309899
key: train_fscore
value: [0.81875 0.77777778 0.85454545 0.84923077 0.84567901 0.79375
0.83850932 0.81132075 0.82866044 0.86746988]
mean value: 0.8285693401041992
key: test_precision
value: [0.88235294 0.63157895 0.75 0.84210526 0.83333333 0.7
0.94117647 0.82352941 0.8 1. ]
mean value: 0.820407636738906
key: train_precision
value: [0.85064935 0.79746835 0.8597561 0.86792453 0.86163522 0.81935484
0.85987261 0.84313725 0.85806452 0.86746988]
mean value: 0.848533265179209
key: test_recall
value: [0.83333333 0.66666667 0.83333333 0.88888889 0.78947368 0.73684211
0.84210526 0.73684211 0.44444444 0.88888889]
mean value: 0.7660818713450293
key: train_recall
value: [0.78915663 0.75903614 0.84939759 0.8313253 0.83030303 0.76969697
0.81818182 0.78181818 0.80120482 0.86746988]
mean value: 0.8097590361445783
key: test_roc_auc
value: [0.86403509 0.64912281 0.78508772 0.86549708 0.81140351 0.70175439
0.89327485 0.78508772 0.66666667 0.94444444]
mean value: 0.7966374269005848
key: train_roc_auc
value: [0.82488134 0.78254838 0.85500183 0.85202629 0.84888645 0.80051114
0.84282585 0.81861993 0.83433735 0.86746988]
mean value: 0.832710843373494
key: test_jcc
value: [0.75 0.48 0.65217391 0.76190476 0.68181818 0.56
0.8 0.63636364 0.4 0.88888889]
mean value: 0.6611149382018947
key: train_jcc
value: [0.69312169 0.63636364 0.74603175 0.73796791 0.73262032 0.65803109
0.72192513 0.68253968 0.70744681 0.76595745]
mean value: 0.7082005470442766
MCC on Blind test: 0.77
Accuracy on Blind test: 0.89
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01300788 0.01797628 0.01849413 0.01570034 0.01609373 0.02065372
0.02061057 0.01798749 0.01872349 0.01886225]
mean value: 0.017810988426208495
key: score_time
value: [0.00886083 0.01124477 0.0111165 0.01169562 0.01177621 0.0118463
0.01175761 0.0117023 0.01185799 0.01174355]
mean value: 0.01136016845703125
key: test_mcc
value: [0.80369958 0.51793973 0.56725146 0.73020842 0.78362573 0.78362573
0.84959079 0.83918129 0.79772404 0.73246702]
mean value: 0.7405313784220269
key: train_mcc
value: [0.85241016 0.90101455 0.93492806 0.76178654 0.87390869 0.91019063
0.93436201 0.89441747 0.82781591 0.90978714]
mean value: 0.8800621170623023
key: test_accuracy
value: [0.89189189 0.75675676 0.78378378 0.86486486 0.89189189 0.89189189
0.91891892 0.91891892 0.88888889 0.86111111]
mean value: 0.8668918918918919
key: train_accuracy
value: [0.9244713 0.94864048 0.96676737 0.87009063 0.93655589 0.95468278
0.96676737 0.94561934 0.90662651 0.95481928]
mean value: 0.9375040949295672
key: test_fscore
value: [0.9 0.72727273 0.77777778 0.85714286 0.89473684 0.89473684
0.91428571 0.91891892 0.875 0.87179487]
mean value: 0.8631666551403394
key: train_fscore
value: [0.92795389 0.94637224 0.96594427 0.85324232 0.93768546 0.95548961
0.96594427 0.94303797 0.89700997 0.95440729]
mean value: 0.9347087306426057
key: test_precision
value: [0.81818182 0.8 0.77777778 0.88235294 0.89473684 0.89473684
1. 0.94444444 1. 0.80952381]
mean value: 0.8821754475314847
key: train_precision
value: [0.88950276 0.99337748 0.99363057 0.98425197 0.91860465 0.93604651
0.98734177 0.98675497 1. 0.96319018]
mean value: 0.9652700873506086
key: test_recall
value: [1. 0.66666667 0.77777778 0.83333333 0.89473684 0.89473684
0.84210526 0.89473684 0.77777778 0.94444444]
mean value: 0.8526315789473684
key: train_recall
value: [0.96987952 0.90361446 0.93975904 0.75301205 0.95757576 0.97575758
0.94545455 0.9030303 0.81325301 0.94578313]
mean value: 0.9107119386637459
key: test_roc_auc
value: [0.89473684 0.75438596 0.78362573 0.86403509 0.89181287 0.89181287
0.92105263 0.91959064 0.88888889 0.86111111]
mean value: 0.8671052631578947
key: train_roc_auc
value: [0.9243337 0.94877693 0.96684922 0.87044542 0.9366192 0.95474626
0.96670318 0.94549106 0.90662651 0.95481928]
mean value: 0.9375410733844468
key: test_jcc
value: [0.81818182 0.57142857 0.63636364 0.75 0.80952381 0.80952381
0.84210526 0.85 0.77777778 0.77272727]
mean value: 0.763763195868459
key: train_jcc
value: [0.8655914 0.89820359 0.93413174 0.74404762 0.88268156 0.91477273
0.93413174 0.89221557 0.81325301 0.9127907 ]
mean value: 0.8791819652868769
MCC on Blind test: 0.8
Accuracy on Blind test: 0.9
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01939464 0.01617098 0.01889229 0.01758027 0.01637602 0.01791692
0.01738191 0.03996897 0.0171361 0.01589084]
mean value: 0.019670891761779784
key: score_time
value: [0.01182365 0.01187801 0.01180458 0.01176167 0.01184773 0.02823496
0.01288629 0.01227355 0.01188588 0.01173544]
mean value: 0.013613176345825196
key: test_mcc
value: [0.7163504 0.51478965 0.62280702 0.51121719 0.78362573 0.84959079
0.59234888 0.78362573 0.78262379 0.78262379]
mean value: 0.693960296854044
key: train_mcc
value: [0.84650478 0.75381029 0.95278173 0.57559806 0.91540708 0.92917693
0.63638198 0.90359336 0.91893658 0.87618491]
mean value: 0.8308375712756191
key: test_accuracy
value: [0.83783784 0.72972973 0.81081081 0.7027027 0.89189189 0.91891892
0.75675676 0.89189189 0.88888889 0.88888889]
mean value: 0.8318318318318318
key: train_accuracy
value: [0.918429 0.86404834 0.97583082 0.74924471 0.95770393 0.96374622
0.78851964 0.95166163 0.95783133 0.93674699]
mean value: 0.9063762603283223
key: test_fscore
value: [0.85714286 0.77272727 0.81081081 0.76595745 0.89473684 0.91428571
0.68965517 0.89473684 0.88235294 0.89473684]
mean value: 0.8377142741681218
key: train_fscore
value: [0.92436975 0.88 0.97530864 0.8 0.95757576 0.9625
0.73076923 0.95209581 0.95597484 0.93913043]
mean value: 0.9077724464152594
key: test_precision
value: [0.75 0.65384615 0.78947368 0.62068966 0.89473684 1.
1. 0.89473684 0.9375 0.85 ]
mean value: 0.839098317743962
key: train_precision
value: [0.86387435 0.78947368 1. 0.66666667 0.95757576 0.99354839
1. 0.9408284 1. 0.90502793]
mean value: 0.9116995176427221
key: test_recall
value: [1. 0.94444444 0.83333333 1. 0.89473684 0.84210526
0.52631579 0.89473684 0.83333333 0.94444444]
mean value: 0.8713450292397661
key: train_recall
value: [0.9939759 0.9939759 0.95180723 1. 0.95757576 0.93333333
0.57575758 0.96363636 0.91566265 0.97590361]
mean value: 0.926162833150785
key: test_roc_auc
value: [0.84210526 0.73538012 0.81140351 0.71052632 0.89181287 0.92105263
0.76315789 0.89181287 0.88888889 0.88888889]
mean value: 0.8345029239766082
key: train_roc_auc
value: [0.91820007 0.86365462 0.97590361 0.74848485 0.95770354 0.96365462
0.78787879 0.9516977 0.95783133 0.93674699]
mean value: 0.9061756115370574
key: test_jcc
value: [0.75 0.62962963 0.68181818 0.62068966 0.80952381 0.84210526
0.52631579 0.80952381 0.78947368 0.80952381]
mean value: 0.7268603632033759
key: train_jcc
value: [0.859375 0.78571429 0.95180723 0.66666667 0.91860465 0.92771084
0.57575758 0.90857143 0.91566265 0.8852459 ]
mean value: 0.8395116232403658
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.15993023 0.14477468 0.14659452 0.14503288 0.14535022 0.14776993
0.14721227 0.14613962 0.14871216 0.1482687 ]
mean value: 0.1479785203933716
key: score_time
value: [0.01520491 0.01517415 0.0165267 0.01517439 0.01506782 0.01651335
0.01520538 0.01570082 0.01625323 0.01571679]
mean value: 0.01565375328063965
key: test_mcc
value: [1. 0.83918129 0.94721815 0.94736842 0.94736842 0.94736842
0.94736842 0.94736842 0.78262379 1. ]
mean value: 0.9305865333163313
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.91891892 0.97297297 0.97297297 0.97297297 0.97297297
0.97297297 0.97297297 0.88888889 1. ]
mean value: 0.9645645645645646
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.91891892 0.97142857 0.97297297 0.97297297 0.97297297
0.97297297 0.97297297 0.89473684 1. ]
mean value: 0.9649949197317619
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.89473684 1. 0.94736842 1. 1.
1. 1. 0.85 1. ]
mean value: 0.9692105263157895
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.94444444 0.94444444 1. 0.94736842 0.94736842
0.94736842 0.94736842 0.94444444 1. ]
mean value: 0.962280701754386
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.91959064 0.97222222 0.97368421 0.97368421 0.97368421
0.97368421 0.97368421 0.88888889 1. ]
mean value: 0.9649122807017544
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.85 0.94444444 0.94736842 0.94736842 0.94736842
0.94736842 0.94736842 0.80952381 1. ]
mean value: 0.9340810359231412
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.92
Accuracy on Blind test: 0.96
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.04762053 0.05938816 0.05389047 0.07260513 0.06046724 0.05754685
0.05538869 0.05734062 0.06089234 0.06812143]
mean value: 0.05932614803314209
key: score_time
value: [0.01756454 0.02058482 0.03347611 0.03177166 0.02301621 0.04879737
0.0274539 0.0281961 0.03790355 0.03870583]
mean value: 0.030747008323669434
key: test_mcc
value: [0.94736842 0.84959079 1. 0.94736842 0.89181287 0.89736456
0.94736842 0.89736456 0.83462233 1. ]
mean value: 0.9212860370101059
key: train_mcc
value: [0.9939759 0.9818912 0.98203528 0.9939759 0.9879153 0.98189054
0.98189054 0.9879153 0.99399394 0.98194553]
mean value: 0.9867429430855921
key: test_accuracy
value: [0.97297297 0.91891892 1. 0.97297297 0.94594595 0.94594595
0.97297297 0.94594595 0.91666667 1. ]
mean value: 0.9592342342342343
key: train_accuracy
value: [0.99697885 0.99093656 0.99093656 0.99697885 0.9939577 0.99093656
0.99093656 0.9939577 0.99698795 0.99096386]
mean value: 0.9933571142576347
key: test_fscore
value: [0.97297297 0.92307692 1. 0.97297297 0.94736842 0.94444444
0.97297297 0.94444444 0.91891892 1. ]
mean value: 0.9597172070856281
key: train_fscore
value: [0.99697885 0.99093656 0.99088146 0.99697885 0.99393939 0.99088146
0.99088146 0.99393939 0.99697885 0.99093656]
mean value: 0.99333328324522
key: test_precision
value: [0.94736842 0.85714286 1. 0.94736842 0.94736842 1.
1. 1. 0.89473684 1. ]
mean value: 0.9593984962406015
key: train_precision
value: [1. 0.99393939 1. 1. 0.99393939 0.99390244
0.99390244 0.99393939 1. 0.99393939]
mean value: 0.9963562453806356
key: test_recall
value: [1. 1. 1. 1. 0.94736842 0.89473684
0.94736842 0.89473684 0.94444444 1. ]
mean value: 0.9628654970760234
key: train_recall
value: [0.9939759 0.98795181 0.98192771 0.9939759 0.99393939 0.98787879
0.98787879 0.99393939 0.9939759 0.98795181]
mean value: 0.9903395399780942
key: test_roc_auc
value: [0.97368421 0.92105263 1. 0.97368421 0.94590643 0.94736842
0.97368421 0.94736842 0.91666667 1. ]
mean value: 0.9599415204678362
key: train_roc_auc
value: [0.99698795 0.9909456 0.99096386 0.99698795 0.99395765 0.99092735
0.99092735 0.99395765 0.99698795 0.99096386]
mean value: 0.9933607155896312
key: test_jcc
value: [0.94736842 0.85714286 1. 0.94736842 0.9 0.89473684
0.94736842 0.89473684 0.85 1. ]
mean value: 0.9238721804511278
key: train_jcc
value: [0.9939759 0.98203593 0.98192771 0.9939759 0.98795181 0.98192771
0.98192771 0.98795181 0.9939759 0.98203593]
mean value: 0.9867686314118751
MCC on Blind test: 0.9
Accuracy on Blind test: 0.95
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.07490635 0.11946273 0.103163 0.10210061 0.1119535 0.11135602
0.16045046 0.09157825 0.10720015 0.11908531]
mean value: 0.11012563705444336
key: score_time
value: [0.02171206 0.02200747 0.02642679 0.02162862 0.02213502 0.02532601
0.02199364 0.02181697 0.02197194 0.02154231]
mean value: 0.02265608310699463
key: test_mcc
value: [0.56725146 0.6754386 0.35104619 0.63129316 0.4633451 0.56725146
0.62280702 0.60308132 0.5007734 0.78262379]
mean value: 0.5764911497325073
key: train_mcc
value: [0.9939759 0.9939759 0.9939759 0.9939759 0.99397568 1.
1. 0.99397568 0.99399394 0.99399394]
mean value: 0.9951842862455198
key: test_accuracy
value: [0.78378378 0.83783784 0.67567568 0.81081081 0.72972973 0.78378378
0.81081081 0.78378378 0.75 0.88888889]
mean value: 0.7855105105105105
key: train_accuracy
value: [0.99697885 0.99697885 0.99697885 0.99697885 0.99697885 1.
1. 0.99697885 0.99698795 0.99698795]
mean value: 0.9975849015396935
key: test_fscore
value: [0.77777778 0.83333333 0.64705882 0.82051282 0.72222222 0.78947368
0.81081081 0.75 0.75675676 0.88235294]
mean value: 0.779029917033013
key: train_fscore
value: [0.99697885 0.99697885 0.99697885 0.99697885 0.99696049 1.
1. 0.99696049 0.99697885 0.99697885]
mean value: 0.9975794084426854
key: test_precision
value: [0.77777778 0.83333333 0.6875 0.76190476 0.76470588 0.78947368
0.83333333 0.92307692 0.73684211 0.9375 ]
mean value: 0.8045447801252755
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.77777778 0.83333333 0.61111111 0.88888889 0.68421053 0.78947368
0.78947368 0.63157895 0.77777778 0.83333333]
mean value: 0.7616959064327485
key: train_recall
value: [0.9939759 0.9939759 0.9939759 0.9939759 0.99393939 1.
1. 0.99393939 0.9939759 0.9939759 ]
mean value: 0.9951734209565535
key: test_roc_auc
value: [0.78362573 0.8377193 0.67397661 0.8128655 0.73099415 0.78362573
0.81140351 0.7880117 0.75 0.88888889]
mean value: 0.7861111111111111
key: train_roc_auc
value: [0.99698795 0.99698795 0.99698795 0.99698795 0.9969697 1.
1. 0.9969697 0.99698795 0.99698795]
mean value: 0.9975867104782767
key: test_jcc
value: [0.63636364 0.71428571 0.47826087 0.69565217 0.56521739 0.65217391
0.68181818 0.6 0.60869565 0.78947368]
mean value: 0.6421941216678059
key: train_jcc
value: [0.9939759 0.9939759 0.9939759 0.9939759 0.99393939 1.
1. 0.99393939 0.9939759 0.9939759 ]
mean value: 0.9951734209565535
MCC on Blind test: 0.54
Accuracy on Blind test: 0.77
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.56603932 0.55516958 0.56111884 0.5529871 0.55033922 0.55430651
0.53636289 0.54783416 0.5531528 0.55002046]
mean value: 0.5527330875396729
key: score_time
value: [0.01002455 0.00995398 0.00930357 0.00971341 0.01047468 0.00944734
0.00963187 0.00966716 0.00973773 0.00942183]
mean value: 0.009737610816955566
key: test_mcc
value: [0.89181287 0.7888597 1. 0.94736842 0.89181287 0.94736842
0.94736842 1. 0.78262379 1. ]
mean value: 0.9197214483309992
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94594595 0.89189189 1. 0.97297297 0.94594595 0.97297297
0.97297297 1. 0.88888889 1. ]
mean value: 0.9591591591591592
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94444444 0.89473684 1. 0.97297297 0.94736842 0.97297297
0.97297297 1. 0.89473684 1. ]
mean value: 0.9600205468626521
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.94444444 0.85 1. 0.94736842 0.94736842 1.
1. 1. 0.85 1. ]
mean value: 0.9539181286549707
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.94444444 0.94444444 1. 1. 0.94736842 0.94736842
0.94736842 1. 0.94444444 1. ]
mean value: 0.9675438596491228
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.94590643 0.89327485 1. 0.97368421 0.94590643 0.97368421
0.97368421 1. 0.88888889 1. ]
mean value: 0.9595029239766082
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.89473684 0.80952381 1. 0.94736842 0.9 0.94736842
0.94736842 1. 0.80952381 1. ]
mean value: 0.9255889724310777
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.94
Accuracy on Blind test: 0.97
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.02545977 0.02662921 0.03351521 0.02726531 0.02694201 0.02765965
0.02748775 0.02727795 0.02772045 0.02956676]
mean value: 0.027952408790588378
key: score_time
value: [0.01267028 0.01618814 0.01552796 0.01530409 0.01815295 0.01537704
0.01538992 0.01563191 0.01558256 0.01613903]
mean value: 0.015596389770507812
key: test_mcc
value: [0.51319869 0.40643275 0.02932564 0.1378305 0.4163404 0.1378305
0.36315314 0.46019501 0.35355339 0.35355339]
mean value: 0.3171413424046587
key: train_mcc
value: [0.91872008 0.9939759 0.98203333 0.83781349 0.96374589 0.80665108
0.96437604 0.9818912 0.97618706 0.86450473]
mean value: 0.9289898789692037
key: test_accuracy
value: [0.75675676 0.7027027 0.51351351 0.56756757 0.7027027 0.56756757
0.67567568 0.72972973 0.66666667 0.66666667]
mean value: 0.6549549549549549
key: train_accuracy
value: [0.95770393 0.99697885 0.99093656 0.91238671 0.98187311 0.89425982
0.98187311 0.99093656 0.98795181 0.92771084]
mean value: 0.9622611291085793
key: test_fscore
value: [0.74285714 0.7027027 0.52631579 0.57894737 0.74418605 0.55555556
0.64705882 0.75 0.71428571 0.71428571]
mean value: 0.6676194857622606
key: train_fscore
value: [0.95597484 0.99697885 0.99104478 0.90429043 0.98181818 0.88135593
0.98148148 0.99093656 0.98809524 0.93258427]
mean value: 0.96045605590458
key: test_precision
value: [0.76470588 0.68421053 0.5 0.55 0.66666667 0.58823529
0.73333333 0.71428571 0.625 0.625 ]
mean value: 0.6451437417072092
key: train_precision
value: [1. 1. 0.98224852 1. 0.98181818 1.
1. 0.98795181 0.97647059 0.87368421]
mean value: 0.9802173308518767
key: test_recall
value: [0.72222222 0.72222222 0.55555556 0.61111111 0.84210526 0.52631579
0.57894737 0.78947368 0.83333333 0.83333333]
mean value: 0.7014619883040936
key: train_recall
value: [0.91566265 0.9939759 1. 0.8253012 0.98181818 0.78787879
0.96363636 0.99393939 1. 1. ]
mean value: 0.9462212486308872
key: test_roc_auc
value: [0.75584795 0.70321637 0.51461988 0.56871345 0.69883041 0.56871345
0.67836257 0.72807018 0.66666667 0.66666667]
mean value: 0.6549707602339181
key: train_roc_auc
value: [0.95783133 0.99698795 0.99090909 0.9126506 0.98187295 0.89393939
0.98181818 0.9909456 0.98795181 0.92771084]
mean value: 0.9622617743702081
key: test_jcc
value: [0.59090909 0.54166667 0.35714286 0.40740741 0.59259259 0.38461538
0.47826087 0.6 0.55555556 0.55555556]
mean value: 0.5063705980010328
key: train_jcc
value: [0.91566265 0.9939759 0.98224852 0.8253012 0.96428571 0.78787879
0.96363636 0.98203593 0.97647059 0.87368421]
mean value: 0.9265179872452391
MCC on Blind test: 0.5
Accuracy on Blind test: 0.75
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02361274 0.05441642 0.03723788 0.04656839 0.03769898 0.03737879
0.04589653 0.03744817 0.03729868 0.03707409]
mean value: 0.03946306705474854
key: score_time
value: [0.02177238 0.02057743 0.02044559 0.02325034 0.01888037 0.02219844
0.0233705 0.0234704 0.0239017 0.02041125]
mean value: 0.021827840805053712
key: test_mcc
value: [0.94736842 0.73821295 0.56725146 0.73099415 0.78362573 0.73099415
0.84959079 0.89181287 0.72333935 0.83462233]
mean value: 0.7797812196768581
key: train_mcc
value: [0.88520939 0.89729828 0.91547702 0.87940108 0.91547085 0.89729828
0.89729828 0.88521358 0.89182522 0.89156627]
mean value: 0.8956058253044522
key: test_accuracy
value: [0.97297297 0.86486486 0.78378378 0.86486486 0.89189189 0.86486486
0.91891892 0.94594595 0.86111111 0.91666667]
mean value: 0.8885885885885886
key: train_accuracy
value: [0.94259819 0.94864048 0.95770393 0.93957704 0.95770393 0.94864048
0.94864048 0.94259819 0.94578313 0.94578313]
mean value: 0.9477668984093474
key: test_fscore
value: [0.97297297 0.84848485 0.77777778 0.86486486 0.89473684 0.86486486
0.91428571 0.94736842 0.85714286 0.91891892]
mean value: 0.8861418082470714
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_7030.py:176: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_7030.py:179: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: train_fscore
value: [0.94294294 0.94864048 0.95757576 0.94047619 0.95731707 0.94864048
0.94864048 0.94259819 0.94642857 0.94578313]
mean value: 0.947904330558655
key: test_precision
value: [0.94736842 0.93333333 0.77777778 0.84210526 0.89473684 0.88888889
1. 0.94736842 0.88235294 0.89473684]
mean value: 0.9008668730650154
key: train_precision
value: [0.94011976 0.95151515 0.96341463 0.92941176 0.96319018 0.94578313
0.94578313 0.93975904 0.93529412 0.94578313]
mean value: 0.9460054046277495
key: test_recall
value: [1. 0.77777778 0.77777778 0.88888889 0.89473684 0.84210526
0.84210526 0.94736842 0.83333333 0.94444444]
mean value: 0.8748538011695907
key: train_recall
value: [0.94578313 0.94578313 0.95180723 0.95180723 0.95151515 0.95151515
0.95151515 0.94545455 0.95783133 0.94578313]
mean value: 0.9498795180722892
key: test_roc_auc
value: [0.97368421 0.8625731 0.78362573 0.86549708 0.89181287 0.86549708
0.92105263 0.94590643 0.86111111 0.91666667]
mean value: 0.8887426900584795
key: train_roc_auc
value: [0.94258854 0.94864914 0.9577218 0.93953998 0.95768529 0.94864914
0.94864914 0.94260679 0.94578313 0.94578313]
mean value: 0.9477656078860899
key: test_jcc
value: [0.94736842 0.73684211 0.63636364 0.76190476 0.80952381 0.76190476
0.84210526 0.9 0.75 0.85 ]
mean value: 0.7996012759170654
key: train_jcc
value: [0.89204545 0.90229885 0.91860465 0.88764045 0.91812865 0.90229885
0.90229885 0.89142857 0.89830508 0.89714286]
mean value: 0.9010192275158537
MCC on Blind test: 0.79
Accuracy on Blind test: 0.9
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
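The 'prep' step printed above sends the 169 numerical columns through MinMaxScaler and the 7 categorical columns through OneHotEncoder. As a rough illustration of what those two transformations do to a single column, here is a minimal pure-Python sketch. It is not scikit-learn's implementation: the helper names (min_max_scale, one_hot) and the toy values are hypothetical, and only the column names are taken from the log.

```python
def min_max_scale(column):
    """Rescale numbers to [0, 1]; in this sketch a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [(x - lo) / span if span else 0.0 for x in column]


def one_hot(column):
    """Return (sorted categories, one indicator row per input value)."""
    categories = sorted(set(column))
    rows = [[1.0 if value == cat else 0.0 for cat in categories]
            for value in column]
    return categories, rows


# Hypothetical toy values: the column names appear in the log, the data do not.
ligand_distance = [2.0, 6.0, 10.0]
ss_class = ["helix", "loop", "helix"]

scaled = min_max_scale(ligand_distance)
categories, encoded = one_hot(ss_class)
print(scaled)
print(categories)
print(encoded)
```

In the logged pipeline both transformers are fitted inside each cross-validation split, so the scaling range and the category list are learned from the training portion only.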
key: fit_time
value: [0.25129724 0.32188392 0.33577132 0.31489706 0.26630807 0.47269154
0.30162644 0.28344464 0.26365685 0.33231711]
mean value: 0.3143894195556641
key: score_time
value: [0.01923299 0.02272201 0.02159452 0.02046084 0.01954365 0.02145052
0.02375412 0.02187276 0.01636934 0.02253842]
mean value: 0.020953917503356935
key: test_mcc
value: [0.94736842 0.73821295 0.56725146 0.73099415 0.78362573 0.73099415
0.84959079 0.89181287 0.72333935 0.83462233]
mean value: 0.7797812196768581
key: train_mcc
value: [0.88520939 0.89729828 0.91547702 0.87940108 0.91547085 0.89729828
0.89729828 0.80061339 0.89182522 0.89156627]
mean value: 0.8871458056914586
key: test_accuracy
value: [0.97297297 0.86486486 0.78378378 0.86486486 0.89189189 0.86486486
0.91891892 0.94594595 0.86111111 0.91666667]
mean value: 0.8885885885885886
key: train_accuracy
value: [0.94259819 0.94864048 0.95770393 0.93957704 0.95770393 0.94864048
0.94864048 0.90030211 0.94578313 0.94578313]
mean value: 0.9435372911585921
key: test_fscore
value: [0.97297297 0.84848485 0.77777778 0.86486486 0.89473684 0.86486486
0.91428571 0.94736842 0.85714286 0.91891892]
mean value: 0.8861418082470714
key: train_fscore
value: [0.94294294 0.94864048 0.95757576 0.94047619 0.95731707 0.94864048
0.94864048 0.89969605 0.94642857 0.94578313]
mean value: 0.9436141166907591
key: test_precision
value: [0.94736842 0.93333333 0.77777778 0.84210526 0.89473684 0.88888889
1. 0.94736842 0.88235294 0.89473684]
mean value: 0.9008668730650154
key: train_precision
value: [0.94011976 0.95151515 0.96341463 0.92941176 0.96319018 0.94578313
0.94578313 0.90243902 0.93529412 0.94578313]
mean value: 0.9422734034523161
key: test_recall
value: [1. 0.77777778 0.77777778 0.88888889 0.89473684 0.84210526
0.84210526 0.94736842 0.83333333 0.94444444]
mean value: 0.8748538011695907
key: train_recall
value: [0.94578313 0.94578313 0.95180723 0.95180723 0.95151515 0.95151515
0.95151515 0.8969697 0.95783133 0.94578313]
mean value: 0.9450310332238043
key: test_roc_auc
value: [0.97368421 0.8625731 0.78362573 0.86549708 0.89181287 0.86549708
0.92105263 0.94590643 0.86111111 0.91666667]
mean value: 0.8887426900584795
key: train_roc_auc
value: [0.94258854 0.94864914 0.9577218 0.93953998 0.95768529 0.94864914
0.94864914 0.90029208 0.94578313 0.94578313]
mean value: 0.9435341365461848
key: test_jcc
value: [0.94736842 0.73684211 0.63636364 0.76190476 0.80952381 0.76190476
0.84210526 0.9 0.75 0.85 ]
mean value: 0.7996012759170654
key: train_jcc
value: [0.89204545 0.90229885 0.91860465 0.88764045 0.91812865 0.90229885
0.90229885 0.81767956 0.89830508 0.89714286]
mean value: 0.8936443261741015
MCC on Blind test: 0.79
Accuracy on Blind test: 0.9
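The first-fold test scores logged above for RidgeClassifierCV (accuracy 0.97297297, precision 0.94736842, recall 1.0) are consistent with a 37-sample fold holding 18 true positives, 18 true negatives, 1 false positive and 0 false negatives. These counts are inferred from the logged values and are not printed by the script. Under that assumption, the sketch below shows how the logged metrics relate to the confusion counts. The function name fold_metrics is hypothetical, and roc_auc is computed as balanced accuracy, which is what ROC AUC reduces to when it is scored on hard class predictions (this appears to match the logged values).

```python
import math


def fold_metrics(tp, tn, fp, fn):
    """Return binary-classification metrics from confusion-matrix counts.

    No guard against empty classes: a zero denominator raises ZeroDivisionError.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "fscore": 2 * precision * recall / (precision + recall),
        "mcc": (tp * tn - fp * fn) / mcc_den,
        "jcc": tp / (tp + fp + fn),  # Jaccard index of the positive class
        "roc_auc": (recall + specificity) / 2,
    }


# Counts inferred from the first test fold (an assumption, not logged).
scores = fold_metrics(tp=18, tn=18, fp=1, fn=0)
for name, value in scores.items():
    print(f"{name}: {value:.8f}")
```

With these counts the sketch gives mcc and jcc of 18/19, fscore and accuracy of 36/37, and roc_auc of 37/38, which agree with the first entries of test_mcc, test_jcc, test_fscore, test_accuracy and test_roc_auc above.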
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.03955388 0.03381324 0.03329587 0.05509424 0.05626059 0.05854964
0.05459762 0.06159639 0.03439593 0.03141403]
mean value: 0.04585714340209961
key: score_time
value: [0.01206017 0.01217222 0.01196647 0.01265693 0.0123384 0.01210618
0.01433325 0.01214552 0.01429367 0.01195979]
mean value: 0.012603259086608887
key: test_mcc
value: [0.73786479 0.84327404 0.68421053 0.84327404 0.84327404 0.73786479
0.89973541 0.84327404 0.89736456 0.51319869]
mean value: 0.7843334932834807
key: train_mcc
value: [0.85906136 0.87648575 0.85888297 0.84119102 0.87660709 0.84705882
0.87058824 0.85888297 0.85929061 0.85930029]
mean value: 0.8607349129613298
key: test_accuracy
value: [0.86842105 0.92105263 0.84210526 0.92105263 0.92105263 0.86842105
0.94736842 0.92105263 0.94594595 0.75675676]
mean value: 0.8913229018492176
key: train_accuracy
value: [0.92941176 0.93823529 0.92941176 0.92058824 0.93823529 0.92352941
0.93529412 0.92941176 0.92961877 0.92961877]
mean value: 0.9303355183715715
key: test_fscore
value: [0.87179487 0.92307692 0.84210526 0.91891892 0.91891892 0.87179487
0.94444444 0.91891892 0.94736842 0.76923077]
mean value: 0.8926572321309163
key: train_fscore
value: [0.93023256 0.93841642 0.92899408 0.92035398 0.93877551 0.92352941
0.93529412 0.92982456 0.93023256 0.92982456]
mean value: 0.9305477766130446
key: test_precision
value: [0.85 0.9 0.84210526 0.94444444 0.94444444 0.85
1. 0.94444444 0.9 0.75 ]
mean value: 0.8925438596491228
key: train_precision
value: [0.91954023 0.93567251 0.93452381 0.92307692 0.93063584 0.92352941
0.93529412 0.9244186 0.92485549 0.9244186 ]
mean value: 0.9275965545299533
key: test_recall
value: [0.89473684 0.94736842 0.84210526 0.89473684 0.89473684 0.89473684
0.89473684 0.89473684 1. 0.78947368]
mean value: 0.8947368421052632
key: train_recall
value: [0.94117647 0.94117647 0.92352941 0.91764706 0.94705882 0.92352941
0.93529412 0.93529412 0.93567251 0.93529412]
mean value: 0.9335672514619883
key: test_roc_auc
value: [0.86842105 0.92105263 0.84210526 0.92105263 0.92105263 0.86842105
0.94736842 0.92105263 0.94736842 0.75584795]
mean value: 0.8913742690058479
key: train_roc_auc
value: [0.92941176 0.93823529 0.92941176 0.92058824 0.93823529 0.92352941
0.93529412 0.92941176 0.92960096 0.92963536]
mean value: 0.9303353973168215
key: test_jcc
value: [0.77272727 0.85714286 0.72727273 0.85 0.85 0.77272727
0.89473684 0.85 0.9 0.625 ]
mean value: 0.8099606971975393
key: train_jcc
value: [0.86956522 0.8839779 0.86740331 0.85245902 0.88461538 0.8579235
0.87845304 0.86885246 0.86956522 0.86885246]
mean value: 0.8701667505235628
MCC on Blind test: 0.83
Accuracy on Blind test: 0.91
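Each metric in this log is printed as a "key:" line, a "value:" array with one entry per cross-validation fold, and a "mean value:" line. A minimal sketch for reading such blocks back into Python and checking that the logged mean is the plain average of the fold values follows. The function name parse_cv_blocks is hypothetical, and the excerpt is copied from the Logistic Regression test_mcc block above.

```python
import re
from statistics import fmean

# Excerpt copied from the Logistic Regression block of this log.
log_excerpt = """key: test_mcc
value: [0.73786479 0.84327404 0.68421053 0.84327404 0.84327404 0.73786479
 0.89973541 0.84327404 0.89736456 0.51319869]
mean value: 0.7843334932834807
"""


def parse_cv_blocks(text):
    """Map each 'key:' name to (fold_values, logged_mean)."""
    pattern = re.compile(
        r"key: (\w+)\s+value: \[([^\]]*)\]\s+mean value: (\S+)"
    )
    blocks = {}
    for key, values, mean in pattern.findall(text):
        blocks[key] = ([float(v) for v in values.split()], float(mean))
    return blocks


blocks = parse_cv_blocks(log_excerpt)
folds, logged_mean = blocks["test_mcc"]
print(len(folds), fmean(folds), logged_mean)
```

The recomputed mean agrees with the logged one only to about eight decimal places, because the fold values are printed rounded while the logged mean was taken from the unrounded scores.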
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.85755229 0.90421176 1.06443739 0.84898424 1.10972214 1.86754012
1.32581997 1.09845734 1.02783036 1.25375056]
mean value: 1.1358306169509889
key: score_time
value: [0.01481295 0.01703072 0.01232839 0.02471399 0.01229548 0.01258135
0.01538014 0.01235461 0.01231289 0.01262498]
mean value: 0.014643549919128418
key: test_mcc
value: [0.78947368 0.79388419 0.68421053 0.89473684 0.73786479 0.73786479
0.85280287 0.84327404 0.89736456 0.51319869]
mean value: 0.7744674971633342
key: train_mcc
value: [1. 0.90001557 0.88825066 0.87648575 0.82942611 0.98236994
0.88235294 0.82375747 0.77139024 0.82404541]
mean value: 0.8778094100190499
key: test_accuracy
value: [0.89473684 0.89473684 0.84210526 0.94736842 0.86842105 0.86842105
0.92105263 0.92105263 0.94594595 0.75675676]
mean value: 0.8860597439544808
key: train_accuracy
value: [1. 0.95 0.94411765 0.93823529 0.91470588 0.99117647
0.94117647 0.91176471 0.8856305 0.91202346]
mean value: 0.9388830429532516
key: test_fscore
value: [0.89473684 0.9 0.84210526 0.94736842 0.86486486 0.87179487
0.91428571 0.91891892 0.94736842 0.76923077]
mean value: 0.887067408646356
key: train_fscore
value: [1. 0.94985251 0.9439528 0.9380531 0.91445428 0.99115044
0.94117647 0.9127907 0.88495575 0.91176471]
mean value: 0.9388150753201054
key: test_precision
value: [0.89473684 0.85714286 0.84210526 0.94736842 0.88888889 0.85
1. 0.94444444 0.9 0.75 ]
mean value: 0.887468671679198
key: train_precision
value: [1. 0.95266272 0.94674556 0.9408284 0.91715976 0.99408284
0.94117647 0.90229885 0.89285714 0.91176471]
mean value: 0.9399576459843272
key: test_recall
value: [0.89473684 0.94736842 0.84210526 0.94736842 0.84210526 0.89473684
0.84210526 0.89473684 1. 0.78947368]
mean value: 0.8894736842105263
key: train_recall
value: [1. 0.94705882 0.94117647 0.93529412 0.91176471 0.98823529
0.94117647 0.92352941 0.87719298 0.91176471]
mean value: 0.9377192982456141
key: test_roc_auc
value: [0.89473684 0.89473684 0.84210526 0.94736842 0.86842105 0.86842105
0.92105263 0.92105263 0.94736842 0.75584795]
mean value: 0.8861111111111111
key: train_roc_auc
value: [1. 0.95 0.94411765 0.93823529 0.91470588 0.99117647
0.94117647 0.91176471 0.88565531 0.9120227 ]
mean value: 0.9388854489164087
key: test_jcc
value: [0.80952381 0.81818182 0.72727273 0.9 0.76190476 0.77272727
0.84210526 0.85 0.9 0.625 ]
mean value: 0.8006715652768285
key: train_jcc
value: [1. 0.90449438 0.89385475 0.88333333 0.8423913 0.98245614
0.88888889 0.83957219 0.79365079 0.83783784]
mean value: 0.886647962154875
MCC on Blind test: 0.76
Accuracy on Blind test: 0.88
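The per-fold blocks above follow a fixed `key:` / `value:` / `mean value:` layout, so they can be collected into a dictionary and compared across models. What follows is a minimal sketch that assumes this layout holds throughout the log. `parse_cv_blocks` is a hypothetical helper name, not part of the original script, and the sample text is copied from the LogisticRegressionCV `test_mcc` block.

```python
import re
from statistics import fmean

# Matches plain and scientific-notation floats, including a bare "1."
NUMBER = re.compile(r"-?\d+\.?\d*(?:[eE][-+]?\d+)?")


def parse_cv_blocks(text):
    """Collect 'key:' / 'value: [...]' blocks into {key: [floats]}.

    Hypothetical helper; assumes the layout printed in this log, where a
    value array may wrap onto a second line and ends with ']'.
    """
    results = {}
    key = None
    collecting = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("key:"):
            key = line.split(":", 1)[1].strip()
            results[key] = []
            collecting = False
        elif line.startswith("value:") and key is not None:
            collecting = True
            line = line.split(":", 1)[1]
        elif line.startswith("mean value:"):
            collecting = False
        if collecting:
            results[key].extend(float(x) for x in NUMBER.findall(line))
            if "]" in line:
                collecting = False  # closing bracket ends the array
    return results


SAMPLE = """\
key: test_mcc
value: [0.78947368 0.79388419 0.68421053 0.89473684 0.73786479 0.73786479
 0.85280287 0.84327404 0.89736456 0.51319869]
mean value: 0.7744674971633342
"""

folds = parse_cv_blocks(SAMPLE)
mean_mcc = fmean(folds["test_mcc"])
print(len(folds["test_mcc"]), round(mean_mcc, 4))
```

Recomputing the mean from the parsed folds gives a quick consistency check against the `mean value:` line in the log.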
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01427627 0.01156187 0.01119947 0.01072907 0.01078057 0.01065731
0.01075244 0.01101494 0.01063251 0.01074815]
mean value: 0.011235260963439941
key: score_time
value: [0.0127666 0.01023865 0.01028371 0.00991011 0.00974965 0.00975084
0.00979042 0.00979543 0.00977588 0.00977397]
mean value: 0.01018352508544922
key: test_mcc
value: [0.63245553 0.54554473 0.59222009 0.59222009 0.58218174 0.37686733
0.85280287 0.63245553 0.73020842 0.62280702]
mean value: 0.6159763343462977
key: train_mcc
value: [0.67627507 0.65158377 0.68169173 0.62705429 0.65254612 0.6500365
0.62983758 0.67087425 0.64290652 0.64159047]
mean value: 0.6524396309639473
key: test_accuracy
value: [0.81578947 0.76315789 0.78947368 0.78947368 0.78947368 0.68421053
0.92105263 0.81578947 0.86486486 0.81081081]
mean value: 0.8044096728307255
key: train_accuracy
value: [0.83529412 0.82352941 0.83823529 0.81176471 0.82352941 0.81764706
0.81176471 0.83235294 0.81818182 0.81818182]
mean value: 0.8230481283422459
key: test_fscore
value: [0.81081081 0.72727273 0.76470588 0.76470588 0.77777778 0.64705882
0.91428571 0.81081081 0.85714286 0.81081081]
mean value: 0.7885382097146804
key: train_fscore
value: [0.82389937 0.8125 0.82758621 0.80124224 0.81132075 0.79605263
0.79746835 0.82018927 0.80503145 0.80503145]
mean value: 0.8100321722246597
key: test_precision
value: [0.83333333 0.85714286 0.86666667 0.86666667 0.82352941 0.73333333
1. 0.83333333 0.88235294 0.83333333]
mean value: 0.85296918767507
key: train_precision
value: [0.88513514 0.86666667 0.88590604 0.84868421 0.87162162 0.90298507
0.8630137 0.88435374 0.8707483 0.86486486]
mean value: 0.874397935315639
key: test_recall
value: [0.78947368 0.63157895 0.68421053 0.68421053 0.73684211 0.57894737
0.84210526 0.78947368 0.83333333 0.78947368]
mean value: 0.7359649122807017
key: train_recall
value: [0.77058824 0.76470588 0.77647059 0.75882353 0.75882353 0.71176471
0.74117647 0.76470588 0.74853801 0.75294118]
mean value: 0.7548538011695907
key: test_roc_auc
value: [0.81578947 0.76315789 0.78947368 0.78947368 0.78947368 0.68421053
0.92105263 0.81578947 0.86403509 0.81140351]
mean value: 0.8043859649122806
key: train_roc_auc
value: [0.83529412 0.82352941 0.83823529 0.81176471 0.82352941 0.81764706
0.81176471 0.83235294 0.81838665 0.81799106]
mean value: 0.8230495356037152
key: test_jcc
value: [0.68181818 0.57142857 0.61904762 0.61904762 0.63636364 0.47826087
0.84210526 0.68181818 0.75 0.68181818]
mean value: 0.6561708124065103
key: train_jcc
value: [0.70053476 0.68421053 0.70588235 0.66839378 0.68253968 0.66120219
0.66315789 0.69518717 0.67368421 0.67368421]
mean value: 0.6808476770895582
MCC on Blind test: 0.68
Accuracy on Blind test: 0.84
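The `test_jcc` values track `test_fscore` closely because, for binary labels, the Jaccard index and the F1 score are linked by J = F1 / (2 - F1). The sketch below checks this against the Gaussian NB test folds printed above. The function name `jaccard_from_f1` is illustrative only, and the identity is a general property of the two scores, not something the log itself states.

```python
def jaccard_from_f1(f1):
    """Convert a binary F1 score to the Jaccard index of the same predictions."""
    return f1 / (2.0 - f1)


# Per-fold values copied from the Gaussian NB block above.
test_fscore = [0.81081081, 0.72727273, 0.76470588, 0.76470588, 0.77777778,
               0.64705882, 0.91428571, 0.81081081, 0.85714286, 0.81081081]
test_jcc = [0.68181818, 0.57142857, 0.61904762, 0.61904762, 0.63636364,
            0.47826087, 0.84210526, 0.68181818, 0.75, 0.68181818]

# Largest absolute difference between the derived and the logged Jaccard values.
max_diff = max(abs(jaccard_from_f1(f) - j) for f, j in zip(test_fscore, test_jcc))
print(max_diff < 1e-6)
```

Because one score determines the other, ranking models by `test_jcc` and by `test_fscore` within a fold gives the same order.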
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01116753 0.01096296 0.01096344 0.01117587 0.01004553 0.01000977
0.01028728 0.01055145 0.0107522 0.01096797]
mean value: 0.010688400268554688
key: score_time
value: [0.0098331 0.00980783 0.00969982 0.01029825 0.00925255 0.00957799
0.00884676 0.008991 0.009202 0.00951934]
mean value: 0.009502863883972168
key: test_mcc
value: [0.52704628 0.47633051 0.63245553 0.63245553 0.73786479 0.52704628
0.9486833 0.58218174 0.89736456 0.35558302]
mean value: 0.6317011536883512
key: train_mcc
value: [0.74138173 0.74717517 0.73632672 0.72354193 0.71236887 0.72354193
0.71944168 0.72961376 0.72462581 0.75953765]
mean value: 0.7317555250285934
key: test_accuracy
value: [0.76315789 0.73684211 0.81578947 0.81578947 0.86842105 0.76315789
0.97368421 0.78947368 0.94594595 0.67567568]
mean value: 0.8147937411095306
key: train_accuracy
value: [0.87058824 0.87352941 0.86764706 0.86176471 0.85588235 0.86176471
0.85882353 0.86470588 0.86217009 0.8797654 ]
mean value: 0.865664136622391
key: test_fscore
value: [0.75675676 0.75 0.81081081 0.82051282 0.86486486 0.75675676
0.97297297 0.8 0.94736842 0.71428571]
mean value: 0.8194329118013328
key: train_fscore
value: [0.87209302 0.87463557 0.87106017 0.86217009 0.85878963 0.86217009
0.86363636 0.86627907 0.86455331 0.87905605]
mean value: 0.8674443359724497
key: test_precision
value: [0.77777778 0.71428571 0.83333333 0.8 0.88888889 0.77777778
1. 0.76190476 0.9 0.65217391]
mean value: 0.8106142167011732
key: train_precision
value: [0.86206897 0.86705202 0.84916201 0.85964912 0.84180791 0.85964912
0.83516484 0.85632184 0.85227273 0.8816568 ]
mean value: 0.8564805361282117
key: test_recall
value: [0.73684211 0.78947368 0.78947368 0.84210526 0.84210526 0.73684211
0.94736842 0.84210526 1. 0.78947368]
mean value: 0.831578947368421
key: train_recall
value: [0.88235294 0.88235294 0.89411765 0.86470588 0.87647059 0.86470588
0.89411765 0.87647059 0.87719298 0.87647059]
mean value: 0.8788957688338493
key: test_roc_auc
value: [0.76315789 0.73684211 0.81578947 0.81578947 0.86842105 0.76315789
0.97368421 0.78947368 0.94736842 0.67251462]
mean value: 0.8146198830409357
key: train_roc_auc
value: [0.87058824 0.87352941 0.86764706 0.86176471 0.85588235 0.86176471
0.85882353 0.86470588 0.8621259 0.87975576]
mean value: 0.8656587547299621
key: test_jcc
value: [0.60869565 0.6 0.68181818 0.69565217 0.76190476 0.60869565
0.94736842 0.66666667 0.9 0.55555556]
mean value: 0.7026357065258667
key: train_jcc
value: [0.77319588 0.77720207 0.7715736 0.75773196 0.75252525 0.75773196
0.76 0.76410256 0.76142132 0.78421053]
mean value: 0.7659695133154767
MCC on Blind test: 0.72
Accuracy on Blind test: 0.86
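For reference, the MCC values in these blocks can be reproduced from confusion-matrix counts. The log does not print the counts, so the ones used below are an assumption: they are inferred from the first Naive Bayes (BernoulliNB) test fold, where precision 0.77777778 and recall 0.73684211 are consistent with 14/18 and 14/19 in a 38-sample fold.

```python
import math


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Inferred counts (an assumption, not printed in the log) for the first
# BernoulliNB test fold: 19 positives and 19 negatives.
tp, fp, fn, tn = 14, 4, 5, 15

fold_mcc = mcc(tp, fp, fn, tn)
fold_accuracy = (tp + tn) / (tp + fp + fn + tn)
print(round(fold_mcc, 6), round(fold_accuracy, 6))
```

With these counts the result agrees with the logged fold values (`test_mcc` 0.52704628, `test_accuracy` 0.76315789) to the printed precision.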
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.01076102 0.01020241 0.01022601 0.01018453 0.01024795 0.01016283
0.01031971 0.0109694 0.01042318 0.01042962]
mean value: 0.01039266586303711
key: score_time
value: [0.01769781 0.0165956 0.01196361 0.01205826 0.01194811 0.01616716
0.01586628 0.01836443 0.01750278 0.01775217]
mean value: 0.015591621398925781
key: test_mcc
value: [0.26462806 0.05383819 0.58218174 0.53300179 0.42163702 0.53300179
0.59222009 0.47368421 0.62280702 0.40469382]
mean value: 0.44816937343834784
key: train_mcc
value: [0.65385813 0.68277833 0.68235294 0.64127633 0.65322377 0.65304287
0.62968418 0.67100629 0.66019502 0.69541138]
mean value: 0.6622829245583861
key: test_accuracy
value: [0.63157895 0.52631579 0.78947368 0.76315789 0.71052632 0.76315789
0.78947368 0.73684211 0.81081081 0.7027027 ]
mean value: 0.7224039829302987
key: train_accuracy
value: [0.82647059 0.84117647 0.84117647 0.82058824 0.82647059 0.82647059
0.81470588 0.83529412 0.82991202 0.84750733]
mean value: 0.830977229601518
key: test_fscore
value: [0.65 0.47058824 0.77777778 0.74285714 0.71794872 0.74285714
0.76470588 0.73684211 0.81081081 0.71794872]
mean value: 0.7132336533110527
key: train_fscore
value: [0.82175227 0.84393064 0.84117647 0.8189911 0.82898551 0.82492582
0.8173913 0.83233533 0.83333333 0.84431138]
mean value: 0.8307133137748363
key: test_precision
value: [0.61904762 0.53333333 0.82352941 0.8125 0.7 0.8125
0.86666667 0.73684211 0.78947368 0.7 ]
mean value: 0.7393892820286009
key: train_precision
value: [0.8447205 0.82954545 0.84117647 0.82634731 0.81714286 0.83233533
0.80571429 0.84756098 0.81920904 0.8597561 ]
mean value: 0.8323508312334535
key: test_recall
value: [0.68421053 0.42105263 0.73684211 0.68421053 0.73684211 0.68421053
0.68421053 0.73684211 0.83333333 0.73684211]
mean value: 0.6938596491228071
key: train_recall
value: [0.8 0.85882353 0.84117647 0.81176471 0.84117647 0.81764706
0.82941176 0.81764706 0.84795322 0.82941176]
mean value: 0.8295012039903681
key: test_roc_auc
value: [0.63157895 0.52631579 0.78947368 0.76315789 0.71052632 0.76315789
0.78947368 0.73684211 0.81140351 0.70175439]
mean value: 0.7223684210526315
key: train_roc_auc
value: [0.82647059 0.84117647 0.84117647 0.82058824 0.82647059 0.82647059
0.81470588 0.83529412 0.82985896 0.84745442]
mean value: 0.8309666322669419
key: test_jcc
value: [0.48148148 0.30769231 0.63636364 0.59090909 0.56 0.59090909
0.61904762 0.58333333 0.68181818 0.56 ]
mean value: 0.5611554741554742
key: train_jcc
value: [0.6974359 0.73 0.72588832 0.69346734 0.70792079 0.7020202
0.69117647 0.71282051 0.71428571 0.73056995]
mean value: 0.7105585198972811
MCC on Blind test: 0.5
Accuracy on Blind test: 0.75
Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01894116 0.01715732 0.01883602 0.01850367 0.01840329 0.01838541
0.01891804 0.01576376 0.01587749 0.01550865]
mean value: 0.017629480361938475
key: score_time
value: [0.01178336 0.01053977 0.01157427 0.01156521 0.01155615 0.01190138
0.01167703 0.01049495 0.01038718 0.01023459]
mean value: 0.011171388626098632
key: test_mcc
value: [0.78947368 0.79388419 0.68421053 0.84327404 0.79388419 0.68421053
0.9486833 0.84327404 0.89736456 0.40469382]
mean value: 0.768295287708455
key: train_mcc
value: [0.78828985 0.79413139 0.8 0.78828985 0.80600787 0.78828985
0.79424133 0.81227078 0.78299907 0.82992191]
mean value: 0.798444188462922
key: test_accuracy
value: [0.89473684 0.89473684 0.84210526 0.92105263 0.89473684 0.84210526
0.97368421 0.92105263 0.94594595 0.7027027 ]
mean value: 0.8832859174964438
key: train_accuracy
value: [0.89411765 0.89705882 0.9 0.89411765 0.90294118 0.89411765
0.89705882 0.90588235 0.8914956 0.91495601]
mean value: 0.8991745730550285
key: test_fscore
value: [0.89473684 0.9 0.84210526 0.91891892 0.88888889 0.84210526
0.97297297 0.91891892 0.94736842 0.71794872]
mean value: 0.8843964207122101
key: train_fscore
value: [0.89473684 0.8973607 0.9 0.89473684 0.90379009 0.89349112
0.89795918 0.90751445 0.89212828 0.91445428]
mean value: 0.8996171791456794
key: test_precision
value: [0.89473684 0.85714286 0.84210526 0.94444444 0.94117647 0.84210526
1. 0.94444444 0.9 0.7 ]
mean value: 0.8866155585041033
key: train_precision
value: [0.88953488 0.89473684 0.9 0.88953488 0.89595376 0.89880952
0.89017341 0.89204545 0.88953488 0.91715976]
mean value: 0.8957483402566699
key: test_recall
value: [0.89473684 0.94736842 0.84210526 0.89473684 0.84210526 0.84210526
0.94736842 0.89473684 1. 0.73684211]
mean value: 0.8842105263157894
key: train_recall
value: [0.9 0.9 0.9 0.9 0.91176471 0.88823529
0.90588235 0.92352941 0.89473684 0.91176471]
mean value: 0.9035913312693499
key: test_roc_auc
value: [0.89473684 0.89473684 0.84210526 0.92105263 0.89473684 0.84210526
0.97368421 0.92105263 0.94736842 0.70175439]
mean value: 0.8833333333333333
key: train_roc_auc
value: [0.89411765 0.89705882 0.9 0.89411765 0.90294118 0.89411765
0.89705882 0.90588235 0.89148607 0.91494668]
mean value: 0.899172686618507
key: test_jcc
value: [0.80952381 0.81818182 0.72727273 0.85 0.8 0.72727273
0.94736842 0.85 0.9 0.56 ]
mean value: 0.7989619503303714
key: train_jcc
value: [0.80952381 0.81382979 0.81818182 0.80952381 0.82446809 0.80748663
0.81481481 0.83068783 0.80526316 0.8423913 ]
mean value: 0.8176171048331113
MCC on Blind test: 0.78
Accuracy on Blind test: 0.89
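Note: each "mean value" line appears to be the arithmetic mean of the ten per-fold scores printed on the "value" line above it. The following is a minimal sketch, not the script that produced this log, of how the key/value records could be parsed back out of the log text and the mean recomputed. The function names `parse_cv_records` and `mean` are hypothetical, and the sample text is copied from the SVM test_accuracy record above.

```python
import re


def parse_cv_records(log_text):
    """Return {key: [float, ...]} for each 'key:' / 'value: [...]' pair."""
    records = {}
    # The value array can wrap onto a second line, so match everything
    # up to the closing bracket rather than up to the end of the line.
    pattern = re.compile(r"key:\s*(\S+)\s*value:\s*\[([^\]]*)\]")
    for key, body in pattern.findall(log_text):
        records[key] = [float(token) for token in body.split()]
    return records


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Sample copied from the SVM block of this log.
sample = """key: test_accuracy
value: [0.89473684 0.89473684 0.84210526 0.92105263 0.89473684 0.84210526
 0.97368421 0.92105263 0.94594595 0.7027027 ]
mean value: 0.8832859174964438
"""

scores = parse_cv_records(sample)
print(round(mean(scores["test_accuracy"]), 6))
```

Because the same keys recur for every model, this sketch would need to be applied to one model block at a time; otherwise later records overwrite earlier ones.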
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
[the warning above was emitted 11 times in total]
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.88404822 2.01980352 1.86825657 1.99386954 1.89242554 2.02484608
1.87313199 1.85225391 2.03934073 1.26169872]
mean value: 1.870967483520508
key: score_time
value: [0.05082941 0.02140141 0.01251268 0.01486778 0.02660775 0.01820397
0.01507092 0.01291871 0.01262975 0.01267314]
mean value: 0.019771552085876463
key: test_mcc
value: [0.78947368 0.69989647 0.73786479 0.9486833 0.9486833 0.78947368
0.89973541 0.79388419 0.89736456 0.56725146]
mean value: 0.8072310845862681
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.89473684 0.84210526 0.86842105 0.97368421 0.97368421 0.89473684
0.94736842 0.89473684 0.94594595 0.78378378]
mean value: 0.9019203413940255
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.89473684 0.85714286 0.87179487 0.97435897 0.97435897 0.89473684
0.94444444 0.88888889 0.94736842 0.78947368]
mean value: 0.9037304800462695
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.89473684 0.7826087 0.85 0.95 0.95 0.89473684
1. 0.94117647 0.9 0.78947368]
mean value: 0.8952732534661462
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.89473684 0.94736842 0.89473684 1. 1. 0.89473684
0.89473684 0.84210526 1. 0.78947368]
mean value: 0.9157894736842105
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.89473684 0.84210526 0.86842105 0.97368421 0.97368421 0.89473684
0.94736842 0.89473684 0.94736842 0.78362573]
mean value: 0.902046783625731
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.80952381 0.75 0.77272727 0.95 0.95 0.80952381
0.89473684 0.8 0.9 0.65217391]
mean value: 0.8288685646923634
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.74
Accuracy on Blind test: 0.87
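Note: for a binary problem the Jaccard index J and the F1 score F of the same class are tied by J = F / (2 - F), since F = 2TP / (2TP + FP + FN) and J = TP / (TP + FP + FN). Assuming the "jcc" and "fscore" keys in this log are the Jaccard and F1 scores of the positive class, the per-fold values can be cross-checked against each other. The sketch below uses the first four MLP folds from the block above; the function name `jaccard_from_f1` is hypothetical.

```python
def jaccard_from_f1(f1):
    """Convert a binary F1 score to the Jaccard index of the same class."""
    return f1 / (2.0 - f1)


# First four folds of the MLP block: test_fscore and test_jcc as logged.
fscores = [0.89473684, 0.85714286, 0.87179487, 0.97435897]
logged_jcc = [0.80952381, 0.75, 0.77272727, 0.95]

for f1, jcc in zip(fscores, logged_jcc):
    print(f"F1 {f1:.8f} -> Jaccard {jaccard_from_f1(f1):.8f} (logged {jcc:.8f})")
```

Agreement between the two columns suggests the two metrics carry the same ranking information, so reporting both adds little when comparing models.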
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.02012491 0.01943636 0.01589656 0.01583838 0.01620412 0.01516843
0.01706886 0.01599669 0.01592517 0.01600361]
mean value: 0.01676630973815918
key: score_time
value: [0.01256251 0.01115823 0.00905514 0.00882006 0.00896192 0.00889635
0.00920773 0.00900817 0.0087254 0.00878215]
mean value: 0.009517765045166016
key: test_mcc
value: [0.89973541 0.89973541 0.9486833 0.84327404 0.89473684 0.89473684
1. 0.84327404 0.94736842 0.83871328]
mean value: 0.9010257594256045
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94736842 0.94736842 0.97368421 0.92105263 0.94736842 0.94736842
1. 0.92105263 0.97297297 0.91891892]
mean value: 0.9497155049786629
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94444444 0.95 0.97435897 0.92307692 0.94736842 0.94736842
1. 0.91891892 0.97297297 0.92307692]
mean value: 0.950158599895442
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.9047619 0.95 0.9 0.94736842 0.94736842
1. 0.94444444 0.94736842 0.9 ]
mean value: 0.9441311612364244
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.89473684 1. 1. 0.94736842 0.94736842 0.94736842
1. 0.89473684 1. 0.94736842]
mean value: 0.9578947368421052
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.94736842 0.94736842 0.97368421 0.92105263 0.94736842 0.94736842
1. 0.92105263 0.97368421 0.91812865]
mean value: 0.9497076023391813
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.89473684 0.9047619 0.95 0.85714286 0.9 0.9
1. 0.85 0.94736842 0.85714286]
mean value: 0.9061152882205513
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.87
Accuracy on Blind test: 0.93
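Note: the per-fold MCC values can be sanity-checked from a confusion matrix. The counts below are not printed anywhere in this log; they are inferred, as an assumption, from the first Decision Tree fold above (precision 1.0, recall 0.89473684 = 17/19, accuracy 0.94736842 = 36/38), which is consistent with 19 positives and 19 negatives in that fold. The function name `mcc` is hypothetical.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention: return 0 when any marginal total is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Inferred counts for the first Decision Tree fold (an assumption, see above).
tp, tn, fp, fn = 17, 19, 0, 2

print(f"precision = {tp / (tp + fp):.8f}")
print(f"recall    = {tp / (tp + fn):.8f}")
print(f"accuracy  = {(tp + tn) / (tp + tn + fp + fn):.8f}")
print(f"mcc       = {mcc(tp, tn, fp, fn):.8f}")
```

With these counts the MCC agrees with the logged first-fold test_mcc of 0.89973541 to the printed precision.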
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.10629559 0.10595846 0.10605121 0.10595107 0.10800147 0.10813618
0.11086369 0.11145663 0.11341166 0.11815166]
mean value: 0.10942776203155517
key: score_time
value: [0.01755023 0.01762581 0.01783776 0.01773572 0.01776767 0.01922417
0.01921582 0.01775551 0.01930189 0.0180676 ]
mean value: 0.01820821762084961
key: test_mcc
value: [0.78947368 0.63245553 0.73786479 0.89973541 0.9486833 0.73786479
0.89973541 0.74620251 0.84959079 0.62807634]
mean value: 0.7869682546638045
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.89473684 0.81578947 0.86842105 0.94736842 0.97368421 0.86842105
0.94736842 0.86842105 0.91891892 0.81081081]
mean value: 0.8913940256045519
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.89473684 0.81081081 0.87179487 0.95 0.97435897 0.87179487
0.94444444 0.85714286 0.92307692 0.82926829]
mean value: 0.8927428888211943
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.89473684 0.83333333 0.85 0.9047619 0.95 0.85
1. 0.9375 0.85714286 0.77272727]
mean value: 0.8850202210070631
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.89473684 0.78947368 0.89473684 1. 1. 0.89473684
0.89473684 0.78947368 1. 0.89473684]
mean value: 0.9052631578947369
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.89473684 0.81578947 0.86842105 0.94736842 0.97368421 0.86842105
0.94736842 0.86842105 0.92105263 0.80847953]
mean value: 0.8913742690058479
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.80952381 0.68181818 0.77272727 0.9047619 0.95 0.77272727
0.89473684 0.75 0.85714286 0.70833333]
mean value: 0.8101771474139895
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.78
Accuracy on Blind test: 0.89
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.00998735 0.01076865 0.01041269 0.01028037 0.00994205 0.01035309
0.01037288 0.01094675 0.01107693 0.01506305]
mean value: 0.010920381546020508
key: score_time
value: [0.00932503 0.00965333 0.00984311 0.00884271 0.00942039 0.00911498
0.01005459 0.00995064 0.00970674 0.011832 ]
mean value: 0.009774351119995117
key: test_mcc
value: [0.47633051 0.52704628 0.73786479 0.68421053 0.42163702 0.36842105
0.52704628 0.68421053 0.40469382 0.46019501]
mean value: 0.529165581278633
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.73684211 0.76315789 0.86842105 0.84210526 0.71052632 0.68421053
0.76315789 0.84210526 0.7027027 0.72972973]
mean value: 0.7642958748221906
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.75 0.75675676 0.87179487 0.84210526 0.7027027 0.68421053
0.76923077 0.84210526 0.68571429 0.75 ]
mean value: 0.7654620438830966
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.71428571 0.77777778 0.85 0.84210526 0.72222222 0.68421053
0.75 0.84210526 0.70588235 0.71428571]
mean value: 0.7602874834144184
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.78947368 0.73684211 0.89473684 0.84210526 0.68421053 0.68421053
0.78947368 0.84210526 0.66666667 0.78947368]
mean value: 0.7719298245614035
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.73684211 0.76315789 0.86842105 0.84210526 0.71052632 0.68421053
0.76315789 0.84210526 0.70175439 0.72807018]
mean value: 0.7640350877192983
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.6 0.60869565 0.77272727 0.72727273 0.54166667 0.52
0.625 0.72727273 0.52173913 0.6 ]
mean value: 0.624437417654809
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.52
Accuracy on Blind test: 0.76
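Note: several models above reach a mean train_mcc of 1.0 while their mean test_mcc is much lower, which is the usual signature of overfitting to the training folds. The sketch below lists the train/test gap for the models logged so far, using the mean values printed above. The 0.2 threshold is a hypothetical cut-off chosen only for illustration, and the names `generalization_gaps` and `flag_overfit` are likewise hypothetical.

```python
# (mean train_mcc, mean test_mcc) copied from the blocks above.
cv_mcc = {
    "SVM": (0.798444188462922, 0.768295287708455),
    "MLP": (1.0, 0.8072310845862681),
    "Decision Tree": (1.0, 0.9010257594256045),
    "Extra Trees": (1.0, 0.7869682546638045),
    "Extra Tree": (1.0, 0.529165581278633),
}

GAP_THRESHOLD = 0.2  # hypothetical cut-off, not taken from the log


def generalization_gaps(scores):
    """Return {model: train score minus test score}."""
    return {name: train - test for name, (train, test) in scores.items()}


def flag_overfit(scores, threshold=GAP_THRESHOLD):
    """Return the sorted names of models whose gap exceeds the threshold."""
    gaps = generalization_gaps(scores)
    return sorted(name for name, gap in gaps.items() if gap > threshold)


for name, gap in generalization_gaps(cv_mcc).items():
    print(f"{name:15s} gap = {gap:.3f}")
print("flagged:", flag_overfit(cv_mcc))
```

A gap-based flag is only a rough screen; the blind-test MCC printed at the end of each block remains the more direct check of how a model generalizes.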
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
[the warning above was emitted 10 times in total]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.60144925 1.58419991 1.56108975 1.55858779 1.55152655 1.62324905
1.61374426 1.59573984 1.60749698 1.50386572]
mean value: 1.5800949096679688
key: score_time
value: [0.09293389 0.09812546 0.09960723 0.09873724 0.09930444 0.09915233
0.10042262 0.09998393 0.10088515 0.09459758]
mean value: 0.09837498664855956
key: test_mcc
value: [0.89473684 0.79388419 0.9486833 0.9486833 0.89973541 0.89473684
1. 0.84327404 0.94736842 0.78362573]
mean value: 0.8954728071949787
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94736842 0.89473684 0.97368421 0.97368421 0.94736842 0.94736842
1. 0.92105263 0.97297297 0.89189189]
mean value: 0.9470128022759602
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94736842 0.88888889 0.97435897 0.97435897 0.95 0.94736842
1. 0.91891892 0.97297297 0.89473684]
mean value: 0.9468972413709256
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.94736842 0.94117647 0.95 0.95 0.9047619 0.94736842
1. 0.94444444 0.94736842 0.89473684]
mean value: 0.9427224925057742
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.94736842 0.84210526 1. 1. 1. 0.94736842
1. 0.89473684 1. 0.89473684]
mean value: 0.9526315789473684
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.94736842 0.89473684 0.97368421 0.97368421 0.94736842 0.94736842
1. 0.92105263 0.97368421 0.89181287]
mean value: 0.9470760233918128
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.9 0.8 0.95 0.95 0.9047619 0.9
1. 0.85 0.94736842 0.80952381]
mean value: 0.9011654135338346
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.92
Accuracy on Blind test: 0.96
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...05', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.94246864 0.95814466 0.88224459 0.89182115 0.94442129 1.0197804
0.91957068 0.97436619 0.95079732 0.92337275]
mean value: 0.940698766708374
key: score_time
value: [0.19855642 0.15321207 0.25342155 0.13341498 0.17170119 0.26337838
0.17836618 0.23186874 0.27052569 0.24460649]
mean value: 0.20990517139434814
key: test_mcc
value: [0.89473684 0.79388419 0.9486833 0.9486833 0.89473684 0.84327404
0.9486833 0.84327404 1. 0.73020842]
mean value: 0.8846164268265851
key: train_mcc
value: [0.96470588 0.95884012 0.96477265 0.95884012 0.95300713 0.96470588
0.95884012 0.95294118 0.95314596 0.97069143]
mean value: 0.9600490476456546
key: test_accuracy
value: [0.94736842 0.89473684 0.97368421 0.97368421 0.94736842 0.92105263
0.97368421 0.92105263 1. 0.86486486]
mean value: 0.9417496443812233
key: train_accuracy
value: [0.98235294 0.97941176 0.98235294 0.97941176 0.97647059 0.98235294
0.97941176 0.97647059 0.97653959 0.98533724]
mean value: 0.9800112126962222
key: test_fscore
value: [0.94736842 0.88888889 0.97435897 0.97435897 0.94736842 0.92307692
0.97297297 0.91891892 1. 0.87179487]
mean value: 0.9419107366475787
key: train_fscore
value: [0.98235294 0.97935103 0.98224852 0.97935103 0.97633136 0.98235294
0.97935103 0.97647059 0.97647059 0.98533724]
mean value: 0.9799617281227226
key: test_precision
value: [0.94736842 0.94117647 0.95 0.95 0.94736842 0.9
1. 0.94444444 1. 0.85 ]
mean value: 0.9430357757137943
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
key: train_precision
value: [0.98235294 0.98224852 0.98809524 0.98224852 0.98214286 0.98235294
0.98224852 0.97647059 0.98224852 0.98245614]
mean value: 0.9822864789017445
key: test_recall
value: [0.94736842 0.84210526 1. 1. 0.94736842 0.94736842
0.94736842 0.89473684 1. 0.89473684]
mean value: 0.9421052631578947
key: train_recall
value: [0.98235294 0.97647059 0.97647059 0.97647059 0.97058824 0.98235294
0.97647059 0.97647059 0.97076023 0.98823529]
mean value: 0.9776642586859305
key: test_roc_auc
value: [0.94736842 0.89473684 0.97368421 0.97368421 0.94736842 0.92105263
0.97368421 0.92105263 1. 0.86403509]
mean value: 0.9416666666666667
key: train_roc_auc
value: [0.98235294 0.97941176 0.98235294 0.97941176 0.97647059 0.98235294
0.97941176 0.97647059 0.97655659 0.98534572]
mean value: 0.9800137598899209
key: test_jcc
value: [0.9 0.8 0.95 0.95 0.9 0.85714286
0.94736842 0.85 1. 0.77272727]
mean value: 0.8927238550922761
key: train_jcc
value: [0.96531792 0.95953757 0.96511628 0.95953757 0.95375723 0.96531792
0.95953757 0.95402299 0.95402299 0.97109827]
mean value: 0.9607266302324036
MCC on Blind test: 0.85
Accuracy on Blind test: 0.92
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01481676 0.01259422 0.01158786 0.01182032 0.0115788 0.01470709
0.01181889 0.01124763 0.01149035 0.01684523]
mean value: 0.012850713729858399
key: score_time
value: [0.01577783 0.01042986 0.01050568 0.01003408 0.01580787 0.01024508
0.01042175 0.01017332 0.01543117 0.01240611]
mean value: 0.01212327480316162
key: test_mcc
value: [0.52704628 0.47633051 0.63245553 0.63245553 0.73786479 0.52704628
0.9486833 0.58218174 0.89736456 0.35558302]
mean value: 0.6317011536883512
key: train_mcc
value: [0.74138173 0.74717517 0.73632672 0.72354193 0.71236887 0.72354193
0.71944168 0.72961376 0.72462581 0.75953765]
mean value: 0.7317555250285934
key: test_accuracy
value: [0.76315789 0.73684211 0.81578947 0.81578947 0.86842105 0.76315789
0.97368421 0.78947368 0.94594595 0.67567568]
mean value: 0.8147937411095306
key: train_accuracy
value: [0.87058824 0.87352941 0.86764706 0.86176471 0.85588235 0.86176471
0.85882353 0.86470588 0.86217009 0.8797654 ]
mean value: 0.865664136622391
key: test_fscore
value: [0.75675676 0.75 0.81081081 0.82051282 0.86486486 0.75675676
0.97297297 0.8 0.94736842 0.71428571]
mean value: 0.8194329118013328
key: train_fscore
value: [0.87209302 0.87463557 0.87106017 0.86217009 0.85878963 0.86217009
0.86363636 0.86627907 0.86455331 0.87905605]
mean value: 0.8674443359724497
key: test_precision
value: [0.77777778 0.71428571 0.83333333 0.8 0.88888889 0.77777778
1. 0.76190476 0.9 0.65217391]
mean value: 0.8106142167011732
key: train_precision
value: [0.86206897 0.86705202 0.84916201 0.85964912 0.84180791 0.85964912
0.83516484 0.85632184 0.85227273 0.8816568 ]
mean value: 0.8564805361282117
key: test_recall
value: [0.73684211 0.78947368 0.78947368 0.84210526 0.84210526 0.73684211
0.94736842 0.84210526 1. 0.78947368]
mean value: 0.831578947368421
key: train_recall
value: [0.88235294 0.88235294 0.89411765 0.86470588 0.87647059 0.86470588
0.89411765 0.87647059 0.87719298 0.87647059]
mean value: 0.8788957688338493
key: test_roc_auc
value: [0.76315789 0.73684211 0.81578947 0.81578947 0.86842105 0.76315789
0.97368421 0.78947368 0.94736842 0.67251462]
mean value: 0.8146198830409357
key: train_roc_auc
value: [0.87058824 0.87352941 0.86764706 0.86176471 0.85588235 0.86176471
0.85882353 0.86470588 0.8621259 0.87975576]
mean value: 0.8656587547299621
key: test_jcc
value: [0.60869565 0.6 0.68181818 0.69565217 0.76190476 0.60869565
0.94736842 0.66666667 0.9 0.55555556]
mean value: 0.7026357065258667
key: train_jcc
value: [0.77319588 0.77720207 0.7715736 0.75773196 0.75252525 0.75773196
0.76 0.76410256 0.76142132 0.78421053]
mean value: 0.7659695133154767
MCC on Blind test: 0.72
Accuracy on Blind test: 0.86
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.50253129 0.13468575 0.05374956 0.13997626 0.35016894 0.12072372
0.05626488 0.16596508 0.05752921 0.05376387]
mean value: 0.16353585720062255
key: score_time
value: [0.01153541 0.01105118 0.01070404 0.01191998 0.01194215 0.01204967
0.01075387 0.01131034 0.01075554 0.01144719]
mean value: 0.011346936225891113
key: test_mcc
value: [0.9486833 0.89973541 0.89473684 0.89973541 1. 0.9486833
0.9486833 0.9486833 1. 0.78362573]
mean value: 0.9272566586986345
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.97368421 0.94736842 0.94736842 0.94736842 1. 0.97368421
0.97368421 0.97368421 1. 0.89189189]
mean value: 0.962873399715505
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.97297297 0.95 0.94736842 0.95 1. 0.97297297
0.97297297 0.97435897 1. 0.89473684]
mean value: 0.9635383156435788
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.9047619 0.94736842 0.9047619 1. 1.
1. 0.95 1. 0.89473684]
mean value: 0.9601629072681704
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.94736842 1. 0.94736842 1. 1. 0.94736842
0.94736842 1. 1. 0.89473684]
mean value: 0.968421052631579
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97368421 0.94736842 0.94736842 0.94736842 1. 0.97368421
0.97368421 0.97368421 1. 0.89181287]
mean value: 0.9628654970760233
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.94736842 0.9047619 0.9 0.9047619 1. 0.94736842
0.94736842 0.95 1. 0.80952381]
mean value: 0.9311152882205513
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.94
Accuracy on Blind test: 0.97
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.05968142 0.05620432 0.04024172 0.05535245 0.06633306 0.06649137
0.05653095 0.0731082 0.0600152 0.07051778]
mean value: 0.06044764518737793
key: score_time
value: [0.02383661 0.01224399 0.01207924 0.01229954 0.02160764 0.03937793
0.02177739 0.01230597 0.02152944 0.01230001]
mean value: 0.018935775756835936
key: test_mcc
value: [0.78947368 0.89473684 0.58218174 0.79388419 0.79388419 0.63245553
0.89973541 0.73786479 0.78764146 0.62280702]
mean value: 0.753466484462356
key: train_mcc
value: [0.94720632 0.93543979 0.92941176 0.95294118 0.94143711 0.95897286
0.94707521 0.94117647 0.94723082 0.93550052]
mean value: 0.9436392042730108
key: test_accuracy
value: [0.89473684 0.94736842 0.78947368 0.89473684 0.89473684 0.81578947
0.94736842 0.86842105 0.89189189 0.81081081]
mean value: 0.8755334281650071
key: train_accuracy
value: [0.97352941 0.96764706 0.96470588 0.97647059 0.97058824 0.97941176
0.97352941 0.97058824 0.97360704 0.96774194]
mean value: 0.9717819561842332
key: test_fscore
value: [0.89473684 0.94736842 0.77777778 0.9 0.9 0.82051282
0.94444444 0.87179487 0.88235294 0.81081081]
mean value: 0.8749798929675091
key: train_fscore
value: [0.97329377 0.96735905 0.96470588 0.97647059 0.9702381 0.97922849
0.97345133 0.97058824 0.97360704 0.96774194]
mean value: 0.9716684407799097
key: test_precision
value: [0.89473684 0.94736842 0.82352941 0.85714286 0.85714286 0.8
1. 0.85 0.9375 0.83333333]
mean value: 0.8800753722541648
key: train_precision
value: [0.98203593 0.9760479 0.96470588 0.97647059 0.98192771 0.98802395
0.97633136 0.97058824 0.97647059 0.96491228]
mean value: 0.9757514431040658
key: test_recall
value: [0.89473684 0.94736842 0.73684211 0.94736842 0.94736842 0.84210526
0.89473684 0.89473684 0.83333333 0.78947368]
mean value: 0.8728070175438596
key: train_recall
value: [0.96470588 0.95882353 0.96470588 0.97647059 0.95882353 0.97058824
0.97058824 0.97058824 0.97076023 0.97058824]
mean value: 0.9676642586859305
key: test_roc_auc
value: [0.89473684 0.94736842 0.78947368 0.89473684 0.89473684 0.81578947
0.94736842 0.86842105 0.89035088 0.81140351]
mean value: 0.875438596491228
key: train_roc_auc
value: [0.97352941 0.96764706 0.96470588 0.97647059 0.97058824 0.97941176
0.97352941 0.97058824 0.97361541 0.96775026]
mean value: 0.9717836257309942
key: test_jcc
value: [0.80952381 0.9 0.63636364 0.81818182 0.81818182 0.69565217
0.89473684 0.77272727 0.78947368 0.68181818]
mean value: 0.781665923702537
key: train_jcc
value: [0.94797688 0.93678161 0.93181818 0.95402299 0.94219653 0.95930233
0.94827586 0.94285714 0.94857143 0.9375 ]
mean value: 0.9449302949002888
MCC on Blind test: 0.72
Accuracy on Blind test: 0.86
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.03640914 0.01111817 0.0107789 0.0103848 0.01065993 0.01029205
0.0105691 0.01045561 0.00946641 0.00948262]
mean value: 0.012961673736572265
key: score_time
value: [0.01239967 0.00975752 0.0088954 0.00995421 0.00950813 0.01114082
0.00941396 0.00899124 0.00900888 0.00877619]
mean value: 0.009784603118896484
key: test_mcc
value: [0.79388419 0.59222009 0.78947368 0.68421053 0.69989647 0.37047929
0.89473684 0.52704628 0.89736456 0.29618896]
mean value: 0.6545500886423572
key: train_mcc
value: [0.72354193 0.70607783 0.73540864 0.64777648 0.62994603 0.72414357
0.67676337 0.67100629 0.66600142 0.70748143]
mean value: 0.6888146983674961
key: test_accuracy
value: [0.89473684 0.78947368 0.89473684 0.84210526 0.84210526 0.68421053
0.94736842 0.76315789 0.94594595 0.64864865]
mean value: 0.82524893314367
key: train_accuracy
value: [0.86176471 0.85294118 0.86764706 0.82352941 0.81470588 0.86176471
0.83823529 0.83529412 0.83284457 0.85337243]
mean value: 0.8442099361738831
key: test_fscore
value: [0.88888889 0.76470588 0.89473684 0.84210526 0.82352941 0.66666667
0.94736842 0.76923077 0.94736842 0.66666667]
mean value: 0.821126723293906
key: train_fscore
value: [0.86135693 0.85119048 0.86646884 0.81927711 0.81081081 0.85885886
0.8358209 0.83233533 0.83086053 0.84939759]
mean value: 0.8416377378527024
key: test_precision
value: [0.94117647 0.86666667 0.89473684 0.84210526 0.93333333 0.70588235
0.94736842 0.75 0.9 0.65 ]
mean value: 0.8431269349845201
key: train_precision
value: [0.86390533 0.86144578 0.8742515 0.83950617 0.82822086 0.87730061
0.84848485 0.84756098 0.84337349 0.87037037]
mean value: 0.8554419939255328
key: test_recall
value: [0.84210526 0.68421053 0.89473684 0.84210526 0.73684211 0.63157895
0.94736842 0.78947368 1. 0.68421053]
mean value: 0.8052631578947368
key: train_recall
value: [0.85882353 0.84117647 0.85882353 0.8 0.79411765 0.84117647
0.82352941 0.81764706 0.81871345 0.82941176]
mean value: 0.8283419332645339
key: test_roc_auc
value: [0.89473684 0.78947368 0.89473684 0.84210526 0.84210526 0.68421053
0.94736842 0.76315789 0.94736842 0.64766082]
mean value: 0.8252923976608187
key: train_roc_auc
value: [0.86176471 0.85294118 0.86764706 0.82352941 0.81470588 0.86176471
0.83823529 0.83529412 0.83288614 0.85330237]
mean value: 0.8442070863433093
key: test_jcc
value: [0.8 0.61904762 0.80952381 0.72727273 0.7 0.5
0.9 0.625 0.9 0.5 ]
mean value: 0.7080844155844156
key: train_jcc
value: [0.75647668 0.74093264 0.76439791 0.69387755 0.68181818 0.75263158
0.71794872 0.71282051 0.7106599 0.7382199 ]
mean value: 0.7269783568504338
MCC on Blind test: 0.77
Accuracy on Blind test: 0.89
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01160216 0.01595926 0.01850629 0.01905012 0.01691532 0.01988149
0.02177978 0.01703143 0.01949835 0.01658297]
mean value: 0.017680716514587403
key: score_time
value: [0.0089817 0.01127219 0.01129818 0.01176715 0.01182032 0.01188231
0.01186323 0.01187611 0.01192379 0.01185489]
mean value: 0.011453986167907715
key: test_mcc
value: [0.61017022 0.84327404 0.68803296 0.80757285 0.80757285 0.74620251
0.9486833 0.54554473 0.89736456 0.57857577]
mean value: 0.7472993786374361
key: train_mcc
value: [0.78047467 0.86724532 0.88861973 0.87273022 0.81456113 0.81962489
0.84766884 0.72101498 0.90106396 0.85672926]
mean value: 0.8369732992272605
key: test_accuracy
value: [0.78947368 0.92105263 0.84210526 0.89473684 0.89473684 0.86842105
0.97368421 0.76315789 0.94594595 0.78378378]
mean value: 0.8677098150782361
key: train_accuracy
value: [0.88235294 0.93235294 0.94411765 0.93235294 0.9 0.90294118
0.92058824 0.84705882 0.95014663 0.92668622]
mean value: 0.9138597550457133
key: test_fscore
value: [0.81818182 0.92307692 0.83333333 0.88235294 0.88235294 0.87804878
0.97435897 0.79069767 0.94736842 0.80952381]
mean value: 0.8739295616786841
key: train_fscore
value: [0.89304813 0.92966361 0.94328358 0.92744479 0.88961039 0.91105121
0.92520776 0.86528497 0.94925373 0.92957746]
mean value: 0.9163425642953533
key: test_precision
value: [0.72 0.9 0.88235294 1. 1. 0.81818182
0.95 0.70833333 0.9 0.73913043]
mean value: 0.8617998527474231
key: train_precision
value: [0.81862745 0.96815287 0.95757576 1. 0.99275362 0.84079602
0.87434555 0.77314815 0.9695122 0.89189189]
mean value: 0.9086803502787302
key: test_recall
value: [0.94736842 0.94736842 0.78947368 0.78947368 0.78947368 0.94736842
1. 0.89473684 1. 0.89473684]
mean value: 0.9
key: train_recall
value: [0.98235294 0.89411765 0.92941176 0.86470588 0.80588235 0.99411765
0.98235294 0.98235294 0.92982456 0.97058824]
mean value: 0.9335706914344686
key: test_roc_auc
value: [0.78947368 0.92105263 0.84210526 0.89473684 0.89473684 0.86842105
0.97368421 0.76315789 0.94736842 0.78070175]
mean value: 0.8675438596491228
key: train_roc_auc
value: [0.88235294 0.93235294 0.94411765 0.93235294 0.9 0.90294118
0.92058824 0.84705882 0.9502064 0.92681459]
mean value: 0.9138785689714483
key: test_jcc
value: [0.69230769 0.85714286 0.71428571 0.78947368 0.78947368 0.7826087
0.95 0.65384615 0.9 0.68 ]
mean value: 0.7809138481655644
key: train_jcc
value: [0.80676329 0.86857143 0.89265537 0.86470588 0.80116959 0.83663366
0.86082474 0.76255708 0.90340909 0.86842105]
mean value: 0.8465711180624056
MCC on Blind test: 0.75
Accuracy on Blind test: 0.88
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01726389 0.01823545 0.01617908 0.01863098 0.01907182 0.01757312
0.01636481 0.01706433 0.01851106 0.01773095]
mean value: 0.017662549018859865
key: score_time
value: [0.01184845 0.01182127 0.01185322 0.01189089 0.01183534 0.01188326
0.01175451 0.01184678 0.01183844 0.01181006]
mean value: 0.011838221549987793
key: test_mcc
value: [0.74620251 0.63245553 0.68803296 0.89973541 0.85280287 0.68803296
0.76376262 0.79388419 0.7163504 0.57857577]
mean value: 0.7359835206381676
key: train_mcc
value: [0.88333157 0.90352405 0.86508013 0.90014017 0.82470774 0.8452381
0.72433672 0.83493231 0.77515848 0.87706192]
mean value: 0.8433511179735004
key: test_accuracy
value: [0.86842105 0.81578947 0.84210526 0.94736842 0.92105263 0.84210526
0.86842105 0.89473684 0.83783784 0.78378378]
mean value: 0.8621621621621621
key: train_accuracy
value: [0.94117647 0.95 0.93235294 0.95 0.90588235 0.91764706
0.84411765 0.91176471 0.87683284 0.93548387]
mean value: 0.916525789201311
key: test_fscore
value: [0.85714286 0.82051282 0.85 0.95 0.91428571 0.83333333
0.84848485 0.88888889 0.85714286 0.80952381]
mean value: 0.8629315129315129
key: train_fscore
value: [0.93975904 0.94769231 0.93333333 0.95043732 0.89677419 0.91082803
0.81533101 0.90384615 0.89005236 0.93888889]
mean value: 0.9126942623189517
key: test_precision
value: [0.9375 0.8 0.80952381 0.9047619 1. 0.88235294
1. 0.94117647 0.75 0.73913043]
mean value: 0.8764445560833029
key: train_precision
value: [0.96296296 0.99354839 0.92 0.94219653 0.99285714 0.99305556
1. 0.99295775 0.8056872 0.88947368]
mean value: 0.9492739214745212
key: test_recall
value: [0.78947368 0.84210526 0.89473684 1. 0.84210526 0.78947368
0.73684211 0.84210526 1. 0.89473684]
mean value: 0.8631578947368421
key: train_recall
value: [0.91764706 0.90588235 0.94705882 0.95882353 0.81764706 0.84117647
0.68823529 0.82941176 0.99415205 0.99411765]
mean value: 0.8894152046783625
key: test_roc_auc
value: [0.86842105 0.81578947 0.84210526 0.94736842 0.92105263 0.84210526
0.86842105 0.89473684 0.84210526 0.78070175]
mean value: 0.862280701754386
key: train_roc_auc
value: [0.94117647 0.95 0.93235294 0.95 0.90588235 0.91764706
0.84411765 0.91176471 0.87648779 0.93565531]
mean value: 0.9165084279325766
key: test_jcc
value: [0.75 0.69565217 0.73913043 0.9047619 0.84210526 0.71428571
0.73684211 0.8 0.75 0.68 ]
mean value: 0.7612777596164324
key: train_jcc
value: [0.88636364 0.9005848 0.875 0.90555556 0.8128655 0.83625731
0.68823529 0.8245614 0.80188679 0.88481675]
mean value: 0.8416127038264324
MCC on Blind test: 0.62
Accuracy on Blind test: 0.79
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.16265321 0.14861178 0.14626455 0.14584064 0.14930272 0.16142392
0.15244198 0.14463997 0.14828467 0.1534369 ]
mean value: 0.15129003524780274
key: score_time
value: [0.0151782 0.01543975 0.01532531 0.01555467 0.01592684 0.02432156
0.01536274 0.01528263 0.01656127 0.01543808]
mean value: 0.016439104080200197
key: test_mcc
value: [0.9486833 0.89973541 0.89473684 0.89973541 1. 0.9486833
0.9486833 0.9486833 1. 0.78764146]
mean value: 0.927658231800502
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.97368421 0.94736842 0.94736842 0.94736842 1. 0.97368421
0.97368421 0.97368421 1. 0.89189189]
mean value: 0.962873399715505
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.97297297 0.95 0.94736842 0.95 1. 0.97297297
0.97297297 0.97435897 1. 0.9 ]
mean value: 0.9640646314330525
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.9047619 0.94736842 0.9047619 1. 1.
1. 0.95 1. 0.85714286]
mean value: 0.9564035087719298
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.94736842 1. 0.94736842 1. 1. 0.94736842
0.94736842 1. 1. 0.94736842]
mean value: 0.9736842105263157
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97368421 0.94736842 0.94736842 0.94736842 1. 0.97368421
0.97368421 0.97368421 1. 0.89035088]
mean value: 0.962719298245614
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.94736842 0.9047619 0.9 0.9047619 1. 0.94736842
0.94736842 0.95 1. 0.81818182]
mean value: 0.9319810890863522
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.92
Accuracy on Blind test: 0.96
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.04295039 0.05016208 0.03978586 0.04688597 0.05769682 0.05953455
0.07448912 0.0562222 0.04632425 0.03965807]
mean value: 0.05137093067169189
key: score_time
value: [0.02003407 0.02414131 0.01633906 0.02169681 0.02622557 0.02317142
0.02110815 0.03839302 0.02159262 0.02609897]
mean value: 0.02388010025024414
key: test_mcc
value: [0.9486833 0.78947368 0.9486833 0.89973541 1. 0.89473684
0.9486833 0.9486833 1. 0.83871328]
mean value: 0.9217392413194645
key: train_mcc
value: [0.98823529 0.98823529 0.99413485 1. 0.98823529 0.98236994
0.97653817 0.99413485 0.99415185 0.98826969]
mean value: 0.9894305224350558
key: test_accuracy
value: [0.97368421 0.89473684 0.97368421 0.94736842 1. 0.94736842
0.97368421 0.97368421 1. 0.91891892]
mean value: 0.9603129445234708
key: train_accuracy
value: [0.99411765 0.99411765 0.99705882 1. 0.99411765 0.99117647
0.98823529 0.99705882 0.99706745 0.9941349 ]
mean value: 0.9947084698982233
key: test_fscore
value: [0.97297297 0.89473684 0.97435897 0.95 1. 0.94736842
0.97297297 0.97435897 1. 0.92307692]
mean value: 0.9609846080898713
key: train_fscore
value: [0.99411765 0.99411765 0.99706745 1. 0.99411765 0.99120235
0.98816568 0.99706745 0.99708455 0.99411765]
mean value: 0.9947058060215382
key: test_precision
value: [1. 0.89473684 0.95 0.9047619 1. 0.94736842
1. 0.95 1. 0.9 ]
mean value: 0.9546867167919799
key: train_precision
value: [0.99411765 0.99411765 0.99415205 1. 0.99411765 0.98830409
0.99404762 0.99415205 0.99418605 0.99411765]
mean value: 0.9941312440929044
key: test_recall
value: [0.94736842 0.89473684 1. 1. 1. 0.94736842
0.94736842 1. 1. 0.94736842]
mean value: 0.968421052631579
key: train_recall
value: [0.99411765 0.99411765 1. 1. 0.99411765 0.99411765
0.98235294 1. 1. 0.99411765]
mean value: 0.9952941176470589
key: test_roc_auc
value: [0.97368421 0.89473684 0.97368421 0.94736842 1. 0.94736842
0.97368421 0.97368421 1. 0.91812865]
mean value: 0.960233918128655
key: train_roc_auc
value: [0.99411765 0.99411765 0.99705882 1. 0.99411765 0.99117647
0.98823529 0.99705882 0.99705882 0.99413485]
mean value: 0.9947076023391813
key: test_jcc
value: [0.94736842 0.80952381 0.95 0.9047619 1. 0.9
0.94736842 0.95 1. 0.85714286]
mean value: 0.9266165413533834
key: train_jcc
value: [0.98830409 0.98830409 0.99415205 1. 0.98830409 0.98255814
0.97660819 0.99415205 0.99418605 0.98830409]
mean value: 0.9894872841017271
MCC on Blind test: 0.91
Accuracy on Blind test: 0.96
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.14919233 0.09951925 0.13426876 0.0898757 0.09314251 0.08426976
0.08239388 0.05807185 0.10029602 0.09456182]
mean value: 0.09855918884277344
key: score_time
value: [0.02806473 0.01825833 0.02253628 0.02233458 0.0233705 0.02221894
0.01384473 0.01433444 0.01381803 0.02184868]
mean value: 0.020062923431396484
key: test_mcc
value: [0.63245553 0.16151457 0.68803296 0.63245553 0.68421053 0.57894737
0.85280287 0.59222009 0.89181287 0.51319869]
mean value: 0.6227651001498481
key: train_mcc
value: [0.99413485 0.99413485 0.99413485 0.99413485 0.99413485 1.
0.99413485 1. 0.99415205 0.99415185]
mean value: 0.9953112973615207
key: test_accuracy
value: [0.81578947 0.57894737 0.84210526 0.81578947 0.84210526 0.78947368
0.92105263 0.78947368 0.94594595 0.75675676]
mean value: 0.8097439544807966
key: train_accuracy
value: [0.99705882 0.99705882 0.99705882 0.99705882 0.99705882 1.
0.99705882 1. 0.99706745 0.99706745]
mean value: 0.9976487838537175
key: test_fscore
value: [0.82051282 0.52941176 0.83333333 0.82051282 0.84210526 0.78947368
0.91428571 0.76470588 0.94444444 0.76923077]
mean value: 0.8028016496747147
key: train_fscore
value: [0.99705015 0.99705015 0.99705015 0.99705015 0.99705015 1.
0.99705015 1. 0.99706745 0.99705015]
mean value: 0.9976418481128729
key: test_precision
value: [0.8 0.6 0.88235294 0.8 0.84210526 0.78947368
1. 0.86666667 0.94444444 0.75 ]
mean value: 0.8275042999656003
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.84210526 0.47368421 0.78947368 0.84210526 0.84210526 0.78947368
0.84210526 0.68421053 0.94444444 0.78947368]
mean value: 0.7839181286549708
key: train_recall
value: [0.99411765 0.99411765 0.99411765 0.99411765 0.99411765 1.
0.99411765 1. 0.99415205 0.99411765]
mean value: 0.9952975576195391
key: test_roc_auc
value: [0.81578947 0.57894737 0.84210526 0.81578947 0.84210526 0.78947368
0.92105263 0.78947368 0.94590643 0.75584795]
mean value: 0.8096491228070175
key: train_roc_auc
value: [0.99705882 0.99705882 0.99705882 0.99705882 0.99705882 1.
0.99705882 1. 0.99707602 0.99705882]
mean value: 0.9976487788097695
key: test_jcc
value: [0.69565217 0.36 0.71428571 0.69565217 0.72727273 0.65217391
0.84210526 0.61904762 0.89473684 0.625 ]
mean value: 0.6825926426738784
key: train_jcc
value: [0.99411765 0.99411765 0.99411765 0.99411765 0.99411765 1.
0.99411765 1. 0.99415205 0.99411765]
mean value: 0.9952975576195391
MCC on Blind test: 0.54
Accuracy on Blind test: 0.77
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.6186831 0.62528348 0.54585361 0.54748797 0.58207774 0.56963778
0.5545404 0.55546618 0.54905748 0.55526996]
mean value: 0.5703357696533203
key: score_time
value: [0.01032495 0.00916743 0.00931358 0.01051068 0.00964808 0.00931263
0.01022315 0.00920653 0.00935125 0.00947666]
mean value: 0.009653496742248534
key: test_mcc
value: [0.9486833 0.89973541 0.89473684 0.89973541 1. 0.89473684
1. 0.89473684 0.94736842 0.83871328]
mean value: 0.9218446350938172
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.97368421 0.94736842 0.94736842 0.94736842 1. 0.94736842
1. 0.94736842 0.97297297 0.91891892]
mean value: 0.9602418207681366
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.97297297 0.95 0.94736842 0.95 1. 0.94736842
1. 0.94736842 0.97297297 0.92307692]
mean value: 0.9611128132180764
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.9047619 0.94736842 0.9047619 1. 0.94736842
1. 0.94736842 0.94736842 0.9 ]
mean value: 0.9498997493734336
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.94736842 1. 0.94736842 1. 1. 0.94736842
1. 0.94736842 1. 0.94736842]
mean value: 0.9736842105263157
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97368421 0.94736842 0.94736842 0.94736842 1. 0.94736842
1. 0.94736842 0.97368421 0.91812865]
mean value: 0.960233918128655
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.94736842 0.9047619 0.9 0.9047619 1. 0.9
1. 0.9 0.94736842 0.85714286]
mean value: 0.9261403508771929
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.94
Accuracy on Blind test: 0.97
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.02871037 0.04236031 0.04015183 0.07962561 0.02676129 0.04086113
0.02677369 0.02712107 0.02694225 0.02759171]
mean value: 0.036689925193786624
key: score_time
value: [0.02021146 0.01885295 0.01714921 0.01255703 0.01522589 0.01539779
0.01532888 0.01534462 0.01538682 0.01636577]
mean value: 0.01618204116821289
key: test_mcc
value: [0.47633051 0.2773501 0.53300179 0.43643578 0.10660036 0.54554473
0.73786479 0.53300179 0.63129316 0.25301653]
mean value: 0.453043953428385
key: train_mcc
value: [0.98823529 0.9653073 0.99413485 0.98830369 0.78190435 0.94838881
0.976741 0.95963741 0.94298132 0.88351945]
mean value: 0.9429153475266533
key: test_accuracy
value: [0.73684211 0.63157895 0.76315789 0.71052632 0.55263158 0.76315789
0.86842105 0.76315789 0.81081081 0.62162162]
mean value: 0.7221906116642959
key: train_accuracy
value: [0.99411765 0.98235294 0.99705882 0.99411765 0.87941176 0.97352941
0.98823529 0.97941176 0.97067449 0.93841642]
mean value: 0.9697326203208556
key: test_fscore
value: [0.75 0.5625 0.74285714 0.74418605 0.51428571 0.79069767
0.87179487 0.74285714 0.82051282 0.58823529]
mean value: 0.7127926707355572
key: train_fscore
value: [0.99411765 0.98203593 0.99706745 0.99408284 0.86287625 0.97280967
0.98809524 0.97897898 0.96987952 0.93416928]
mean value: 0.9674112800117264
key: test_precision
value: [0.71428571 0.69230769 0.8125 0.66666667 0.5625 0.70833333
0.85 0.8125 0.76190476 0.66666667]
mean value: 0.7247664835164835
key: train_precision
value: [0.99411765 1. 0.99415205 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9988269693842449
key: test_recall
value: [0.78947368 0.47368421 0.68421053 0.84210526 0.47368421 0.89473684
0.89473684 0.68421053 0.88888889 0.52631579]
mean value: 0.7152046783625731
key: train_recall
value: [0.99411765 0.96470588 1. 0.98823529 0.75882353 0.94705882
0.97647059 0.95882353 0.94152047 0.87647059]
mean value: 0.9406226350189199
key: test_roc_auc
value: [0.73684211 0.63157895 0.76315789 0.71052632 0.55263158 0.76315789
0.86842105 0.76315789 0.8128655 0.62426901]
mean value: 0.7226608187134502
key: train_roc_auc
value: [0.99411765 0.98235294 0.99705882 0.99411765 0.87941176 0.97352941
0.98823529 0.97941176 0.97076023 0.93823529]
mean value: 0.9697230822153423
key: test_jcc
value: [0.6 0.39130435 0.59090909 0.59259259 0.34615385 0.65384615
0.77272727 0.59090909 0.69565217 0.41666667]
mean value: 0.5650761235543844
key: train_jcc
value: [0.98830409 0.96470588 0.99415205 0.98823529 0.75882353 0.94705882
0.97647059 0.95882353 0.94152047 0.87647059]
mean value: 0.9394564843481252
MCC on Blind test: 0.42
Accuracy on Blind test: 0.71
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02983665 0.03623152 0.0326581 0.03908563 0.04000688 0.04243374
0.03620481 0.03728127 0.04367828 0.03259063]
mean value: 0.03700075149536133
key: score_time
value: [0.02391553 0.02225924 0.02418375 0.02630496 0.02330065 0.02267957
0.0241394 0.02215934 0.02327657 0.02062058]
mean value: 0.023283958435058594
key: test_mcc
value: [0.78947368 0.84327404 0.68421053 0.89473684 0.84327404 0.68803296
0.89973541 0.84327404 0.94736842 0.51461988]
mean value: 0.7947999856934739
key: train_mcc
value: [0.87648575 0.87648575 0.88235294 0.88241401 0.87648575 0.88825066
0.88825066 0.88825066 0.87684899 0.87121527]
mean value: 0.8807040447224638
key: test_accuracy
value: [0.89473684 0.92105263 0.84210526 0.94736842 0.92105263 0.84210526
0.94736842 0.92105263 0.97297297 0.75675676]
mean value: 0.8966571834992887
key: train_accuracy
value: [0.93823529 0.93823529 0.94117647 0.94117647 0.93823529 0.94411765
0.94411765 0.94411765 0.93841642 0.93548387]
mean value: 0.9403312057961014
key: test_fscore
value: [0.89473684 0.92307692 0.84210526 0.94736842 0.91891892 0.85
0.94444444 0.91891892 0.97297297 0.75675676]
mean value: 0.8969299461404725
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_7030.py:196: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_7030.py:199: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: train_fscore
value: [0.93841642 0.93841642 0.94117647 0.9408284 0.93841642 0.9439528
0.9439528 0.94428152 0.93841642 0.93604651]
mean value: 0.9403904203379017
key: test_precision
value: [0.89473684 0.9 0.84210526 0.94736842 0.94444444 0.80952381
1. 0.94444444 0.94736842 0.77777778]
mean value: 0.9007769423558897
key: train_precision
value: [0.93567251 0.93567251 0.94117647 0.94642857 0.93567251 0.94674556
0.94674556 0.94152047 0.94117647 0.92528736]
mean value: 0.9396098004883142
key: test_recall
value: [0.89473684 0.94736842 0.84210526 0.94736842 0.89473684 0.89473684
0.89473684 0.89473684 1. 0.73684211]
mean value: 0.8947368421052632
key: train_recall
value: [0.94117647 0.94117647 0.94117647 0.93529412 0.94117647 0.94117647
0.94117647 0.94705882 0.93567251 0.94705882]
mean value: 0.9412143102855177
key: test_roc_auc
value: [0.89473684 0.92105263 0.84210526 0.94736842 0.92105263 0.84210526
0.94736842 0.92105263 0.97368421 0.75730994]
mean value: 0.8967836257309941
key: train_roc_auc
value: [0.93823529 0.93823529 0.94117647 0.94117647 0.93823529 0.94411765
0.94411765 0.94411765 0.93842449 0.93551772]
mean value: 0.9403353973168215
key: test_jcc
value: [0.80952381 0.85714286 0.72727273 0.9 0.85 0.73913043
0.89473684 0.85 0.94736842 0.60869565]
mean value: 0.818387074405381
key: train_jcc
value: [0.8839779 0.8839779 0.88888889 0.88826816 0.8839779 0.89385475
0.89385475 0.89444444 0.8839779 0.87978142]
mean value: 0.887500400993959
MCC on Blind test: 0.8
Accuracy on Blind test: 0.9
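The per-fold keys above (mcc, accuracy, fscore, precision, recall, jcc) can all be derived from one confusion matrix. The sketch below is a minimal illustration, not the script's own code: the function name binary_metrics is hypothetical, and the counts TP=17, FP=2, FN=2, TN=17 are an assumption inferred from the first Ridge Classifier fold, since the log never prints a confusion matrix.

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Compute the binary-classification metrics named in the log
    from confusion-matrix counts (hypothetical helper)."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Matthews correlation coefficient denominator
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "fscore": 2 * precision * recall / (precision + recall),
        "jcc": tp / (tp + fp + fn),  # Jaccard index of the positive class
        "mcc": (tp * tn - fp * fn) / mcc_den,
    }

# Assumed counts for a 38-sample fold; inferred, not taken from the log.
fold0 = binary_metrics(tp=17, fp=2, fn=2, tn=17)
for name, value in fold0.items():
    print(f"{name}: {value:.8f}")
```

Under these assumed counts the results agree with the first entries of test_mcc, test_accuracy, test_fscore, test_precision, test_recall and test_jcc for Ridge Classifier, which suggests that fold misclassified 4 of 38 samples.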
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.25618434 0.2671926 0.32188869 0.26167226 0.25118732 0.25854945
0.26166058 0.24911594 0.2661407 0.28530979]
mean value: 0.26789016723632814
key: score_time
value: [0.02451444 0.02255034 0.02055788 0.02151513 0.02196789 0.01751781
0.02237058 0.0236876 0.02393007 0.02099395]
mean value: 0.02196056842803955
key: test_mcc
value: [0.78947368 0.78947368 0.68421053 0.89473684 0.84327404 0.68803296
0.89973541 0.84327404 0.94736842 0.51461988]
mean value: 0.7894199498433697
key: train_mcc
value: [0.87648575 0.78828985 0.88235294 0.88241401 0.87648575 0.88825066
0.88825066 0.88825066 0.87684899 0.87121527]
mean value: 0.8718844543680944
key: test_accuracy
value: [0.89473684 0.89473684 0.84210526 0.94736842 0.92105263 0.84210526
0.94736842 0.92105263 0.97297297 0.75675676]
mean value: 0.8940256045519204
key: train_accuracy
value: [0.93823529 0.89411765 0.94117647 0.94117647 0.93823529 0.94411765
0.94411765 0.94411765 0.93841642 0.93548387]
mean value: 0.9359194410902191
key: test_fscore
value: [0.89473684 0.89473684 0.84210526 0.94736842 0.91891892 0.85
0.94444444 0.91891892 0.97297297 0.75675676]
mean value: 0.8940959380433064
key: train_fscore
value: [0.93841642 0.89349112 0.94117647 0.9408284 0.93841642 0.9439528
0.9439528 0.94428152 0.93841642 0.93604651]
mean value: 0.9358978905351981
key: test_precision
value: [0.89473684 0.89473684 0.84210526 0.94736842 0.94444444 0.80952381
1. 0.94444444 0.94736842 0.77777778]
mean value: 0.9002506265664161
key: train_precision
value: [0.93567251 0.89880952 0.94117647 0.94642857 0.93567251 0.94674556
0.94674556 0.94152047 0.94117647 0.92528736]
mean value: 0.9359235014072783
key: test_recall
value: [0.89473684 0.89473684 0.84210526 0.94736842 0.89473684 0.89473684
0.89473684 0.89473684 1. 0.73684211]
mean value: 0.8894736842105263
key: train_recall
value: [0.94117647 0.88823529 0.94117647 0.93529412 0.94117647 0.94117647
0.94117647 0.94705882 0.93567251 0.94705882]
mean value: 0.9359201926384588
key: test_roc_auc
value: [0.89473684 0.89473684 0.84210526 0.94736842 0.92105263 0.84210526
0.94736842 0.92105263 0.97368421 0.75730994]
mean value: 0.8941520467836257
key: train_roc_auc
value: [0.93823529 0.89411765 0.94117647 0.94117647 0.93823529 0.94411765
0.94411765 0.94411765 0.93842449 0.93551772]
mean value: 0.9359236326109391
key: test_jcc
value: [0.80952381 0.80952381 0.72727273 0.9 0.85 0.73913043
0.89473684 0.85 0.94736842 0.60869565]
mean value: 0.8136251696434763
key: train_jcc
value: [0.8839779 0.80748663 0.88888889 0.88826816 0.8839779 0.89385475
0.89385475 0.89444444 0.8839779 0.87978142]
mean value: 0.8798512740403147
MCC on Blind test: 0.8
Accuracy on Blind test: 0.9