LSHTM_analysis/scripts/ml/log_rpob_orig.txt
2022-06-20 21:55:47 +01:00

/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data_orig.py:550: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
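Note on the SettingWithCopyWarning above: the log does not show how mask_check was created, so the following is only a minimal sketch under the assumption that it is a filtered subset of a larger DataFrame. The frame `df` and its `mask` column are hypothetical stand-ins. Taking an explicit copy and reassigning the sorted result, instead of sorting in place, is one common way to avoid the warning.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame handled in ml_data_orig.py
df = pd.DataFrame({
    "mutationinformation": ["A1B", "C2D", "E3F", "G4H"],
    "ligand_distance": [12.5, 3.1, 8.0, 3.1],
    "mask": [True, True, False, True],
})

# Take an explicit copy of the filtered rows so that later changes
# act on an independent object rather than on a slice of df
mask_check = df.loc[df["mask"]].copy()

# Reassign the sorted result instead of using inplace=True
mask_check = mask_check.sort_values(by=["ligand_distance"], ascending=True)
print(mask_check)
```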
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
from pandas import MultiIndex, Int64Index
1.22.4
1.4.1
aaindex_df contains non-numerical data
Total no. of non-numerical columns: 2
Selecting numerical data only
PASS: successfully selected numerical columns only for aaindex_df
Now checking for NA in the remaining aaindex_cols
Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127
Revised df ncols: 123
Checking NA in revised df...
PASS: cols with NA successfully dropped from aaindex_df
Proceeding with combining aa_df with other features_df
PASS: ncols match
Expected ncols: 123
Got: 123
Total no. of columns in clean aa_df: 123
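Note on the aaindex clean-up steps above: the messages describe keeping numerical columns only and then dropping every column that contains NA. The script itself is not part of this log, so this is a sketch of how such a step might look in pandas. The toy frame and its values are made up for illustration; only the column-name style follows the log.

```python
import numpy as np
import pandas as pd

# Toy stand-in for aaindex_df (values are illustrative only)
aaindex_df = pd.DataFrame({
    "mutationinformation": ["A1B", "C2D", "E3F"],  # non-numerical
    "wild_type": ["A", "C", "E"],                  # non-numerical
    "ALTS910101": [0.1, 0.2, 0.3],
    "AZAE970101": [1.0, np.nan, 3.0],              # contains NA
    "BASU010101": [5.0, 6.0, 7.0],
})

# Step 1: keep numerical columns only
numeric_df = aaindex_df.select_dtypes(include="number")

# Step 2: find the columns with at least one NA and drop them
na_cols = numeric_df.columns[numeric_df.isna().any()].tolist()
clean_df = numeric_df.drop(columns=na_cols)

print("ncols with NA:", len(na_cols))
print("Original ncols:", numeric_df.shape[1])
print("Revised df ncols:", clean_df.shape[1])
```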
Proceeding to merge, expected nrows in merged_df: 1133
PASS: my_features_df and aa_df successfully combined
nrows: 1133
ncols: 274
count of NULL values before imputation
or_mychisq 339
log10_or_mychisq 339
dtype: int64
count of NULL values AFTER imputation
mutationinformation 0
or_rawI 0
logorI 0
dtype: int64
PASS: OR values imputed, data ready for ML
Total no. of features for aaindex: 123
No. of numerical features: 169
No. of categorical features: 7
index: 0
ind: 1
Mask count check: True
index: 1
ind: 2
Mask count check: True
index: 2
ind: 3
Mask count check: True
Original Data
Counter({0: 282, 1: 275}) Data dim: (557, 176)
-------------------------------------------------------------
Successfully split data: ORIGINAL training
actual values: training set
imputed values: blind test set
Train data size: (557, 176)
Test data size: (575, 176)
y_train numbers: Counter({0: 282, 1: 275})
y_train ratio: 1.0254545454545454
y_test_numbers: Counter({0: 545, 1: 30})
y_test ratio: 18.166666666666668
-------------------------------------------------------------
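Note on the ratios above: the reported values are consistent with dividing the count of class 0 by the count of class 1. The sketch below reproduces that arithmetic from the logged Counter values; the helper name `class_ratio` is hypothetical.

```python
from collections import Counter

def class_ratio(counts):
    """Ratio of class 0 to class 1 (hypothetical helper)."""
    return counts[0] / counts[1]

# Class counts copied from the log
y_train_counts = Counter({0: 282, 1: 275})
y_test_counts = Counter({0: 545, 1: 30})

train_ratio = class_ratio(y_train_counts)
test_ratio = class_ratio(y_test_counts)
print("y_train ratio:", train_ratio)
print("y_test ratio:", test_ratio)
```

The training set is close to balanced, while the blind test set has roughly 18 samples of class 0 for every sample of class 1.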
Simple Random OverSampling
Counter({0: 282, 1: 282})
(564, 176)
Simple Random UnderSampling
Counter({0: 275, 1: 275})
(550, 176)
Simple Combined Over and UnderSampling
Counter({0: 282, 1: 282})
(564, 176)
SMOTE_NC OverSampling
Counter({0: 282, 1: 282})
(564, 176)
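Note on the resampling counts above: the headings resemble samplers from the imbalanced-learn package, but the log does not confirm which implementation was used. The sketch below uses only the standard library to illustrate plain random over- and undersampling on the logged class counts. It does not implement SMOTE-NC or the combined scheme, and the function names are hypothetical.

```python
import random
from collections import Counter

def random_oversample(labels, seed=42):
    """Return row indices after duplicating minority rows at random."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    indices = list(range(len(labels)))
    for cls, n in counts.items():
        pool = [i for i, y in enumerate(labels) if y == cls]
        # Draw with replacement until this class reaches the target size
        indices.extend(rng.choices(pool, k=target - n))
    return indices

def random_undersample(labels, seed=42):
    """Return row indices after discarding majority rows at random."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    indices = []
    for cls in counts:
        pool = [i for i, y in enumerate(labels) if y == cls]
        # Draw without replacement down to the minority class size
        indices.extend(rng.sample(pool, target))
    return sorted(indices)

# Labels with the same class counts as the logged training set
y_train = [0] * 282 + [1] * 275

over_counts = Counter(y_train[i] for i in random_oversample(y_train))
under_counts = Counter(y_train[i] for i in random_undersample(y_train))
print("Oversampled:", over_counts)
print("Undersampled:", under_counts)
```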
#####################################################################
Running ML analysis: ORIGINAL
Gene name: rpoB
Drug name: rifampicin
Output directory: /home/tanu/git/Data/rifampicin/output/ml/tts_orig/
Sanity checks:
Total input features: 176
Training data size: (557, 176)
Test data size: (575, 176)
Target feature numbers (training data): Counter({0: 282, 1: 275})
Target features ratio (training data): 1.0254545454545454
Target feature numbers (test data): Counter({0: 545, 1: 30})
Target features ratio (test data): 18.166666666666668
#####################################################################
================================================================
Structural features (n): 37
These are:
Common stability features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================
AAindex features (n): 123
These are:
['ALTS910101', 'AZAE970101', 'AZAE970102', 'BASU010101', 'BENS940101', 'BENS940102', 'BENS940103', 'BENS940104', 'BETM990101', 'BLAJ010101', 'BONM030101', 'BONM030102', 'BONM030103', 'BONM030104', 'BONM030105', 'BONM030106', 'BRYS930101', 'CROG050101', 'CSEM940101', 'DAYM780301', 'DAYM780302', 'DOSZ010101', 'DOSZ010102', 'DOSZ010103', 'DOSZ010104', 'FEND850101', 'FITW660101', 'GEOD900101', 'GIAG010101', 'GONG920101', 'GRAR740104', 'HENS920101', 'HENS920102', 'HENS920103', 'HENS920104', 'JOHM930101', 'JOND920103', 'JOND940101', 'KANM000101', 'KAPO950101', 'KESO980101', 'KESO980102', 'KOLA920101', 'KOLA930101', 'KOSJ950100_RSA_SST', 'KOSJ950100_SST', 'KOSJ950110_RSA', 'KOSJ950115', 'LEVJ860101', 'LINK010101', 'LIWA970101', 'LUTR910101', 'LUTR910102', 'LUTR910103', 'LUTR910104', 'LUTR910105', 'LUTR910106', 'LUTR910107', 'LUTR910108', 'LUTR910109', 'MCLA710101', 'MCLA720101', 'MEHP950102', 'MICC010101', 'MIRL960101', 'MIYS850102', 'MIYS850103', 'MIYS930101', 'MIYS960101', 'MIYS960102', 'MIYS960103', 'MIYS990106', 'MIYS990107', 'MIYT790101', 'MOHR870101', 'MOOG990101', 'MUET010101', 'MUET020101', 'MUET020102', 'NAOD960101', 'NGPC000101', 'NIEK910101', 'NIEK910102', 'OGAK980101', 'OVEJ920100_RSA', 'OVEJ920101', 'OVEJ920102', 'OVEJ920103', 'PRLA000101', 'PRLA000102', 'QUIB020101', 'QU_C930101', 'QU_C930102', 'QU_C930103', 'RIER950101', 'RISJ880101', 'RUSR970101', 'RUSR970102', 'RUSR970103', 'SIMK990101', 'SIMK990102', 'SIMK990103', 'SIMK990104', 'SIMK990105', 'SKOJ000101', 'SKOJ000102', 'SKOJ970101', 'TANS760101', 'TANS760102', 'THOP960101', 'TOBD000101', 'TOBD000102', 'TUDE900101', 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106']
================================================================
Evolutionary features (n): 3
These are:
['consurf_score', 'snap2_score', 'provean_score']
================================================================
Genomic features (n): 6
These are:
['maf', 'logorI']
['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================
Categorical features (n): 7
These are:
['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================
Pass: No. of features match
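Note on the feature check above: the group sizes printed in this section add up to the totals reported earlier in the log (176 input features, 169 numerical, 7 categorical). The sketch below repeats that sum; the dictionary is simply a restatement of the logged counts, not the script's own data structure.

```python
# Group sizes copied from the log
feature_groups = {
    "structural": 37,    # 11 common stability + 23 FoldX + 3 other
    "aaindex": 123,
    "evolutionary": 3,
    "genomic": 6,
    "categorical": 7,
}

total_features = sum(feature_groups.values())
numerical_features = total_features - feature_groups["categorical"]

print("Total input features:", total_features)
print("No. of numerical features:", numerical_features)
```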
#####################################################################
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
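Note on the pipeline printed above: it scales the numerical columns with MinMaxScaler, one-hot encodes the categorical columns, and passes the result to the model. The key/value output logged for each model has the shape that scikit-learn's cross_validate returns when given a scoring dictionary and return_train_score=True. The sketch below rebuilds that arrangement on synthetic data. The data, the column names, the 10-fold setting and the mapping of metric names to scorers are all assumptions, since the calling code is not part of this log.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic stand-in data (hypothetical): 4 numerical and 1 categorical column
X_num, y = make_classification(n_samples=200, n_features=4, random_state=42)
numerical_cols = ["num_a", "num_b", "num_c", "num_d"]
categorical_cols = ["cat_a"]
X = pd.DataFrame(X_num, columns=numerical_cols)
X["cat_a"] = np.array(["helix", "sheet", "loop"])[np.arange(len(X)) % 3]

# Same layout as the logged pipeline: scale numerical, encode categorical
prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), numerical_cols),
        ("cat", OneHotEncoder(), categorical_cols),
    ],
    remainder="passthrough",
)
pipe = Pipeline(steps=[("prep", prep),
                       ("model", LogisticRegression(random_state=42))])

# Assumed scorer mapping; the key names follow the log
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": "jaccard",
}

scores = cross_validate(pipe, X, y, cv=10, scoring=scoring,
                        return_train_score=True)

for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", value.mean())
```

Keeping the scaler and encoder inside the pipeline means they are refitted on the training part of every fold, so no information from the held-out fold reaches the preprocessing step.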
key: fit_time
value: [0.03571892 0.03646374 0.03661752 0.03673768 0.03027439 0.03322029
0.03499603 0.03590631 0.06422377 0.04940248]
mean value: 0.039356112480163574
key: score_time
value: [0.01237988 0.01195502 0.01191425 0.01493478 0.01193166 0.01194549
0.01192307 0.01487398 0.01554179 0.01552892]
mean value: 0.013292884826660157
key: test_mcc
value: [0.96490128 0.78544061 0.89342711 0.71428571 0.67900461 0.89802651
0.71611487 0.78174603 0.82337971 0.8565805 ]
mean value: 0.8112906954356885
key: train_mcc
value: [0.85235948 0.86826252 0.8522816 0.85627063 0.87227261 0.85317946
0.85640062 0.86858794 0.86061924 0.85683896]
mean value: 0.8597073065379107
key: test_accuracy
value: [0.98214286 0.89285714 0.94642857 0.85714286 0.83928571 0.94642857
0.85714286 0.89090909 0.90909091 0.92727273]
mean value: 0.9048701298701298
key: train_accuracy
value: [0.9261477 0.93413174 0.9261477 0.92814371 0.93612774 0.9261477
0.92814371 0.93426295 0.93027888 0.92828685]
mean value: 0.9297818705219044
key: test_fscore
value: [0.98181818 0.88888889 0.94736842 0.85714286 0.84210526 0.94339623
0.86206897 0.88888889 0.9122807 0.92307692]
mean value: 0.9047035317712988
key: train_fscore
value: [0.9258517 0.93360161 0.92525253 0.92682927 0.93548387 0.92673267
0.92771084 0.93386774 0.92985972 0.92828685]
mean value: 0.9293476801717994
key: test_precision
value: [0.96428571 0.88888889 0.93103448 0.85714286 0.82758621 1.
0.83333333 0.88888889 0.86666667 0.96 ]
mean value: 0.9017827038861521
key: train_precision
value: [0.92031873 0.93172691 0.9233871 0.93061224 0.93172691 0.90697674
0.92031873 0.92828685 0.92430279 0.91732283]
mean value: 0.9234979827398379
key: test_recall
value: [1. 0.88888889 0.96428571 0.85714286 0.85714286 0.89285714
0.89285714 0.88888889 0.96296296 0.88888889]
mean value: 0.9093915343915344
key: train_recall
value: [0.93145161 0.93548387 0.92712551 0.92307692 0.93927126 0.94736842
0.93522267 0.93951613 0.93548387 0.93951613]
mean value: 0.9353516390231161
key: test_roc_auc
value: [0.98275862 0.89272031 0.94642857 0.85714286 0.83928571 0.94642857
0.85714286 0.89087302 0.91005291 0.9265873 ]
mean value: 0.9049420726144864
key: train_roc_auc
value: [0.92620011 0.9341451 0.92616118 0.92807389 0.93617106 0.92644012
0.92824126 0.93432499 0.93034036 0.92841948]
mean value: 0.9298517555235091
key: test_jcc
value: [0.96428571 0.8 0.9 0.75 0.72727273 0.89285714
0.75757576 0.8 0.83870968 0.85714286]
mean value: 0.8287843876553554
key: train_jcc
value: [0.8619403 0.8754717 0.86090226 0.86363636 0.87878788 0.86346863
0.86516854 0.87593985 0.86891386 0.866171 ]
mean value: 0.8680400379715635
MCC on Blind test: 0.28
Accuracy on Blind test: 0.7
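Note on the blind test figures above: accuracy and MCC can diverge sharply when the classes are as unbalanced as in the blind test set (545 vs 30). The sketch below computes both from a confusion matrix. The class totals are taken from the log, but the split into true and false predictions is invented for illustration and is not the result of this run.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical confusion counts (NOT from this run); only the class
# totals, 30 positives and 545 negatives, match the logged test set
tp, fn = 23, 7
tn, fp = 380, 165

acc_value = accuracy(tp, tn, fp, fn)
mcc_value = mcc(tp, tn, fp, fn)
print("Accuracy:", round(acc_value, 2))
print("MCC:", round(mcc_value, 2))
```

With these made-up counts most positives are found, yet the many false positives among the large negative class keep the MCC low while accuracy stays moderate.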
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
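Note on the ConvergenceWarning above: the message itself suggests raising max_iter or scaling the data. The sketch below shows, on synthetic data, how the warning can be captured and how the iteration cap affects it. The data and the helper name are hypothetical, and the values 1 and 1000 are chosen only to make the contrast obvious; they are not settings used in this run.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (hypothetical)
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

def fit_raises_convergence_warning(max_iter):
    """Fit a model and report whether lbfgs hit the iteration cap."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        LogisticRegression(max_iter=max_iter, random_state=42).fit(X, y)
    return any(issubclass(w.category, ConvergenceWarning) for w in caught)

warned_with_tight_cap = fit_raises_convergence_warning(max_iter=1)
warned_with_loose_cap = fit_raises_convergence_warning(max_iter=1000)
print("max_iter=1 warns:", warned_with_tight_cap)
print("max_iter=1000 warns:", warned_with_loose_cap)
```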
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.90763521 0.90286136 1.07441068 0.9007256 1.06581187 0.93487668
1.07855535 0.91051745 0.9207058 0.9194684 ]
mean value: 0.9615568399429322
key: score_time
value: [0.01460195 0.01202965 0.01527715 0.01221609 0.01532865 0.01546383
0.01225019 0.0153358 0.01221108 0.01221395]
mean value: 0.013692831993103028
key: test_mcc
value: [0.96490128 0.74984143 0.89342711 0.75047877 0.75047877 0.89802651
0.75047877 0.74569602 0.75033796 0.8565805 ]
mean value: 0.8110247141410594
key: train_mcc
value: [0.87624709 0.83235852 0.90022801 0.81633141 0.95211147 0.89227656
0.88021614 0.90436881 0.82867552 0.83278304]
mean value: 0.8715596565224771
key: test_accuracy
value: [0.98214286 0.875 0.94642857 0.875 0.875 0.94642857
0.875 0.87272727 0.87272727 0.92727273]
mean value: 0.9047727272727273
key: train_accuracy
value: [0.93812375 0.91616766 0.9500998 0.90818363 0.9760479 0.94610778
0.94011976 0.95219124 0.91434263 0.91633466]
mean value: 0.9357718825297612
key: test_fscore
value: [0.98181818 0.86792453 0.94736842 0.87719298 0.87272727 0.94339623
0.87719298 0.86792453 0.87719298 0.92307692]
mean value: 0.9035815029062298
key: train_fscore
value: [0.93762575 0.91566265 0.9490835 0.90688259 0.97560976 0.94567404
0.93927126 0.9516129 0.91348089 0.916 ]
mean value: 0.9350903343239241
key: test_precision
value: [0.96428571 0.88461538 0.93103448 0.86206897 0.88888889 1.
0.86206897 0.88461538 0.83333333 0.96 ]
mean value: 0.9070911119531809
key: train_precision
value: [0.93574297 0.912 0.95491803 0.90688259 0.97959184 0.94
0.93927126 0.9516129 0.91164659 0.90873016]
mean value: 0.9340396335864323
key: test_recall
value: [1. 0.85185185 0.96428571 0.89285714 0.85714286 0.89285714
0.89285714 0.85185185 0.92592593 0.88888889]
mean value: 0.9018518518518519
key: train_recall
value: [0.93951613 0.91935484 0.94331984 0.90688259 0.97165992 0.951417
0.93927126 0.9516129 0.91532258 0.9233871 ]
mean value: 0.9361744155674546
key: test_roc_auc
value: [0.98275862 0.87420179 0.94642857 0.875 0.875 0.94642857
0.875 0.8723545 0.87367725 0.9265873 ]
mean value: 0.9047436599160738
key: train_roc_auc
value: [0.93813751 0.91619916 0.95000638 0.9081657 0.97598744 0.94618094
0.94010807 0.9521844 0.9143542 0.91641796]
mean value: 0.9357741767545031
key: test_jcc
value: [0.96428571 0.76666667 0.9 0.78125 0.77419355 0.89285714
0.78125 0.76666667 0.78125 0.85714286]
mean value: 0.8265562596006144
key: train_jcc
value: [0.88257576 0.84444444 0.90310078 0.82962963 0.95238095 0.89694656
0.88549618 0.90769231 0.84074074 0.84501845]
mean value: 0.8788025805933736
MCC on Blind test: 0.28
Accuracy on Blind test: 0.69
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.0148561 0.01195788 0.01015806 0.01290393 0.01003218 0.01007318
0.01003528 0.01038003 0.01010847 0.01045537]
mean value: 0.011096048355102538
key: score_time
value: [0.01212811 0.01027489 0.00905752 0.00897837 0.00887227 0.00879979
0.00898051 0.00892806 0.00926328 0.00886559]
mean value: 0.00941483974456787
key: test_mcc
value: [0.79257331 0.44074684 0.55328334 0.64951905 0.3992747 0.60753044
0.73127242 0.52715278 0.78174603 0.78353876]
mean value: 0.6266637668473235
key: train_mcc
value: [0.65574845 0.66306345 0.63858187 0.68063475 0.65119268 0.6812805
0.66951249 0.68417806 0.6328508 0.68226627]
mean value: 0.6639309307086467
key: test_accuracy
value: [0.89285714 0.71428571 0.76785714 0.82142857 0.69642857 0.80357143
0.85714286 0.76363636 0.89090909 0.89090909]
mean value: 0.8099025974025974
key: train_accuracy
value: [0.8243513 0.82834331 0.81636727 0.83632735 0.81437126 0.83832335
0.83233533 0.83864542 0.812749 0.83665339]
mean value: 0.8278466970441587
key: test_fscore
value: [0.88 0.65217391 0.73469388 0.80769231 0.66666667 0.8
0.84 0.75471698 0.88888889 0.88461538]
mean value: 0.7909448019589822
key: train_fscore
value: [0.80786026 0.81304348 0.79912664 0.81938326 0.78220141 0.825054
0.81818182 0.82352941 0.79385965 0.81938326]
mean value: 0.8101623177549878
key: test_precision
value: [0.95652174 0.78947368 0.85714286 0.875 0.73913043 0.81481481
0.95454545 0.76923077 0.88888889 0.92 ]
mean value: 0.8564748642746355
key: train_precision
value: [0.88095238 0.88207547 0.86729858 0.89855072 0.92777778 0.88425926
0.87906977 0.8957346 0.87019231 0.90291262]
mean value: 0.8888823486174054
key: test_recall
value: [0.81481481 0.55555556 0.64285714 0.75 0.60714286 0.78571429
0.75 0.74074074 0.88888889 0.85185185]
mean value: 0.7387566137566137
key: train_recall
value: [0.74596774 0.75403226 0.74089069 0.75303644 0.67611336 0.77327935
0.76518219 0.76209677 0.72983871 0.75 ]
mean value: 0.7450437508162466
key: test_roc_auc
value: [0.89016603 0.70881226 0.76785714 0.82142857 0.69642857 0.80357143
0.85714286 0.76322751 0.89087302 0.89021164]
mean value: 0.8089719029374202
key: train_roc_auc
value: [0.82357676 0.82760901 0.81532723 0.83517964 0.81246613 0.83742708
0.83140999 0.8377413 0.81176975 0.83562992]
mean value: 0.8268136808296788
key: test_jcc
value: [0.78571429 0.48387097 0.58064516 0.67741935 0.5 0.66666667
0.72413793 0.60606061 0.8 0.79310345]
mean value: 0.6617618421622871
key: train_jcc
value: [0.67765568 0.68498168 0.66545455 0.69402985 0.64230769 0.70220588
0.69230769 0.7 0.65818182 0.69402985]
mean value: 0.6811154694734589
MCC on Blind test: 0.33
Accuracy on Blind test: 0.75
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.0104661 0.01030374 0.01144981 0.01024771 0.0105319 0.01100707
0.01029181 0.01024437 0.01042271 0.01044583]
mean value: 0.010541105270385742
key: score_time
value: [0.00899553 0.00887704 0.00901628 0.00890112 0.00991344 0.0087688
0.00883007 0.00883222 0.00892782 0.00884151]
mean value: 0.00899038314819336
key: test_mcc
value: [0.9284802 0.6431407 0.60753044 0.67900461 0.67900461 0.64450339
0.57142857 0.63745526 0.77174363 0.74603175]
mean value: 0.690832314648529
key: train_mcc
value: [0.74057724 0.74448553 0.69260708 0.7804155 0.740478 0.69270547
0.7484655 0.72133625 0.76096895 0.74917652]
mean value: 0.7371216028225991
key: test_accuracy
value: [0.96428571 0.82142857 0.80357143 0.83928571 0.83928571 0.82142857
0.78571429 0.81818182 0.87272727 0.87272727]
mean value: 0.8438636363636364
key: train_accuracy
value: [0.87025948 0.87225549 0.84630739 0.89021956 0.87025948 0.84630739
0.8742515 0.86055777 0.88047809 0.87450199]
mean value: 0.8685398128046695
key: test_fscore
value: [0.96296296 0.80769231 0.8 0.83636364 0.84210526 0.81481481
0.78571429 0.80769231 0.8852459 0.87272727]
mean value: 0.8415318752764827
key: train_fscore
value: [0.86973948 0.87096774 0.84253579 0.88798371 0.86761711 0.84188912
0.87169043 0.85655738 0.87951807 0.8742515 ]
mean value: 0.8662750313964435
key: test_precision
value: [0.96296296 0.84 0.81481481 0.85185185 0.82758621 0.84615385
0.78571429 0.84 0.79411765 0.85714286]
mean value: 0.8420344472595993
key: train_precision
value: [0.86454183 0.87096774 0.85123967 0.89344262 0.87295082 0.85416667
0.87704918 0.87083333 0.876 0.86561265]
mean value: 0.8696804515198457
key: test_recall
value: [0.96296296 0.77777778 0.78571429 0.82142857 0.85714286 0.78571429
0.78571429 0.77777778 1. 0.88888889]
mean value: 0.8443121693121693
key: train_recall
value: [0.875 0.87096774 0.8340081 0.88259109 0.86234818 0.82995951
0.86639676 0.84274194 0.88306452 0.88306452]
mean value: 0.8630142353402116
key: test_roc_auc
value: [0.9642401 0.81992337 0.80357143 0.83928571 0.83928571 0.82142857
0.78571429 0.81746032 0.875 0.87301587]
mean value: 0.8438925378580551
key: train_roc_auc
value: [0.87030632 0.87224276 0.84613791 0.89011444 0.87015047 0.84608212
0.87414326 0.86034735 0.88050864 0.87460312]
mean value: 0.8684636394092362
key: test_jcc
value: [0.92857143 0.67741935 0.66666667 0.71875 0.72727273 0.6875
0.64705882 0.67741935 0.79411765 0.77419355]
mean value: 0.7298969551163574
key: train_jcc
value: [0.76950355 0.77142857 0.72791519 0.7985348 0.76618705 0.72695035
0.77256318 0.74910394 0.78494624 0.77659574]
mean value: 0.7643728616166219
MCC on Blind test: 0.28
Accuracy on Blind test: 0.73
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.01013756 0.01020646 0.01003814 0.00957966 0.01008773 0.01100087
0.01095223 0.01078153 0.01096463 0.01088476]
mean value: 0.010463356971740723
key: score_time
value: [0.08086848 0.01246262 0.01641536 0.01239705 0.01289415 0.01338482
0.01392913 0.01344013 0.01498032 0.0151093 ]
mean value: 0.020588135719299315
key: test_mcc
value: [0.85880465 0.53486983 0.4645821 0.35805744 0.42857143 0.61065803
0.3992747 0.52935027 0.49137176 0.60268595]
mean value: 0.527822615672422
key: train_mcc
value: [0.66070893 0.7007593 0.66886394 0.69260708 0.71683257 0.67367686
0.71790128 0.71743162 0.67772756 0.67363991]
mean value: 0.690014906297328
key: test_accuracy
value: [0.92857143 0.76785714 0.73214286 0.67857143 0.71428571 0.80357143
0.69642857 0.76363636 0.74545455 0.8 ]
mean value: 0.7630519480519481
key: train_accuracy
value: [0.83033932 0.8502994 0.83433134 0.84630739 0.85828343 0.83632735
0.85828343 0.85856574 0.83864542 0.83665339]
mean value: 0.8448036198519296
key: test_fscore
value: [0.92307692 0.75471698 0.72727273 0.66666667 0.71428571 0.79245283
0.66666667 0.74509804 0.73076923 0.78431373]
mean value: 0.7505319504764566
key: train_fscore
value: [0.82688391 0.84662577 0.82886598 0.84253579 0.85360825 0.82845188
0.85115304 0.85420945 0.83298969 0.83127572]
mean value: 0.8396599470532266
key: test_precision
value: [0.96 0.76923077 0.74074074 0.69230769 0.71428571 0.84
0.73913043 0.79166667 0.76 0.83333333]
mean value: 0.7840695351347525
key: train_precision
value: [0.83539095 0.85892116 0.84453782 0.85123967 0.8697479 0.85714286
0.8826087 0.87029289 0.85232068 0.8487395 ]
mean value: 0.857094210276311
key: test_recall
value: [0.88888889 0.74074074 0.71428571 0.64285714 0.71428571 0.75
0.60714286 0.7037037 0.7037037 0.74074074]
mean value: 0.7206349206349206
key: train_recall
value: [0.81854839 0.83467742 0.81376518 0.8340081 0.83805668 0.80161943
0.82186235 0.83870968 0.81451613 0.81451613]
mean value: 0.8230279482826173
key: test_roc_auc
value: [0.92720307 0.76692209 0.73214286 0.67857143 0.71428571 0.80357143
0.69642857 0.76256614 0.74470899 0.7989418 ]
mean value: 0.7625342090859333
key: train_roc_auc
value: [0.83022281 0.85014503 0.83404795 0.84613791 0.85800472 0.83584909
0.85778157 0.85833122 0.83836043 0.83639192]
mean value: 0.8445272634880454
key: test_jcc
value: [0.85714286 0.60606061 0.57142857 0.5 0.55555556 0.65625
0.5 0.59375 0.57575758 0.64516129]
mean value: 0.6061106456267746
key: train_jcc
value: [0.70486111 0.73404255 0.70774648 0.72791519 0.74460432 0.70714286
0.74087591 0.74551971 0.71378092 0.71126761]
mean value: 0.7237756661243875
MCC on Blind test: 0.25
Accuracy on Blind test: 0.68
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.02742267 0.02210236 0.02209449 0.0219636 0.02208066 0.02287722
0.02279329 0.02308583 0.02478433 0.02252221]
mean value: 0.023172664642333984
key: score_time
value: [0.01376057 0.01196885 0.0119679 0.011971 0.01200175 0.01229978
0.01202559 0.01241159 0.01239586 0.01161957]
mean value: 0.0122422456741333
key: test_mcc
value: [0.89342711 0.82149863 0.89342711 0.71428571 0.71428571 0.85714286
0.71611487 0.71049701 0.75033796 0.82269299]
mean value: 0.7893709978851686
key: train_mcc
value: [0.78043222 0.79646836 0.79237674 0.80040802 0.80040802 0.78839993
0.80036023 0.80483837 0.80476867 0.78902126]
mean value: 0.7957481809735135
key: test_accuracy
value: [0.94642857 0.91071429 0.94642857 0.85714286 0.85714286 0.92857143
0.85714286 0.85454545 0.87272727 0.90909091]
mean value: 0.8939935064935065
key: train_accuracy
value: [0.89021956 0.89820359 0.89620758 0.9001996 0.9001996 0.89421158
0.9001996 0.90239044 0.90239044 0.89442231]
mean value: 0.8978644305015467
key: test_fscore
value: [0.94545455 0.90566038 0.94736842 0.85714286 0.85714286 0.92857143
0.86206897 0.84615385 0.87719298 0.90196078]
mean value: 0.8928717065163764
key: train_fscore
value: [0.88933602 0.89779559 0.89430894 0.89919355 0.89919355 0.89292929
0.89878543 0.90180361 0.90140845 0.89421158]
mean value: 0.8968965999938038
key: test_precision
value: [0.92857143 0.92307692 0.93103448 0.85714286 0.85714286 0.92857143
0.83333333 0.88 0.83333333 0.95833333]
mean value: 0.8930539977264116
key: train_precision
value: [0.8875502 0.89243028 0.89795918 0.89558233 0.89558233 0.89112903
0.89878543 0.89641434 0.89959839 0.88537549]
mean value: 0.8940407009629887
key: test_recall
value: [0.96296296 0.88888889 0.96428571 0.85714286 0.85714286 0.92857143
0.89285714 0.81481481 0.92592593 0.85185185]
mean value: 0.8944444444444444
key: train_recall
value: [0.89112903 0.90322581 0.89068826 0.90283401 0.90283401 0.89473684
0.89878543 0.90725806 0.90322581 0.90322581]
mean value: 0.8997943058639154
key: test_roc_auc
value: [0.94699872 0.90996169 0.94642857 0.85714286 0.85714286 0.92857143
0.85714286 0.85383598 0.87367725 0.90806878]
mean value: 0.8938970990695129
key: train_roc_auc
value: [0.89022855 0.89825322 0.89613153 0.9002359 0.9002359 0.89421881
0.90018011 0.90244793 0.9024003 0.89452629]
mean value: 0.8978858554311018
key: test_jcc
value: [0.89655172 0.82758621 0.9 0.75 0.75 0.86666667
0.75757576 0.73333333 0.78125 0.82142857]
mean value: 0.8084392260038812
key: train_jcc
value: [0.80072464 0.81454545 0.80882353 0.81684982 0.81684982 0.80656934
0.81617647 0.82116788 0.82051282 0.80866426]
mean value: 0.8130884032644238
MCC on Blind test: 0.23
Accuracy on Blind test: 0.71
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.94465923 2.00514841 2.06666064 2.05695105 2.04768014 1.9460156
2.0034411 1.29994416 2.12981367 2.01057458]
mean value: 1.9510888576507568
key: score_time
value: [0.01509047 0.01248336 0.02142859 0.01991558 0.02001476 0.01998496
0.0137403 0.01238942 0.01494431 0.02116203]
mean value: 0.01711537837982178
key: test_mcc
value: [0.82195294 0.71392082 0.85933785 0.60753044 0.67900461 0.83484711
0.71428571 0.78174603 0.82337971 0.8565805 ]
mean value: 0.7692585719374775
key: train_mcc
value: [0.98803016 0.99601537 0.99601492 1. 0.99601492 0.99601492
0.99204516 0.94820483 0.98409121 0.9841835 ]
mean value: 0.9880614980481577
key: test_accuracy
value: [0.91071429 0.85714286 0.92857143 0.80357143 0.83928571 0.91071429
0.85714286 0.89090909 0.90909091 0.92727273]
mean value: 0.8834415584415585
key: train_accuracy
value: [0.99401198 0.99800399 0.99800399 1. 0.99800399 0.99800399
0.99600798 0.97410359 0.99203187 0.99203187]
mean value: 0.9940203258821003
key: test_fscore
value: [0.90909091 0.85185185 0.93103448 0.80701754 0.84210526 0.90196078
0.85714286 0.88888889 0.9122807 0.92307692]
mean value: 0.8824450205895706
key: train_fscore
value: [0.99393939 0.9979798 0.9979716 1. 0.9979716 0.9979716
0.99593496 0.97373737 0.99190283 0.99186992]
mean value: 0.9939279085015674
key: test_precision
value: [0.89285714 0.85185185 0.9 0.79310345 0.82758621 1.
0.85714286 0.88888889 0.86666667 0.96 ]
mean value: 0.8838097062579822
key: train_precision
value: [0.99595142 1. 1. 1. 1. 1.
1. 0.9757085 0.99593496 1. ]
mean value: 0.9967594878377933
key: test_recall
value: [0.92592593 0.85185185 0.96428571 0.82142857 0.85714286 0.82142857
0.85714286 0.88888889 0.96296296 0.88888889]
mean value: 0.883994708994709
key: train_recall
value: [0.99193548 0.99596774 0.99595142 1. 0.99595142 0.99595142
0.99190283 0.97177419 0.98790323 0.98387097]
mean value: 0.9911208697923468
key: test_roc_auc
value: [0.91123883 0.85696041 0.92857143 0.80357143 0.83928571 0.91071429
0.85714286 0.89087302 0.91005291 0.9265873 ]
mean value: 0.8834998175515417
key: train_roc_auc
value: [0.99399146 0.99798387 0.99797571 1. 0.99797571 0.99797571
0.99595142 0.97407607 0.99198311 0.99193548]
mean value: 0.9939848536817699
key: test_jcc
value: [0.83333333 0.74193548 0.87096774 0.67647059 0.72727273 0.82142857
0.75 0.8 0.83870968 0.85714286]
mean value: 0.791726098063859
key: train_jcc
value: [0.98795181 0.99596774 0.99595142 1. 0.99595142 0.99595142
0.99190283 0.9488189 0.98393574 0.98387097]
mean value: 0.9880302242536261
MCC on Blind test: 0.25
Accuracy on Blind test: 0.63
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.03864717 0.02425408 0.02131462 0.02202463 0.01960993 0.02493072
0.02551031 0.02189636 0.0220654 0.02170563]
mean value: 0.02419588565826416
key: score_time
value: [0.00941229 0.00976038 0.00921798 0.00955081 0.00952029 0.00929213
0.00879693 0.00893784 0.00880909 0.00895596]
mean value: 0.00922536849975586
key: test_mcc
value: [0.96481304 0.89342711 0.82618439 0.85933785 0.85714286 0.82195294
0.85933785 0.85695439 1. 0.78961518]
mean value: 0.8728765613403809
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98214286 0.94642857 0.91071429 0.92857143 0.92857143 0.91071429
0.92857143 0.92727273 1. 0.89090909]
mean value: 0.9353896103896104
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98113208 0.94545455 0.90566038 0.93103448 0.92857143 0.9122807
0.93103448 0.92857143 1. 0.88 ]
mean value: 0.9343739522699218
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.92857143 0.96 0.9 0.92857143 0.89655172
0.9 0.89655172 1. 0.95652174]
mean value: 0.9366768044549154
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96296296 0.96296296 0.85714286 0.96428571 0.92857143 0.92857143
0.96428571 0.96296296 1. 0.81481481]
mean value: 0.9346560846560846
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98148148 0.94699872 0.91071429 0.92857143 0.92857143 0.91071429
0.92857143 0.92791005 1. 0.88955026]
mean value: 0.9353083378945448
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96296296 0.89655172 0.82758621 0.87096774 0.86666667 0.83870968
0.87096774 0.86666667 1. 0.78571429]
mean value: 0.8786793674335387
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.13
Accuracy on Blind test: 0.47
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.13840342 0.1259439 0.12937999 0.12996531 0.12474942 0.13039589
0.1298151 0.12556314 0.12853265 0.12834382]
mean value: 0.12910926342010498
key: score_time
value: [0.01946807 0.01797915 0.01792121 0.0182476 0.01871538 0.017977
0.01843834 0.01799345 0.01811767 0.01798201]
mean value: 0.018283987045288087
key: test_mcc
value: [0.9284802 0.60652703 0.85933785 0.75047877 0.75047877 0.85714286
0.75047877 0.78174603 0.81878307 0.81854376]
mean value: 0.7921997127312963
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96428571 0.80357143 0.92857143 0.875 0.875 0.92857143
0.875 0.89090909 0.90909091 0.90909091]
mean value: 0.8959090909090909
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96296296 0.79245283 0.92592593 0.87719298 0.87719298 0.92857143
0.87719298 0.88888889 0.90909091 0.90566038]
mean value: 0.8945132270355706
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96296296 0.80769231 0.96153846 0.86206897 0.86206897 0.92857143
0.86206897 0.88888889 0.89285714 0.92307692]
mean value: 0.8951795012139839
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96296296 0.77777778 0.89285714 0.89285714 0.89285714 0.92857143
0.89285714 0.88888889 0.92592593 0.88888889]
mean value: 0.8944444444444445
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.9642401 0.80268199 0.92857143 0.875 0.875 0.92857143
0.875 0.89087302 0.90939153 0.90873016]
mean value: 0.8958059660645867
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.92857143 0.65625 0.86206897 0.78125 0.78125 0.86666667
0.78125 0.8 0.83333333 0.82758621]
mean value: 0.8118226600985222
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.33
Accuracy on Blind test: 0.71
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01071095 0.01106501 0.01063919 0.01035523 0.01046157 0.01181698
0.01045609 0.01055861 0.01104879 0.01108813]
mean value: 0.01082005500793457
key: score_time
value: [0.00896955 0.00953364 0.00946403 0.00884151 0.00904202 0.00962973
0.00890636 0.00897074 0.00888109 0.00932431]
mean value: 0.009156298637390137
key: test_mcc
value: [0.7549598 0.18170219 0.50128041 0.4330127 0.53881591 0.40574111
0.75434227 0.56441351 0.53121272 0.60876172]
mean value: 0.5274242337751618
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.875 0.58928571 0.75 0.71428571 0.76785714 0.69642857
0.875 0.78181818 0.76363636 0.8 ]
mean value: 0.7613311688311688
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.87719298 0.59649123 0.74074074 0.69230769 0.77966102 0.73015873
0.86792453 0.76923077 0.77192982 0.7755102 ]
mean value: 0.7601147716858323
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.83333333 0.56666667 0.76923077 0.75 0.74193548 0.65714286
0.92 0.8 0.73333333 0.86363636]
mean value: 0.7635278807214291
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.92592593 0.62962963 0.71428571 0.64285714 0.82142857 0.82142857
0.82142857 0.74074074 0.81481481 0.7037037 ]
mean value: 0.7636243386243386
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.87675607 0.59067688 0.75 0.71428571 0.76785714 0.69642857
0.875 0.78108466 0.76455026 0.79828042]
mean value: 0.7614919722678344
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.78125 0.425 0.58823529 0.52941176 0.63888889 0.575
0.76666667 0.625 0.62857143 0.63333333]
mean value: 0.6191357376283847
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.23
Accuracy on Blind test: 0.65
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.87882328 1.90987062 1.84491491 1.94657969 1.94930863 1.92319202
1.93694496 1.91315603 1.92559934 1.88178062]
mean value: 1.9110170125961303
key: score_time
value: [0.09425449 0.09519291 0.09458399 0.10057092 0.09676242 0.0935111
0.09894633 0.09861422 0.09709573 0.10022926]
mean value: 0.09697613716125489
key: test_mcc
value: [1. 0.85951469 0.92857143 0.93094934 0.82195294 0.96490128
0.92857143 0.85449735 1. 0.89602867]
mean value: 0.9184987133098356
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.92857143 0.96428571 0.96428571 0.91071429 0.98214286
0.96428571 0.92727273 1. 0.94545455]
mean value: 0.9587012987012987
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.92857143 0.96428571 0.96551724 0.90909091 0.98181818
0.96428571 0.92592593 1. 0.94117647]
mean value: 0.958067158594542
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.89655172 0.96428571 0.93333333 0.92592593 1.
0.96428571 0.92592593 1. 1. ]
mean value: 0.9610308337894545
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.96296296 0.96428571 1. 0.89285714 0.96428571
0.96428571 0.92592593 1. 0.88888889]
mean value: 0.9563492063492064
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.92975734 0.96428571 0.96428571 0.91071429 0.98214286
0.96428571 0.92724868 1. 0.94444444]
mean value: 0.9587164750957855
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.86666667 0.93103448 0.93333333 0.83333333 0.96428571
0.93103448 0.86206897 1. 0.88888889]
mean value: 0.921064586754242
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.26
Accuracy on Blind test: 0.61
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...05', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [1.89666176 1.00527024 1.07393312 1.04313874 1.00363135 1.01725841
0.97884154 1.00802183 1.00800323 0.97835732]
mean value: 1.1013117551803588
key: score_time
value: [0.26713657 0.2600987 0.2608161 0.24226403 0.25155401 0.22661448
0.25301027 0.28414631 0.28104687 0.2326901 ]
mean value: 0.2559377431869507
key: test_mcc
value: [1. 0.74984143 0.92857143 0.93094934 0.82195294 0.85714286
0.92857143 0.85449735 1. 0.8565805 ]
mean value: 0.8928107284174787
key: train_mcc
value: [0.94410621 0.95608932 0.95212215 0.95608442 0.96010711 0.9441372
0.94410086 0.95617454 0.94820977 0.9562436 ]
mean value: 0.9517375181782505
key: test_accuracy
value: [1. 0.875 0.96428571 0.96428571 0.91071429 0.92857143
0.96428571 0.92727273 1. 0.92727273]
mean value: 0.9461688311688312
key: train_accuracy
value: [0.97205589 0.97804391 0.9760479 0.97804391 0.98003992 0.97205589
0.97205589 0.97808765 0.97410359 0.97808765]
mean value: 0.9758622197835405
key: test_fscore
value: [1. 0.86792453 0.96428571 0.96551724 0.90909091 0.92857143
0.96428571 0.92592593 1. 0.92307692]
mean value: 0.9448678384917812
key: train_fscore
value: [0.97177419 0.97777778 0.97580645 0.97768763 0.97983871 0.97177419
0.97165992 0.97777778 0.97384306 0.97795591]
mean value: 0.9755895619919588
key: test_precision
value: [1. 0.88461538 0.96428571 0.93333333 0.92592593 0.92857143
0.96428571 0.92592593 1. 0.96 ]
mean value: 0.9486943426943427
key: train_precision
value: [0.97177419 0.97975709 0.97188755 0.9796748 0.97590361 0.96787149
0.97165992 0.97975709 0.97188755 0.97211155]
mean value: 0.9742284833953254
key: test_recall
value: [1. 0.85185185 0.96428571 1. 0.89285714 0.92857143
0.96428571 0.92592593 1. 0.88888889]
mean value: 0.9416666666666667
key: train_recall
value: [0.97177419 0.97580645 0.97975709 0.9757085 0.98380567 0.9757085
0.97165992 0.97580645 0.97580645 0.98387097]
mean value: 0.9769704192242392
key: test_roc_auc
value: [1. 0.87420179 0.96428571 0.96428571 0.91071429 0.92857143
0.96428571 0.92724868 1. 0.9265873 ]
mean value: 0.9460180623973727
key: train_roc_auc
value: [0.9720531 0.9780218 0.97609901 0.97801173 0.98009181 0.97210622
0.97205043 0.97806071 0.9741237 0.97815596]
mean value: 0.9758774476377025
key: test_jcc
value: [1. 0.76666667 0.93103448 0.93333333 0.83333333 0.86666667
0.93103448 0.86206897 1. 0.85714286]
mean value: 0.8981280788177339
key: train_jcc
value: [0.94509804 0.95652174 0.95275591 0.95634921 0.96047431 0.94509804
0.94488189 0.95652174 0.94901961 0.95686275]
mean value: 0.9523583219558611
MCC on Blind test: 0.27
Accuracy on Blind test: 0.62
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02486706 0.01028371 0.01111889 0.01084995 0.01055169 0.01073956
0.01036692 0.01024008 0.01063538 0.01101923]
mean value: 0.012067246437072753
key: score_time
value: [0.01320982 0.00907779 0.00915051 0.00921273 0.00907063 0.00885177
0.00918436 0.00913143 0.00959134 0.00949717]
mean value: 0.00959775447845459
key: test_mcc
value: [0.9284802 0.6431407 0.60753044 0.67900461 0.67900461 0.64450339
0.57142857 0.63745526 0.77174363 0.74603175]
mean value: 0.690832314648529
key: train_mcc
value: [0.74057724 0.74448553 0.69260708 0.7804155 0.740478 0.69270547
0.7484655 0.72133625 0.76096895 0.74917652]
mean value: 0.7371216028225991
key: test_accuracy
value: [0.96428571 0.82142857 0.80357143 0.83928571 0.83928571 0.82142857
0.78571429 0.81818182 0.87272727 0.87272727]
mean value: 0.8438636363636364
key: train_accuracy
value: [0.87025948 0.87225549 0.84630739 0.89021956 0.87025948 0.84630739
0.8742515 0.86055777 0.88047809 0.87450199]
mean value: 0.8685398128046695
key: test_fscore
value: [0.96296296 0.80769231 0.8 0.83636364 0.84210526 0.81481481
0.78571429 0.80769231 0.8852459 0.87272727]
mean value: 0.8415318752764827
key: train_fscore
value: [0.86973948 0.87096774 0.84253579 0.88798371 0.86761711 0.84188912
0.87169043 0.85655738 0.87951807 0.8742515 ]
mean value: 0.8662750313964435
key: test_precision
value: [0.96296296 0.84 0.81481481 0.85185185 0.82758621 0.84615385
0.78571429 0.84 0.79411765 0.85714286]
mean value: 0.8420344472595993
key: train_precision
value: [0.86454183 0.87096774 0.85123967 0.89344262 0.87295082 0.85416667
0.87704918 0.87083333 0.876 0.86561265]
mean value: 0.8696804515198457
key: test_recall
value: [0.96296296 0.77777778 0.78571429 0.82142857 0.85714286 0.78571429
0.78571429 0.77777778 1. 0.88888889]
mean value: 0.8443121693121693
key: train_recall
value: [0.875 0.87096774 0.8340081 0.88259109 0.86234818 0.82995951
0.86639676 0.84274194 0.88306452 0.88306452]
mean value: 0.8630142353402116
key: test_roc_auc
value: [0.9642401 0.81992337 0.80357143 0.83928571 0.83928571 0.82142857
0.78571429 0.81746032 0.875 0.87301587]
mean value: 0.8438925378580551
key: train_roc_auc
value: [0.87030632 0.87224276 0.84613791 0.89011444 0.87015047 0.84608212
0.87414326 0.86034735 0.88050864 0.87460312]
mean value: 0.8684636394092362
key: test_jcc
value: [0.92857143 0.67741935 0.66666667 0.71875 0.72727273 0.6875
0.64705882 0.67741935 0.79411765 0.77419355]
mean value: 0.7298969551163574
key: train_jcc
value: [0.76950355 0.77142857 0.72791519 0.7985348 0.76618705 0.72695035
0.77256318 0.74910394 0.78494624 0.77659574]
mean value: 0.7643728616166219
MCC on Blind test: 0.28
Accuracy on Blind test: 0.73
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.12475324 0.08080173 0.08375597 0.08054066 0.08045411 0.09029341
0.08177567 0.07712722 0.07821131 0.07184291]
mean value: 0.08495562076568604
key: score_time
value: [0.01098704 0.01110721 0.01107979 0.01108408 0.01103806 0.01121902
0.01095128 0.01111293 0.01077819 0.01080561]
mean value: 0.011016321182250977
key: test_mcc
value: [1. 0.89342711 0.89342711 0.93094934 0.89342711 0.96490128
0.96490128 0.89153439 1. 0.89139151]
mean value: 0.932395913795342
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.94642857 0.94642857 0.96428571 0.94642857 0.98214286
0.98214286 0.94545455 1. 0.94545455]
mean value: 0.9658766233766234
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.94545455 0.94545455 0.96551724 0.94736842 0.98181818
0.98245614 0.94545455 1. 0.94339623]
mean value: 0.9656919847379731
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.92857143 0.96296296 0.93333333 0.93103448 1.
0.96551724 0.92857143 1. 0.96153846]
mean value: 0.9611529339115547
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.96296296 0.92857143 1. 0.96428571 0.96428571
1. 0.96296296 1. 0.92592593]
mean value: 0.9708994708994709
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.94699872 0.94642857 0.96428571 0.94642857 0.98214286
0.98214286 0.9457672 1. 0.94510582]
mean value: 0.965930031016238
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.89655172 0.89655172 0.93333333 0.9 0.96428571
0.96551724 0.89655172 1. 0.89285714]
mean value: 0.9345648604269294
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.11
Accuracy on Blind test: 0.35
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.05492663 0.05176282 0.10891724 0.06809163 0.04307985 0.04353976
0.07512021 0.06407142 0.07399178 0.08978105]
mean value: 0.06732823848724365
key: score_time
value: [0.01222754 0.0186336 0.01953959 0.01344514 0.01201749 0.01820254
0.01937747 0.01236653 0.01234722 0.02103639]
mean value: 0.01591935157775879
key: test_mcc
value: [0.72273097 0.78544061 0.85714286 0.75047877 0.78772636 0.82618439
0.71611487 0.74569602 0.71735629 0.8565805 ]
mean value: 0.7765451645733424
key: train_mcc
value: [0.89240405 0.89647918 0.89632475 0.89622747 0.91623104 0.9124496
0.90430958 0.90450187 0.90039607 0.89261761]
mean value: 0.9011941219984033
key: test_accuracy
value: [0.85714286 0.89285714 0.92857143 0.875 0.89285714 0.91071429
0.85714286 0.87272727 0.85454545 0.92727273]
mean value: 0.8868831168831168
key: train_accuracy
value: [0.94610778 0.94810379 0.94810379 0.94810379 0.95808383 0.95608782
0.95209581 0.95219124 0.9501992 0.94621514]
mean value: 0.950529220443575
key: test_fscore
value: [0.86206897 0.88888889 0.92857143 0.87719298 0.89655172 0.90566038
0.86206897 0.86792453 0.86206897 0.92307692]
mean value: 0.8874073749343413
key: train_fscore
value: [0.94610778 0.94820717 0.94779116 0.94758065 0.95774648 0.956
0.95180723 0.952 0.94969819 0.94610778]
mean value: 0.9503046446920652
key: test_precision
value: [0.80645161 0.88888889 0.92857143 0.86206897 0.86666667 0.96
0.83333333 0.88461538 0.80645161 0.96 ]
mean value: 0.8797047893399395
key: train_precision
value: [0.93675889 0.93700787 0.94023904 0.9437751 0.952 0.94466403
0.94422311 0.94444444 0.94779116 0.93675889]
mean value: 0.9427662553096674
key: test_recall
value: [0.92592593 0.88888889 0.92857143 0.89285714 0.92857143 0.85714286
0.89285714 0.85185185 0.92592593 0.88888889]
mean value: 0.8981481481481481
key: train_recall
value: [0.95564516 0.95967742 0.95546559 0.951417 0.96356275 0.96761134
0.95951417 0.95967742 0.9516129 0.95564516]
mean value: 0.9579828914718558
key: test_roc_auc
value: [0.85951469 0.89272031 0.92857143 0.875 0.89285714 0.91071429
0.85714286 0.8723545 0.85582011 0.9265873 ]
mean value: 0.8871282612661924
key: train_roc_auc
value: [0.94620203 0.94821816 0.94820523 0.94814945 0.95815933 0.95624661
0.95219803 0.95227965 0.9502159 0.94632652]
mean value: 0.950620090969503
key: test_jcc
value: [0.75757576 0.8 0.86666667 0.78125 0.8125 0.82758621
0.75757576 0.76666667 0.75757576 0.85714286]
mean value: 0.7984539670100015
key: train_jcc
value: [0.89772727 0.90151515 0.90076336 0.90038314 0.91891892 0.91570881
0.90804598 0.90839695 0.90421456 0.89772727]
mean value: 0.9053401411653583
MCC on Blind test: 0.22
Accuracy on Blind test: 0.67
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01412702 0.01368785 0.01042676 0.01001048 0.00986242 0.00989866
0.00987029 0.0099175 0.01003695 0.00978541]
mean value: 0.010762333869934082
key: score_time
value: [0.0248754 0.00975728 0.00907731 0.00859451 0.00872374 0.00872278
0.00863624 0.00862908 0.00866938 0.00869274]
mean value: 0.010437846183776855
key: test_mcc
value: [0.89342711 0.6431407 0.68965631 0.68250015 0.60753044 0.67900461
0.64450339 0.67328042 0.71735629 0.81854376]
mean value: 0.7048943172837225
key: train_mcc
value: [0.75247462 0.72067111 0.6806649 0.76441802 0.73651066 0.72061769
0.7364755 0.74502957 0.74898578 0.76490153]
mean value: 0.7370749397486276
key: test_accuracy
value: [0.94642857 0.82142857 0.83928571 0.83928571 0.80357143 0.83928571
0.82142857 0.83636364 0.85454545 0.90909091]
mean value: 0.8510714285714286
key: train_accuracy
value: [0.8762475 0.86027944 0.84031936 0.88223553 0.86826347 0.86027944
0.86826347 0.87250996 0.87450199 0.88247012]
mean value: 0.8685370295266042
key: test_fscore
value: [0.94545455 0.80769231 0.82352941 0.83018868 0.8 0.83636364
0.81481481 0.83636364 0.86206897 0.90566038]
mean value: 0.8462136374574661
key: train_fscore
value: [0.87449393 0.85714286 0.83606557 0.88032454 0.86530612 0.85655738
0.86639676 0.8699187 0.87221095 0.88080808]
mean value: 0.8659224895623094
key: test_precision
value: [0.92857143 0.84 0.91304348 0.88 0.81481481 0.85185185
0.84615385 0.82142857 0.80645161 0.92307692]
mean value: 0.8625392527061531
key: train_precision
value: [0.87804878 0.8677686 0.84647303 0.88211382 0.87242798 0.86721992
0.86639676 0.87704918 0.87755102 0.88259109]
mean value: 0.8717640181251569
key: test_recall
value: [0.96296296 0.77777778 0.75 0.78571429 0.78571429 0.82142857
0.78571429 0.85185185 0.92592593 0.88888889]
mean value: 0.8335978835978836
key: train_recall
value: [0.87096774 0.84677419 0.82591093 0.87854251 0.8582996 0.84615385
0.86639676 0.86290323 0.86693548 0.87903226]
mean value: 0.8601916546950503
key: test_roc_auc
value: [0.94699872 0.81992337 0.83928571 0.83928571 0.80357143 0.83928571
0.82142857 0.83664021 0.85582011 0.90873016]
mean value: 0.851096971355592
key: train_roc_auc
value: [0.87619533 0.86014599 0.84012082 0.88218464 0.86812618 0.8600848
0.86823775 0.87239649 0.87441262 0.88242951]
mean value: 0.86843341410175
key: test_jcc
value: [0.89655172 0.67741935 0.7 0.70967742 0.66666667 0.71875
0.6875 0.71875 0.75757576 0.82758621]
mean value: 0.7360477129470455
key: train_jcc
value: [0.77697842 0.75 0.71830986 0.78623188 0.76258993 0.74910394
0.76428571 0.76978417 0.77338129 0.78700361]
mean value: 0.7637668823208889
MCC on Blind test: 0.28
Accuracy on Blind test: 0.73
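
The key / value / mean value triplets in each model block have the same shape as the dictionary returned by scikit-learn's `cross_validate` with `return_train_score=True`. The script that produced this log is not shown here, so the following is only a minimal sketch of how output of this shape can be generated. The synthetic data, the `LogisticRegression` stand-in model, the scorer definitions and the shuffled `StratifiedKFold` are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data; the real feature table is not reproduced here.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Scale features to [0, 1], then fit a classifier (stand-in model).
pipe = Pipeline(steps=[("prep", MinMaxScaler()),
                       ("model", LogisticRegression(random_state=42))])

# Scorer names chosen to mirror the keys in the log (assumed definitions).
scoring = {"mcc": make_scorer(matthews_corrcoef),
           "accuracy": "accuracy",
           "fscore": "f1",
           "precision": "precision",
           "recall": "recall",
           "roc_auc": "roc_auc",
           "jcc": "jaccard"}

# Ten folds, matching the ten values per key in the log.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

# Print one array of per-fold values and its mean for every key.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```

Each array holds one entry per fold, so comparing `train_*` with `test_*` means for the same metric gives a quick view of how much a model overfits within cross-validation.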
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01758575 0.0198431 0.02237058 0.02029872 0.0240736 0.02260089
0.01758718 0.01784587 0.01909232 0.01727676]
mean value: 0.01985747814178467
key: score_time
value: [0.00994825 0.01115131 0.01172638 0.01171517 0.01169991 0.0116601
0.01156712 0.01167297 0.0117054 0.01167989]
mean value: 0.011452651023864746
key: test_mcc
value: [0.70299234 0.78799489 0.85714286 0.71428571 0.67900461 0.92857143
0.71659857 0.60876172 0.82337971 0.8565805 ]
mean value: 0.7675312343241391
key: train_mcc
value: [0.8286992 0.86914588 0.90049318 0.88030173 0.91678491 0.85174413
0.78363444 0.84370235 0.86453703 0.86480823]
mean value: 0.8603851093775248
key: test_accuracy
value: [0.83928571 0.89285714 0.92857143 0.85714286 0.83928571 0.96428571
0.83928571 0.8 0.90909091 0.92727273]
mean value: 0.8797077922077923
key: train_accuracy
value: [0.91017964 0.93413174 0.9500998 0.94011976 0.95808383 0.9241517
0.88423154 0.92031873 0.93227092 0.93227092]
mean value: 0.928585856176094
key: test_fscore
value: [0.85245902 0.89285714 0.92857143 0.85714286 0.84210526 0.96428571
0.80851064 0.7755102 0.9122807 0.92307692]
mean value: 0.8756799889619294
key: train_fscore
value: [0.91525424 0.93491124 0.9486653 0.93877551 0.9582505 0.92635659
0.86936937 0.91561181 0.93117409 0.93227092]
mean value: 0.9270639563121068
key: test_precision
value: [0.76470588 0.86206897 0.92857143 0.85714286 0.82758621 0.96428571
1. 0.86363636 0.86666667 0.96 ]
mean value: 0.8894664085069764
key: train_precision
value: [0.85865724 0.91505792 0.9625 0.94650206 0.94140625 0.88847584
0.97969543 0.96017699 0.93495935 0.92125984]
mean value: 0.930869091765427
key: test_recall
value: [0.96296296 0.92592593 0.92857143 0.85714286 0.85714286 0.96428571
0.67857143 0.7037037 0.96296296 0.88888889]
mean value: 0.873015873015873
key: train_recall
value: [0.97983871 0.95564516 0.93522267 0.93117409 0.9757085 0.96761134
0.78137652 0.875 0.92741935 0.94354839]
mean value: 0.9272544730312133
key: test_roc_auc
value: [0.84355045 0.89399745 0.92857143 0.85714286 0.83928571 0.96428571
0.83928571 0.79828042 0.91005291 0.9265873 ]
mean value: 0.880103995621237
key: train_roc_auc
value: [0.91086797 0.93434432 0.9498948 0.93999649 0.95832669 0.92475055
0.88281424 0.91978346 0.93221361 0.93240411]
mean value: 0.9285396264194379
key: test_jcc
value: [0.74285714 0.80645161 0.86666667 0.75 0.72727273 0.93103448
0.67857143 0.63333333 0.83870968 0.85714286]
mean value: 0.7832039928925357
key: train_jcc
value: [0.84375 0.87777778 0.90234375 0.88461538 0.91984733 0.86281588
0.7689243 0.84435798 0.87121212 0.87313433]
mean value: 0.8648778854126843
MCC on Blind test: 0.26
Accuracy on Blind test: 0.6
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.02320075 0.02007747 0.02243185 0.02290845 0.0215466 0.01983714
0.02546191 0.01764059 0.0193162 0.01997018]
mean value: 0.021239113807678223
key: score_time
value: [0.01175237 0.01165366 0.01168561 0.01169753 0.01166391 0.0116353
0.01183295 0.01159787 0.01166773 0.0116539 ]
mean value: 0.011684083938598632
key: test_mcc
value: [0.96490128 0.66026156 0.72168784 0.82195294 0.49487166 0.89802651
0.70082556 0.60242771 0.79069197 0.8565805 ]
mean value: 0.7512227529393891
key: train_mcc
value: [0.88154038 0.81002887 0.87979163 0.90022801 0.65420186 0.88429173
0.84429267 0.7060842 0.89261761 0.89653312]
mean value: 0.8349610067434581
key: test_accuracy
value: [0.98214286 0.82142857 0.85714286 0.91071429 0.71428571 0.94642857
0.83928571 0.78181818 0.89090909 0.92727273]
mean value: 0.8671428571428571
key: train_accuracy
value: [0.94011976 0.9001996 0.93812375 0.9500998 0.80239521 0.94211577
0.91816367 0.83466135 0.94621514 0.94820717]
mean value: 0.9120301230208905
key: test_fscore
value: [0.98181818 0.83333333 0.84615385 0.9122807 0.61904762 0.94339623
0.81632653 0.72727273 0.89655172 0.92307692]
mean value: 0.8499257813622287
key: train_fscore
value: [0.93775934 0.90636704 0.93418259 0.9490835 0.75062972 0.9416499
0.91067538 0.8 0.94610778 0.948 ]
mean value: 0.902445525859967
key: test_precision
value: [0.96428571 0.75757576 0.91666667 0.89655172 0.92857143 1.
0.95238095 0.94117647 0.83870968 0.96 ]
mean value: 0.9155918391626041
key: train_precision
value: [0.96581197 0.84615385 0.98214286 0.95491803 0.99333333 0.936
0.98584906 0.99401198 0.93675889 0.94047619]
mean value: 0.9535456151637388
key: test_recall
value: [1. 0.92592593 0.78571429 0.92857143 0.46428571 0.89285714
0.71428571 0.59259259 0.96296296 0.88888889]
mean value: 0.8156084656084656
key: train_recall
value: [0.91129032 0.97580645 0.89068826 0.94331984 0.60323887 0.94736842
0.84615385 0.66935484 0.95564516 0.95564516]
mean value: 0.8698511166253102
key: test_roc_auc
value: [0.98275862 0.82503193 0.85714286 0.91071429 0.71428571 0.94642857
0.83928571 0.77843915 0.89219577 0.9265873 ]
mean value: 0.8672869914249225
key: train_roc_auc
value: [0.93983488 0.9009467 0.93747011 0.95000638 0.79965093 0.94218815
0.91717141 0.83270892 0.94632652 0.94829502]
mean value: 0.9114599020928051
key: test_jcc
value: [0.96428571 0.71428571 0.73333333 0.83870968 0.44827586 0.89285714
0.68965517 0.57142857 0.8125 0.85714286]
mean value: 0.7522474045235447
key: train_jcc
value: [0.8828125 0.82876712 0.87649402 0.90310078 0.60080645 0.88973384
0.836 0.66666667 0.89772727 0.90114068]
mean value: 0.8283249338107523
MCC on Blind test: 0.19
Accuracy on Blind test: 0.45
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.20143199 0.18603349 0.18654966 0.18629313 0.18640566 0.18552709
0.1855998 0.18528461 0.18386316 0.18543124]
mean value: 0.1872419834136963
key: score_time
value: [0.01513767 0.01660371 0.01517773 0.01519632 0.01537299 0.01546884
0.01524711 0.01526022 0.01523232 0.0154779 ]
mean value: 0.01541748046875
key: test_mcc
value: [1. 0.89315584 0.92857143 0.89342711 0.78571429 0.93094934
0.93094934 0.89153439 1. 0.89139151]
mean value: 0.9145693236924946
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.94642857 0.96428571 0.94642857 0.89285714 0.96428571
0.96428571 0.94545455 1. 0.94545455]
mean value: 0.956948051948052
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.94339623 0.96428571 0.94736842 0.89285714 0.96296296
0.96296296 0.94545455 1. 0.94339623]
mean value: 0.9562684202406149
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.96153846 0.96428571 0.93103448 0.89285714 1.
1. 0.92857143 1. 0.96153846]
mean value: 0.9639825691549829
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.92592593 0.96428571 0.96428571 0.89285714 0.92857143
0.92857143 0.96296296 1. 0.92592593]
mean value: 0.9493386243386244
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.94572158 0.96428571 0.94642857 0.89285714 0.96428571
0.96428571 0.9457672 1. 0.94510582]
mean value: 0.9568737456668491
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.89285714 0.93103448 0.9 0.80645161 0.92857143
0.92857143 0.89655172 1. 0.89285714]
mean value: 0.9176894962656921
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.11
Accuracy on Blind test: 0.43
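
The AdaBoost block above reports a mean cross-validated test MCC of about 0.91 but a blind-test MCC of 0.11, so the same statistic gives very different answers on the two data sets. As a reminder of what that statistic measures, here is a minimal sketch that computes MCC and accuracy from a confusion matrix. The labels are made up for illustration and are not taken from this run.

```python
from math import sqrt

from sklearn.metrics import accuracy_score, confusion_matrix, matthews_corrcoef

# Made-up labels for illustration only; not taken from this run.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# MCC from its definition on the four confusion-matrix counts.
mcc_manual = (tp * tn - fp * fn) / sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

# The same quantities from the library helpers.
mcc = matthews_corrcoef(y_true, y_pred)
acc = accuracy_score(y_true, y_pred)

print("MCC (manual):", mcc_manual)   # 0.5
print("MCC (sklearn):", mcc)         # 0.5
print("Accuracy:", acc)              # 0.75
```

MCC uses all four cells of the confusion matrix, which is why it can sit far below accuracy when predictions are skewed towards one class.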
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.07632256 0.06938648 0.06508446 0.06937695 0.0748291 0.08184791
0.06407475 0.05834961 0.05477071 0.06439114]
mean value: 0.06784336566925049
key: score_time
value: [0.02440453 0.0258677 0.03429985 0.02032709 0.03834176 0.02059245
0.02165937 0.02194166 0.0186646 0.024894 ]
mean value: 0.025099301338195802
key: test_mcc
value: [1. 0.82149863 0.96490128 0.93094934 0.82618439 0.96490128
0.96490128 0.89153439 1. 0.89139151]
mean value: 0.925626210567648
key: train_mcc
value: [0.98403035 0.98803016 0.98802882 0.99204516 0.99601537 0.99201441
0.99201441 0.99602309 0.9760922 0.99203073]
mean value: 0.9896324686793974
key: test_accuracy
value: [1. 0.91071429 0.98214286 0.96428571 0.91071429 0.98214286
0.98214286 0.94545455 1. 0.94545455]
mean value: 0.9623051948051948
key: train_accuracy
value: [0.99201597 0.99401198 0.99401198 0.99600798 0.99800399 0.99600798
0.99600798 0.99800797 0.98804781 0.99601594]
mean value: 0.9948139577418867
key: test_fscore
value: [1. 0.90566038 0.98245614 0.96551724 0.91525424 0.98181818
0.98245614 0.94545455 1. 0.94339623]
mean value: 0.9622013090415512
key: train_fscore
value: [0.99193548 0.99393939 0.99391481 0.99593496 0.9979798 0.99595142
0.99595142 0.9979798 0.98790323 0.99596774]
mean value: 0.9947458042171815
key: test_precision
value: [1. 0.92307692 0.96551724 0.93333333 0.87096774 1.
0.96551724 0.92857143 1. 0.96153846]
mean value: 0.9548522371214251
key: train_precision
value: [0.99193548 0.99595142 0.99593496 1. 0.99596774 0.99595142
0.99595142 1. 0.98790323 0.99596774]
mean value: 0.9955563403910126
key: test_recall
value: [1. 0.88888889 1. 1. 0.96428571 0.96428571
1. 0.96296296 1. 0.92592593]
mean value: 0.9706349206349206
key: train_recall
value: [0.99193548 0.99193548 0.99190283 0.99190283 1. 0.99595142
0.99595142 0.99596774 0.98790323 0.99596774]
mean value: 0.9939418179443646
key: test_roc_auc
value: [1. 0.90996169 0.98214286 0.96428571 0.91071429 0.98214286
0.98214286 0.9457672 1. 0.94510582]
mean value: 0.9622263273125342
key: train_roc_auc
value: [0.99201517 0.99399146 0.99398291 0.99595142 0.9980315 0.9960072
0.9960072 0.99798387 0.9880461 0.99601537]
mean value: 0.994803220447082
key: test_jcc
value: [1. 0.82758621 0.96551724 0.93333333 0.84375 0.96428571
0.96551724 0.89655172 1. 0.89285714]
mean value: 0.9289398604269294
key: train_jcc
value: [0.984 0.98795181 0.98790323 0.99190283 0.99596774 0.99193548
0.99193548 0.99596774 0.97609562 0.99196787]
mean value: 0.9895627807672192
MCC on Blind test: 0.11
Accuracy on Blind test: 0.31
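
The OOB warnings logged for the Bagging Classifier run above say that some training rows were never left out of a bootstrap sample, so no out-of-bag prediction exists for them and the normalisation divides zero by zero. The logged model repr lists no `n_estimators`, which means it uses the scikit-learn default of 10. The following minimal sketch, on synthetic data, shows that a larger ensemble leaves no row without an OOB estimate. The value `n_estimators=100` is an illustrative assumption, not a setting from this run.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

# Synthetic stand-in data; the real feature table is not reproduced here.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# n_estimators=100 is an illustrative choice, not a setting from this run
# (the logged model uses the scikit-learn default of 10).
clf = BaggingClassifier(n_estimators=100, oob_score=True, random_state=42)
clf.fit(X, y)

# Rows that were in-bag for every estimator show up as NaN here.
n_missing = int(np.isnan(clf.oob_decision_function_).any(axis=1).sum())

print("rows without an OOB estimate:", n_missing)
print("OOB score:", clf.oob_score_)
```

With bootstrap sampling each row is in-bag for a given estimator with probability of roughly 0.63, so with only 10 estimators about 1% of rows are expected to have no OOB prediction, while with 100 estimators that fraction is negligible.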
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.15868926 0.13006496 0.15043187 0.17151141 0.18619037 0.20392632
0.16918993 0.16662097 0.17729712 0.17640066]
mean value: 0.16903228759765626
key: score_time
value: [0.02504158 0.02538013 0.01512814 0.02922416 0.02523613 0.029881
0.02491045 0.0251205 0.02523446 0.02540874]
mean value: 0.02505652904510498
key: test_mcc
value: [0.78691666 0.64240102 0.67900461 0.57142857 0.57142857 0.71611487
0.57735027 0.67328042 0.67328042 0.71049701]
mean value: 0.6601702430541447
key: train_mcc
value: [0.98415334 0.98809222 0.98809052 0.98415084 0.98405842 0.98415084
0.98415084 0.98811501 0.9841835 0.9841835 ]
mean value: 0.9853329016739125
key: test_accuracy
value: [0.89285714 0.82142857 0.83928571 0.78571429 0.78571429 0.85714286
0.78571429 0.83636364 0.83636364 0.85454545]
mean value: 0.829512987012987
key: train_accuracy
value: [0.99201597 0.99401198 0.99401198 0.99201597 0.99201597 0.99201597
0.99201597 0.9940239 0.99203187 0.99203187]
mean value: 0.992619144181756
key: test_fscore
value: [0.88461538 0.81481481 0.83636364 0.78571429 0.78571429 0.85185185
0.76923077 0.83636364 0.83636364 0.84615385]
mean value: 0.8247186147186147
key: train_fscore
value: [0.99186992 0.99391481 0.99389002 0.99183673 0.99186992 0.99183673
0.99183673 0.99391481 0.99186992 0.99186992]
mean value: 0.9924709513849441
key: test_precision
value: [0.92 0.81481481 0.85185185 0.78571429 0.78571429 0.88461538
0.83333333 0.82142857 0.82142857 0.88 ]
mean value: 0.8398901098901099
key: train_precision
value: [1. 1. 1. 1. 0.99591837 1.
1. 1. 1. 1. ]
mean value: 0.9995918367346939
key: test_recall
value: [0.85185185 0.81481481 0.82142857 0.78571429 0.78571429 0.82142857
0.71428571 0.85185185 0.85185185 0.81481481]
mean value: 0.8113756613756613
key: train_recall
value: [0.98387097 0.98790323 0.98785425 0.98380567 0.98785425 0.98380567
0.98380567 0.98790323 0.98387097 0.98387097]
mean value: 0.9854544860911584
key: test_roc_auc
value: [0.89144317 0.82120051 0.83928571 0.78571429 0.78571429 0.85714286
0.78571429 0.83664021 0.83664021 0.85383598]
mean value: 0.829333150884875
key: train_roc_auc
value: [0.99193548 0.99395161 0.99392713 0.99190283 0.99195862 0.99190283
0.99190283 0.99395161 0.99193548 0.99193548]
mean value: 0.9925303926518784
key: test_jcc
value: [0.79310345 0.6875 0.71875 0.64705882 0.64705882 0.74193548
0.625 0.71875 0.71875 0.73333333]
mean value: 0.7031239912538987
key: train_jcc
value: [0.98387097 0.98790323 0.98785425 0.98380567 0.98387097 0.98380567
0.98380567 0.98790323 0.98387097 0.98387097]
mean value: 0.9850561577641374
MCC on Blind test: 0.25
Accuracy on Blind test: 0.7
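Note on the two blind-test figures above: accuracy and MCC can disagree when the classes are unbalanced. The sketch below is not the code from rpob_orig.py. The confusion-matrix counts are hypothetical (chosen only so that accuracy comes out at 0.7) and the function names are illustrative.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when a row or column of the matrix is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Hypothetical counts, NOT taken from this log.
counts = dict(tp=60, tn=10, fp=20, fn=10)
acc = accuracy(**counts)
m = mcc(**counts)
print(f"accuracy={acc:.2f} mcc={m:.2f}")
```

With these assumed counts most of the correct calls come from the majority class, so accuracy looks reasonable while MCC stays low.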
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.77394152 0.76272011 0.75519323 0.76581335 0.75918293 0.76116228
0.7646122 0.76457858 0.75582981 0.76644087]
mean value: 0.7629474878311158
key: score_time
value: [0.00938392 0.00930333 0.00939107 0.00936723 0.00953603 0.00941896
0.00916553 0.00935197 0.0094955 0.00944281]
mean value: 0.009385633468627929
key: test_mcc
value: [1. 0.9284802 0.89342711 0.93094934 0.85933785 0.96490128
0.96490128 0.89153439 1. 0.89139151]
mean value: 0.9324922966413355
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.96428571 0.94642857 0.96428571 0.92857143 0.98214286
0.98214286 0.94545455 1. 0.94545455]
mean value: 0.9658766233766234
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.96296296 0.94736842 0.96551724 0.93103448 0.98181818
0.98245614 0.94545455 1. 0.94339623]
mean value: 0.9660008202192224
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.96296296 0.93103448 0.93333333 0.9 1.
0.96551724 0.92857143 1. 0.96153846]
mean value: 0.9582957910544118
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.96296296 0.96428571 1. 0.96428571 0.96428571
1. 0.96296296 1. 0.92592593]
mean value: 0.9744708994708995
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.9642401 0.94642857 0.96428571 0.92857143 0.98214286
0.98214286 0.9457672 1. 0.94510582]
mean value: 0.9658684546615581
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.92857143 0.9 0.93333333 0.87096774 0.96428571
0.96551724 0.89655172 1. 0.89285714]
mean value: 0.9352084326500345
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.12
Accuracy on Blind test: 0.29
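The key / value / mean value triplets in each model block can be read back programmatically. The following is a minimal sketch, assuming the layout shown in this log; `parse_scores` and `LOG_BLOCK` are illustrative names, and the numbers are copied from the Gradient Boosting block above.

```python
import re
from statistics import mean

# Two score entries copied from the Gradient Boosting block of this log.
LOG_BLOCK = """\
key: test_mcc
value: [1. 0.9284802 0.89342711 0.93094934 0.85933785 0.96490128
0.96490128 0.89153439 1. 0.89139151]
mean value: 0.9324922966413355
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
"""

def parse_scores(text):
    """Return {key: [per-fold scores]} from 'key:' / 'value: [...]' lines."""
    pattern = re.compile(r"key: (\w+)\s+value: \[([^\]]*)\]")
    # The bracketed array may wrap over several lines; split() handles that.
    return {key: [float(x) for x in body.split()]
            for key, body in pattern.findall(text)}

scores = parse_scores(LOG_BLOCK)
# Difference between mean training MCC and mean cross-validation MCC.
gap = mean(scores["train_mcc"]) - mean(scores["test_mcc"])
print(f"train/CV MCC gap: {gap:.4f}")
```

The parser only works if a value array is not interrupted by warning text, which does happen elsewhere in this log.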
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.05088902 0.03218055 0.03262019 0.03219128 0.03217649 0.03136945
0.03240466 0.03182459 0.03181601 0.04167891]
mean value: 0.03491511344909668
key: score_time
value: [0.01279211 0.01659989 0.01733136 0.0157187 0.01501036 0.01518536
0.01507902 0.01516318 0.0155859 0.01786971]
mean value: 0.015633559226989745
key: test_mcc
value: [-0.06429107 0.14858083 0.24743583 0.11547005 -0.13483997 0.26997462
0.43759497 0.2377336 0.27468517 0.26587302]
mean value: 0.1798217053940527
key: train_mcc
value: [0.34767983 0.45701651 0.45832971 0.36717254 0.37059129 0.44575314
0.44259167 0.57806698 0.51733606 0.33589566]
mean value: 0.4320433400080996
key: test_accuracy
value: [0.46428571 0.53571429 0.60714286 0.53571429 0.48214286 0.58928571
0.66071429 0.58181818 0.61818182 0.58181818]
mean value: 0.5656818181818182
key: train_accuracy
value: [0.60479042 0.67065868 0.67065868 0.61477046 0.61676647 0.66267465
0.66067864 0.74900398 0.70916335 0.59760956]
mean value: 0.6556774896422295
key: test_fscore
value: [0.59459459 0.65789474 0.68571429 0.66666667 0.65060241 0.7012987
0.74666667 0.68493151 0.68656716 0.69333333]
mean value: 0.6768270065783327
key: train_fscore
value: [0.71469741 0.75037821 0.74962064 0.71906841 0.72011662 0.74509804
0.7439759 0.79742765 0.77258567 0.71060172]
mean value: 0.7423570274505628
key: test_precision
value: [0.46808511 0.51020408 0.57142857 0.52 0.49090909 0.55102041
0.59574468 0.54347826 0.575 0.54166667]
mean value: 0.5367536866903855
key: train_precision
value: [0.55605381 0.60048426 0.59951456 0.56136364 0.56264237 0.59375
0.59232614 0.6631016 0.62944162 0.55111111]
mean value: 0.5909789120494734
key: test_recall
value: [0.81481481 0.92592593 0.85714286 0.92857143 0.96428571 0.96428571
1. 0.92592593 0.85185185 0.96296296]
mean value: 0.9195767195767196
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.47637292 0.54916986 0.60714286 0.53571429 0.48214286 0.58928571
0.66071429 0.58796296 0.6223545 0.58862434]
mean value: 0.5699484583105273
key: train_roc_auc
value: [0.60869565 0.67391304 0.67519685 0.62007874 0.62204724 0.66732283
0.66535433 0.7519685 0.71259843 0.6023622 ]
mean value: 0.6599537829510441
key: test_jcc
value: [0.42307692 0.49019608 0.52173913 0.5 0.48214286 0.54
0.59574468 0.52083333 0.52272727 0.53061224]
mean value: 0.5127072520895565
key: train_jcc
value: [0.55605381 0.60048426 0.59951456 0.56136364 0.56264237 0.59375
0.59232614 0.6631016 0.62944162 0.55111111]
mean value: 0.5909789120494734
MCC on Blind test: -0.06
Accuracy on Blind test: 0.18
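In the QDA block above, train_recall is 1.0 in every fold and train_jcc equals train_precision value for value. That follows from the metric definitions: with recall R = 1 there are no false negatives, so the Jaccard index TP / (TP + FP + FN) reduces to precision. The sketch below checks this with the first training fold copied from this log; the function names are illustrative.

```python
def f_score(precision, recall):
    """F1 score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def jaccard_from_f(f):
    """Jaccard index expressed through F1: J = F / (2 - F)."""
    return f / (2 - f)

# First training fold of the QDA block in this log.
precision, recall = 0.55605381, 1.0
f = f_score(precision, recall)   # log reports train_fscore 0.71469741
j = jaccard_from_f(f)            # log reports train_jcc 0.55605381
print(f"fscore={f:.8f} jcc={j:.8f}")
```

A recall of 1.0 combined with precision near 0.59 suggests the model assigns nearly every sample to the positive class, which would be consistent with the low blind-test accuracy reported for QDA.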
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.03044271 0.01693082 0.0437572 0.02841759 0.04030895 0.03944254
0.0393815 0.03932357 0.03971505 0.03838658]
mean value: 0.03561065196990967
key: score_time
value: [0.02965569 0.01676869 0.03821087 0.01900721 0.0193789 0.01906633
0.02097154 0.01901102 0.01889944 0.01913977]
mean value: 0.022010946273803712
key: test_mcc
value: [0.96490128 0.85696041 0.89342711 0.71428571 0.67900461 0.85933785
0.71611487 0.71049701 0.79069197 0.8565805 ]
mean value: 0.8041801324203861
key: train_mcc
value: [0.86064046 0.86046475 0.86832207 0.84841579 0.88050876 0.86087775
0.85640062 0.86465808 0.86465808 0.86501334]
mean value: 0.8629959687869436
key: test_accuracy
value: [0.98214286 0.92857143 0.94642857 0.85714286 0.83928571 0.92857143
0.85714286 0.85454545 0.89090909 0.92727273]
mean value: 0.9012012987012987
key: train_accuracy
value: [0.93013972 0.93013972 0.93413174 0.9241517 0.94011976 0.93013972
0.92814371 0.93227092 0.93227092 0.93227092]
mean value: 0.9313778816868256
key: test_fscore
value: [0.98181818 0.92592593 0.94736842 0.85714286 0.84210526 0.92592593
0.86206897 0.84615385 0.89655172 0.92307692]
mean value: 0.9008138033909359
key: train_fscore
value: [0.9304175 0.93013972 0.93360161 0.92369478 0.94 0.9304175
0.92771084 0.932 0.932 0.93253968]
mean value: 0.9312521625306114
key: test_precision
value: [0.96428571 0.92592593 0.93103448 0.85714286 0.82758621 0.96153846
0.83333333 0.88 0.83870968 0.96 ]
mean value: 0.8979556659300819
key: train_precision
value: [0.91764706 0.92094862 0.928 0.91633466 0.92885375 0.9140625
0.92031873 0.92460317 0.92460317 0.91796875]
mean value: 0.9213340416025564
key: test_recall
value: [1. 0.92592593 0.96428571 0.85714286 0.85714286 0.89285714
0.89285714 0.81481481 0.96296296 0.88888889]
mean value: 0.9056878306878307
key: train_recall
value: [0.94354839 0.93951613 0.93927126 0.93117409 0.951417 0.94736842
0.93522267 0.93951613 0.93951613 0.94758065]
mean value: 0.9414130860650386
key: test_roc_auc
value: [0.98275862 0.9284802 0.94642857 0.85714286 0.83928571 0.92857143
0.85714286 0.85383598 0.89219577 0.9265873 ]
mean value: 0.9012429301222404
key: train_roc_auc
value: [0.93027222 0.93023237 0.93420256 0.92424846 0.94027543 0.93037712
0.92824126 0.93235649 0.93235649 0.93245174]
mean value: 0.9315014140293759
key: test_jcc
value: [0.96428571 0.86206897 0.9 0.75 0.72727273 0.86206897
0.75757576 0.73333333 0.8125 0.85714286]
mean value: 0.8226248320644872
key: train_jcc
value: [0.86988848 0.86940299 0.8754717 0.85820896 0.88679245 0.86988848
0.86516854 0.87265918 0.87265918 0.87360595]
mean value: 0.8713745882255924
MCC on Blind test: 0.25
Accuracy on Blind test: 0.7
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.2774384 0.29088688 0.29256916 0.34308195 0.30772591 0.31144476
0.29420948 0.30333471 0.28965688 0.29762864]
mean value: 0.3007976770401001
key: score_time
value: [0.02038169 0.01897693 0.01917744 0.02136087 0.01924849 0.01914406
0.01901841 0.01909137 0.0192101 0.01927757]
mean value: 0.01948869228363037
key: test_mcc
value: [0.96490128 0.85696041 0.89342711 0.71428571 0.67900461 0.85933785
0.71611487 0.71049701 0.79069197 0.8565805 ]
mean value: 0.8041801324203861
key: train_mcc
value: [0.86064046 0.86046475 0.86832207 0.84841579 0.88050876 0.86087775
0.85640062 0.86465808 0.86465808 0.89643787]
mean value: 0.8661384215049459
key: test_accuracy
value: [0.98214286 0.92857143 0.94642857 0.85714286 0.83928571 0.92857143
0.85714286 0.85454545 0.89090909 0.92727273]
mean value: 0.9012012987012987
key: train_accuracy
value: [0.93013972 0.93013972 0.93413174 0.9241517 0.94011976 0.93013972
0.92814371 0.93227092 0.93227092 0.94820717]
mean value: 0.9329715071848336
key: test_fscore
value: [0.98181818 0.92592593 0.94736842 0.85714286 0.84210526 0.92592593
0.86206897 0.84615385 0.89655172 0.92307692]
mean value: 0.9008138033909359
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_orig.py:114: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_orig.py:117: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
key: train_fscore
value: [0.9304175 0.93013972 0.93360161 0.92369478 0.94 0.9304175
0.92771084 0.932 0.932 0.94779116]
mean value: 0.9327773107425066
key: test_precision
value: [0.96428571 0.92592593 0.93103448 0.85714286 0.82758621 0.96153846
0.83333333 0.88 0.83870968 0.96 ]
mean value: 0.8979556659300819
key: train_precision
value: [0.91764706 0.92094862 0.928 0.91633466 0.92885375 0.9140625
0.92031873 0.92460317 0.92460317 0.944 ]
mean value: 0.9239371666025564
key: test_recall
value: [1. 0.92592593 0.96428571 0.85714286 0.85714286 0.89285714
0.89285714 0.81481481 0.96296296 0.88888889]
mean value: 0.9056878306878307
key: train_recall
value: [0.94354839 0.93951613 0.93927126 0.93117409 0.951417 0.94736842
0.93522267 0.93951613 0.93951613 0.9516129 ]
mean value: 0.9418163118714902
key: test_roc_auc
value: [0.98275862 0.9284802 0.94642857 0.85714286 0.83928571 0.92857143
0.85714286 0.85383598 0.89219577 0.9265873 ]
mean value: 0.9012429301222404
key: train_roc_auc
value: [0.93027222 0.93023237 0.93420256 0.92424846 0.94027543 0.93037712
0.92824126 0.93235649 0.93235649 0.9482474 ]
mean value: 0.9330809796885072
key: test_jcc
value: [0.96428571 0.86206897 0.9 0.75 0.72727273 0.86206897
0.75757576 0.73333333 0.8125 0.85714286]
mean value: 0.8226248320644872
key: train_jcc
value: [0.86988848 0.86940299 0.8754717 0.85820896 0.88679245 0.86988848
0.86516854 0.87265918 0.87265918 0.90076336]
mean value: 0.874090329307916
MCC on Blind test: 0.25
Accuracy on Blind test: 0.7
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.0369215 0.03516078 0.03686929 0.04687595 0.03705096 0.06197238
0.03686905 0.03668141 0.03720927 0.03774619]
mean value: 0.040335679054260255
key: score_time
value: [0.01524997 0.01199031 0.01500368 0.01204038 0.01474452 0.01514792
0.0146656 0.01476455 0.01470566 0.014925 ]
mean value: 0.014323759078979491
key: test_mcc
value: [0.92980296 0.86189955 0.79161589 0.79110556 0.71611487 0.8660254
0.68250015 0.78571429 0.75047877 0.85933785]
mean value: 0.8034595290181991
key: train_mcc
value: [0.8383186 0.86982976 0.86199599 0.87376677 0.87412415 0.84662074
0.8543903 0.86221141 0.87444958 0.86237183]
mean value: 0.8618079125330999
key: test_accuracy
value: [0.96491228 0.92982456 0.89473684 0.89473684 0.85714286 0.92857143
0.83928571 0.89285714 0.875 0.92857143]
mean value: 0.9005639097744361
key: train_accuracy
value: [0.91913215 0.93491124 0.93096647 0.93688363 0.93700787 0.92322835
0.92716535 0.93110236 0.93700787 0.93110236]
mean value: 0.9308507664352607
key: test_fscore
value: [0.96428571 0.93103448 0.89285714 0.9 0.86206897 0.92307692
0.84745763 0.89285714 0.87719298 0.92592593]
mean value: 0.9016756906853496
key: train_fscore
value: [0.91976517 0.93491124 0.93123772 0.93675889 0.9375 0.92397661
0.92759295 0.93096647 0.9379845 0.93177388]
mean value: 0.9312467431117992
key: test_precision
value: [0.96428571 0.9 0.92592593 0.87096774 0.83333333 1.
0.80645161 0.89285714 0.86206897 0.96153846]
mean value: 0.9017428898296529
key: train_precision
value: [0.91439689 0.93675889 0.92578125 0.93675889 0.93023256 0.91505792
0.92217899 0.93280632 0.92366412 0.92277992]
mean value: 0.9260415754273096
key: test_recall
value: [0.96428571 0.96428571 0.86206897 0.93103448 0.89285714 0.85714286
0.89285714 0.89285714 0.89285714 0.89285714]
mean value: 0.9043103448275862
key: train_recall
value: [0.92519685 0.93307087 0.93675889 0.93675889 0.94488189 0.93307087
0.93307087 0.92913386 0.95275591 0.94094488]
mean value: 0.9365643770813233
key: test_roc_auc
value: [0.96490148 0.93041872 0.8953202 0.89408867 0.85714286 0.92857143
0.83928571 0.89285714 0.875 0.92857143]
mean value: 0.9006157635467981
key: train_roc_auc
value: [0.91912016 0.93491488 0.93097787 0.93688338 0.93700787 0.92322835
0.92716535 0.93110236 0.93700787 0.93110236]
mean value: 0.9308510472752172
key: test_jcc
value: [0.93103448 0.87096774 0.80645161 0.81818182 0.75757576 0.85714286
0.73529412 0.80645161 0.78125 0.86206897]
mean value: 0.8226418966565289
key: train_jcc
value: [0.85144928 0.87777778 0.87132353 0.88104089 0.88235294 0.85869565
0.8649635 0.87084871 0.88321168 0.87226277]
mean value: 0.8713926732787018
MCC on Blind test: 0.29
Accuracy on Blind test: 0.7
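Note on the metrics above: the cross-validation accuracy is about 0.90, while the blind-test MCC is 0.29 at an accuracy of 0.7. The sketch below shows how accuracy, MCC and the Jaccard index can diverge when computed from the same confusion counts. It is a minimal illustration, not the script's code: it assumes "jcc" stands for the Jaccard index, and the labels, predictions and helper names are hypothetical.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def jaccard(tp, fp, fn):
    """Jaccard index of the positive class: TP / (TP + FP + FN)."""
    total = tp + fp + fn
    return 0.0 if total == 0 else tp / total

# Hypothetical toy labels, NOT taken from the log.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
acc_value = accuracy(tp, tn, fp, fn)
mcc_value = mcc(tp, tn, fp, fn)
jcc_value = jaccard(tp, fp, fn)
print(acc_value, round(mcc_value, 2), jcc_value)
```

On this toy input the accuracy is well above the MCC, the same direction of gap the blind-test lines show.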
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
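Note on the warning above: the pipeline in this log already scales numerical features with MinMaxScaler, so the remaining remedy the warning names is a larger max_iter. The sketch below shows one way to pass it to LogisticRegressionCV and to count any ConvergenceWarning raised during fitting. It is a sketch under assumptions, not the script's configuration: the synthetic data, the value max_iter=5000 and the step names are illustrative.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data; the real feature matrix is not available here.
X, y = make_classification(n_samples=200, n_features=20, random_state=42)

# Scale first, then fit; max_iter is raised above the default of 100.
# The value 5000 is an illustrative assumption, not a tuned setting.
pipe = Pipeline(steps=[
    ("scale", MinMaxScaler()),
    ("model", LogisticRegressionCV(max_iter=5000, random_state=42)),
])

# Record warnings so convergence problems can be counted instead of printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", category=ConvergenceWarning)
    pipe.fit(X, y)

n_convergence_warnings = sum(
    issubclass(w.category, ConvergenceWarning) for w in caught
)
print("ConvergenceWarning count:", n_convergence_warnings)
```

If warnings persist at a higher max_iter, the solver options linked in the warning text are the next thing to check.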
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.86660838 1.03593755 0.85590649 1.07513618 0.93153334 1.0510838
1.23593092 1.05089116 1.02004242 0.89820933]
mean value: 1.0021279573440551
key: score_time
value: [0.01472855 0.01553202 0.01536512 0.02065921 0.01555824 0.01580501
0.01234865 0.01632595 0.0153687 0.01226258]
mean value: 0.015395402908325195
key: test_mcc
value: [0.92980296 0.82512315 0.85960591 0.75462449 0.71611487 0.8660254
0.64450339 0.78571429 0.75047877 0.85933785]
mean value: 0.7991331087863264
key: train_mcc
value: [0.89349849 0.89349683 0.94488889 0.94488889 0.90158179 0.8819171
0.8307151 0.88976378 0.90191737 0.81889764]
mean value: 0.8901565879858516
key: test_accuracy
value: [0.96491228 0.9122807 0.92982456 0.87719298 0.85714286 0.92857143
0.82142857 0.89285714 0.875 0.92857143]
mean value: 0.8987781954887218
key: train_accuracy
value: [0.94674556 0.94674556 0.97238659 0.97238659 0.9507874 0.94094488
0.91535433 0.94488189 0.9507874 0.90944882]
mean value: 0.945046902421221
key: test_fscore
value: [0.96428571 0.9122807 0.93103448 0.88135593 0.86206897 0.92307692
0.82758621 0.89285714 0.87719298 0.92592593]
mean value: 0.8997664977732036
key: train_fscore
value: [0.94674556 0.94695481 0.97211155 0.97211155 0.95088409 0.94117647
0.91552063 0.94488189 0.95145631 0.90944882]
mean value: 0.9451291688116392
key: test_precision
value: [0.96428571 0.89655172 0.93103448 0.86666667 0.83333333 1.
0.8 0.89285714 0.86206897 0.96153846]
mean value: 0.9008336491095112
key: train_precision
value: [0.9486166 0.94509804 0.97991968 0.97991968 0.94901961 0.9375
0.91372549 0.94488189 0.93869732 0.90944882]
mean value: 0.9446827122144215
key: test_recall
value: [0.96428571 0.92857143 0.93103448 0.89655172 0.89285714 0.85714286
0.85714286 0.89285714 0.89285714 0.89285714]
mean value: 0.9006157635467981
key: train_recall
value: [0.94488189 0.9488189 0.96442688 0.96442688 0.95275591 0.94488189
0.91732283 0.94488189 0.96456693 0.90944882]
mean value: 0.9456412810058822
key: test_roc_auc
value: [0.96490148 0.91256158 0.92980296 0.87684729 0.85714286 0.92857143
0.82142857 0.89285714 0.875 0.92857143]
mean value: 0.898768472906404
key: train_roc_auc
value: [0.94674925 0.94674146 0.97237092 0.97237092 0.9507874 0.94094488
0.91535433 0.94488189 0.9507874 0.90944882]
mean value: 0.9450437272416047
key: test_jcc
value: [0.93103448 0.83870968 0.87096774 0.78787879 0.75757576 0.85714286
0.70588235 0.80645161 0.78125 0.86206897]
mean value: 0.8198962236072506
key: train_jcc
value: [0.8988764 0.89925373 0.94573643 0.94573643 0.90636704 0.88888889
0.8442029 0.89552239 0.90740741 0.83393502]
mean value: 0.8965926646210486
MCC on Blind test: 0.28
Accuracy on Blind test: 0.69
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01588368 0.01141286 0.01038861 0.0103476 0.01009512 0.01025605
0.01078176 0.01141858 0.0107193 0.01143622]
mean value: 0.011273980140686035
key: score_time
value: [0.01116276 0.00949693 0.0092299 0.00901246 0.00896025 0.00897455
0.00905919 0.00940347 0.00912666 0.00908256]
mean value: 0.009350872039794922
key: test_mcc
value: [0.8615634 0.55091314 0.7589669 0.59358067 0.46697379 0.4472136
0.79385662 0.50128041 0.54446551 0.72168784]
mean value: 0.6240501864920316
key: train_mcc
value: [0.65683536 0.65767923 0.64505247 0.67103176 0.64226244 0.66894588
0.66816241 0.70393683 0.66020809 0.66621692]
mean value: 0.6640331375671215
key: test_accuracy
value: [0.92982456 0.77192982 0.87719298 0.78947368 0.73214286 0.71428571
0.89285714 0.75 0.76785714 0.85714286]
mean value: 0.8082706766917294
key: train_accuracy
value: [0.82445759 0.82445759 0.81854043 0.83234714 0.80905512 0.83070866
0.83070866 0.8503937 0.82677165 0.82874016]
mean value: 0.8276180714097128
key: test_fscore
value: [0.92592593 0.74509804 0.87272727 0.76923077 0.71698113 0.66666667
0.88461538 0.74074074 0.74509804 0.84615385]
mean value: 0.7913237816567451
key: train_fscore
value: [0.81023454 0.80942184 0.80257511 0.81953291 0.77904328 0.81702128
0.81779661 0.84297521 0.81355932 0.8137045 ]
mean value: 0.8125864591501547
key: test_precision
value: [0.96153846 0.82608696 0.92307692 0.86956522 0.76 0.8
0.95833333 0.76923077 0.82608696 0.91666667]
mean value: 0.8610585284280936
key: train_precision
value: [0.88372093 0.88732394 0.87793427 0.8853211 0.92432432 0.88888889
0.8853211 0.88695652 0.88073394 0.89201878]
mean value: 0.8892543807279057
key: test_recall
value: [0.89285714 0.67857143 0.82758621 0.68965517 0.67857143 0.57142857
0.82142857 0.71428571 0.67857143 0.78571429]
mean value: 0.7338669950738916
key: train_recall
value: [0.7480315 0.74409449 0.73913043 0.76284585 0.67322835 0.75590551
0.75984252 0.80314961 0.75590551 0.7480315 ]
mean value: 0.7490165260962933
key: test_roc_auc
value: [0.92918719 0.7703202 0.87807882 0.79125616 0.73214286 0.71428571
0.89285714 0.75 0.76785714 0.85714286]
mean value: 0.8083128078817734
key: train_roc_auc
value: [0.82460863 0.82461641 0.81838412 0.83221033 0.80905512 0.83070866
0.83070866 0.8503937 0.82677165 0.82874016]
mean value: 0.8276197441722947
key: test_jcc
value: [0.86206897 0.59375 0.77419355 0.625 0.55882353 0.5
0.79310345 0.58823529 0.59375 0.73333333]
mean value: 0.6622258119042945
key: train_jcc
value: [0.68100358 0.67985612 0.6702509 0.6942446 0.6380597 0.69064748
0.69175627 0.72857143 0.68571429 0.68592058]
mean value: 0.6846024947522601
MCC on Blind test: 0.33
Accuracy on Blind test: 0.75
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01078367 0.01111078 0.010741 0.01050091 0.01050091 0.01088929
0.01170611 0.01178074 0.01180744 0.01147151]
mean value: 0.011129236221313477
key: score_time
value: [0.00919318 0.00924015 0.00906968 0.00900912 0.00903487 0.00986242
0.00924635 0.00929976 0.00924873 0.00980163]
mean value: 0.009300589561462402
key: test_mcc
value: [0.8953202 0.7589669 0.71921182 0.59358067 0.60753044 0.75047877
0.67900461 0.67900461 0.53881591 0.75047877]
mean value: 0.6972392691166773
key: train_mcc
value: [0.75161488 0.72796178 0.73570284 0.72807243 0.74414639 0.74805469
0.75599926 0.75197433 0.74812427 0.76055607]
mean value: 0.7452206943433384
key: test_accuracy
value: [0.94736842 0.87719298 0.85964912 0.78947368 0.80357143 0.875
0.83928571 0.83928571 0.76785714 0.875 ]
mean value: 0.8473684210526315
key: train_accuracy
value: [0.87573964 0.86390533 0.8678501 0.86390533 0.87204724 0.87401575
0.87795276 0.87598425 0.87401575 0.87992126]
mean value: 0.8725337402351333
key: test_fscore
value: [0.94736842 0.88135593 0.86206897 0.76923077 0.8 0.87719298
0.84210526 0.83636364 0.77966102 0.87719298]
mean value: 0.8472539969386996
key: train_fscore
value: [0.87719298 0.86282306 0.86732673 0.86172345 0.87128713 0.8745098
0.87698413 0.8762279 0.87301587 0.88246628]
mean value: 0.8723557335436966
key: test_precision
value: [0.93103448 0.83870968 0.86206897 0.86956522 0.81481481 0.86206897
0.82758621 0.85185185 0.74193548 0.86206897]
mean value: 0.846170463155519
key: train_precision
value: [0.86872587 0.87148594 0.86904762 0.87398374 0.87649402 0.87109375
0.884 0.8745098 0.88 0.86415094]
mean value: 0.8733491692608164
key: test_recall
value: [0.96428571 0.92857143 0.86206897 0.68965517 0.78571429 0.89285714
0.85714286 0.82142857 0.82142857 0.89285714]
mean value: 0.8516009852216748
key: train_recall
value: [0.88582677 0.85433071 0.86561265 0.84980237 0.86614173 0.87795276
0.87007874 0.87795276 0.86614173 0.9015748 ]
mean value: 0.8715415019762845
key: test_roc_auc
value: [0.9476601 0.87807882 0.85960591 0.79125616 0.80357143 0.875
0.83928571 0.83928571 0.76785714 0.875 ]
mean value: 0.8476600985221675
key: train_roc_auc
value: [0.87571971 0.86392425 0.86784569 0.86387756 0.87204724 0.87401575
0.87795276 0.87598425 0.87401575 0.87992126]
mean value: 0.8725304223335719
key: test_jcc
value: [0.9 0.78787879 0.75757576 0.625 0.66666667 0.78125
0.72727273 0.71875 0.63888889 0.78125 ]
mean value: 0.7384532828282828
key: train_jcc
value: [0.78125 0.75874126 0.76573427 0.75704225 0.77192982 0.77700348
0.78091873 0.77972028 0.77464789 0.78965517]
mean value: 0.7736643154251823
MCC on Blind test: 0.28
Accuracy on Blind test: 0.73
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00963664 0.01039338 0.01091814 0.00967956 0.00986576 0.00981426
0.01098919 0.01109385 0.011096 0.01093125]
mean value: 0.010441803932189941
key: score_time
value: [0.01654792 0.01324368 0.0126996 0.01283431 0.01158905 0.01262879
0.01270723 0.01319766 0.01274705 0.0123148 ]
mean value: 0.013051009178161621
key: test_mcc
value: [0.69397486 0.40447771 0.37345948 0.54592083 0.42966892 0.61065803
0.5728919 0.47187011 0.57142857 0.71611487]
mean value: 0.5390465286842666
key: train_mcc
value: [0.68850776 0.72786691 0.66880967 0.70027393 0.7170914 0.69728825
0.71286919 0.68506061 0.69728825 0.68988054]
mean value: 0.6984936522715344
key: test_accuracy
value: [0.84210526 0.70175439 0.68421053 0.77192982 0.71428571 0.80357143
0.78571429 0.73214286 0.78571429 0.85714286]
mean value: 0.7678571428571428
key: train_accuracy
value: [0.84418146 0.86390533 0.83431953 0.85009862 0.85826772 0.8484252
0.85629921 0.84251969 0.8484252 0.84448819]
mean value: 0.8490930127816864
key: test_fscore
value: [0.82352941 0.67924528 0.66666667 0.78688525 0.7037037 0.79245283
0.79310345 0.70588235 0.78571429 0.85185185]
mean value: 0.7589035080027439
key: train_fscore
value: [0.84294235 0.86336634 0.832 0.84860558 0.85542169 0.84569138
0.85429142 0.84189723 0.84569138 0.84040404]
mean value: 0.84703114032967
key: test_precision
value: [0.91304348 0.72 0.72 0.75 0.73076923 0.84
0.76666667 0.7826087 0.78571429 0.88461538]
mean value: 0.7893417741678611
key: train_precision
value: [0.85140562 0.8685259 0.84210526 0.85542169 0.87295082 0.86122449
0.86639676 0.8452381 0.86122449 0.86307054]
mean value: 0.8587563663863939
key: test_recall
value: [0.75 0.64285714 0.62068966 0.82758621 0.67857143 0.75
0.82142857 0.64285714 0.78571429 0.82142857]
mean value: 0.7341133004926108
key: train_recall
value: [0.83464567 0.85826772 0.82213439 0.84189723 0.83858268 0.83070866
0.84251969 0.83858268 0.83070866 0.81889764]
mean value: 0.8356945006380131
key: test_roc_auc
value: [0.84051724 0.70073892 0.68534483 0.77093596 0.71428571 0.80357143
0.78571429 0.73214286 0.78571429 0.85714286]
mean value: 0.7676108374384236
key: train_roc_auc
value: [0.84420031 0.86391647 0.83429554 0.85008247 0.85826772 0.8484252
0.85629921 0.84251969 0.8484252 0.84448819]
mean value: 0.8490919983816252
key: test_jcc
value: [0.7 0.51428571 0.5 0.64864865 0.54285714 0.65625
0.65714286 0.54545455 0.64705882 0.74193548]
mean value: 0.6153633215789288
key: train_jcc
value: [0.72852234 0.75958188 0.71232877 0.73702422 0.74736842 0.73263889
0.7456446 0.72696246 0.73263889 0.72473868]
mean value: 0.7347449138309052
MCC on Blind test: 0.25
Accuracy on Blind test: 0.68
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.0228858 0.02311659 0.02236342 0.02557158 0.02441049 0.02272844
0.02228618 0.02230501 0.02276373 0.02649665]
mean value: 0.02349278926849365
key: score_time
value: [0.01345658 0.01215935 0.01206708 0.01320601 0.01286006 0.01210189
0.0122695 0.0118916 0.01202822 0.01279545]
mean value: 0.012483572959899903
key: test_mcc
value: [0.8953202 0.8953202 0.85960591 0.75462449 0.75047877 0.78772636
0.67900461 0.71611487 0.67900461 0.85933785]
mean value: 0.7876537869681253
key: train_mcc
value: [0.78304104 0.79093074 0.79093399 0.80276057 0.80317451 0.79926835
0.80709287 0.80714291 0.81104876 0.79139378]
mean value: 0.7986787522112317
key: test_accuracy
value: [0.94736842 0.94736842 0.92982456 0.87719298 0.875 0.89285714
0.83928571 0.85714286 0.83928571 0.92857143]
mean value: 0.8933897243107769
key: train_accuracy
value: [0.89151874 0.89546351 0.89546351 0.90138067 0.9015748 0.8996063
0.90354331 0.90354331 0.90551181 0.89566929]
mean value: 0.8993275248877913
key: test_fscore
value: [0.94736842 0.94736842 0.93103448 0.88135593 0.87719298 0.88888889
0.84210526 0.85185185 0.84210526 0.92592593]
mean value: 0.893519743250587
key: train_fscore
value: [0.89194499 0.89587426 0.89546351 0.90118577 0.90196078 0.90019569
0.90373281 0.90410959 0.90588235 0.8962818 ]
mean value: 0.8996631565871114
key: test_precision
value: [0.93103448 0.93103448 0.93103448 0.86666667 0.86206897 0.92307692
0.82758621 0.88461538 0.82758621 0.96153846]
mean value: 0.8946242263483642
key: train_precision
value: [0.89019608 0.89411765 0.89370079 0.90118577 0.8984375 0.89494163
0.90196078 0.89883268 0.90234375 0.89105058]
mean value: 0.896676722068022
key: test_recall
value: [0.96428571 0.96428571 0.93103448 0.89655172 0.89285714 0.85714286
0.85714286 0.82142857 0.85714286 0.89285714]
mean value: 0.8934729064039408
key: train_recall
value: [0.89370079 0.8976378 0.8972332 0.90118577 0.90551181 0.90551181
0.90551181 0.90944882 0.90944882 0.9015748 ]
mean value: 0.9026765429024929
key: test_roc_auc
value: [0.9476601 0.9476601 0.92980296 0.87684729 0.875 0.89285714
0.83928571 0.85714286 0.83928571 0.92857143]
mean value: 0.8934113300492612
key: train_roc_auc
value: [0.89151443 0.89545921 0.89546699 0.90138029 0.9015748 0.8996063
0.90354331 0.90354331 0.90551181 0.89566929]
mean value: 0.8993269739503906
key: test_jcc
value: [0.9 0.9 0.87096774 0.78787879 0.78125 0.8
0.72727273 0.74193548 0.72727273 0.86206897]
mean value: 0.8098646433747936
key: train_jcc
value: [0.80496454 0.8113879 0.81071429 0.82014388 0.82142857 0.81850534
0.82437276 0.825 0.82795699 0.81205674]
mean value: 0.8176531006168795
MCC on Blind test: 0.23
Accuracy on Blind test: 0.71
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.75838876 1.88278365 1.92518592 1.9501245 1.97273588 2.06173635
2.07129908 1.80369496 1.98173571 1.86790895]
mean value: 1.9275593757629395
key: score_time
value: [0.01259828 0.02442145 0.01467299 0.01875997 0.01800513 0.01487517
0.01815009 0.01271486 0.0190661 0.0148983 ]
mean value: 0.016816234588623045
key: test_mcc
value: [0.82512315 0.68434084 0.82490815 0.7257422 0.71611487 0.85933785
0.71611487 0.75047877 0.71611487 0.78571429]
mean value: 0.7603989877183843
key: train_mcc
value: [0.98823511 0.99214142 0.99211042 0.99211042 0.98038334 0.98428248
0.98428248 0.98050495 0.99607071 0.99215674]
mean value: 0.9882278095502359
key: test_accuracy
value: [0.9122807 0.84210526 0.9122807 0.85964912 0.85714286 0.92857143
0.85714286 0.875 0.85714286 0.89285714]
mean value: 0.8794172932330827
key: train_accuracy
value: [0.99408284 0.99605523 0.99605523 0.99605523 0.99015748 0.99212598
0.99212598 0.99015748 0.9980315 0.99606299]
mean value: 0.9940909938032894
key: test_fscore
value: [0.9122807 0.83636364 0.91525424 0.87096774 0.86206897 0.92592593
0.86206897 0.87272727 0.86206897 0.89285714]
mean value: 0.8812583555403708
key: train_fscore
value: [0.99405941 0.99604743 0.99604743 0.99604743 0.99009901 0.99209486
0.99209486 0.99005964 0.99802761 0.99607843]
mean value: 0.9940656118583756
key: test_precision
value: [0.89655172 0.85185185 0.9 0.81818182 0.83333333 0.96153846
0.83333333 0.88888889 0.83333333 0.89285714]
mean value: 0.8709869887456094
key: train_precision
value: [1. 1. 0.99604743 0.99604743 0.99601594 0.99603175
0.99603175 1. 1. 0.9921875 ]
mean value: 0.9972361789978551
key: test_recall
value: [0.92857143 0.82142857 0.93103448 0.93103448 0.89285714 0.89285714
0.89285714 0.85714286 0.89285714 0.89285714]
mean value: 0.8933497536945813
key: train_recall
value: [0.98818898 0.99212598 0.99604743 0.99604743 0.98425197 0.98818898
0.98818898 0.98031496 0.99606299 1. ]
mean value: 0.9909417696305749
key: test_roc_auc
value: [0.91256158 0.84174877 0.91194581 0.85837438 0.85714286 0.92857143
0.85714286 0.875 0.85714286 0.89285714]
mean value: 0.8792487684729065
key: train_roc_auc
value: [0.99409449 0.99606299 0.99605521 0.99605521 0.99015748 0.99212598
0.99212598 0.99015748 0.9980315 0.99606299]
mean value: 0.9940929320593819
key: test_jcc
value: [0.83870968 0.71875 0.84375 0.77142857 0.75757576 0.86206897
0.75757576 0.77419355 0.75757576 0.80645161]
mean value: 0.7888079648382763
key: train_jcc
value: [0.98818898 0.99212598 0.99212598 0.99212598 0.98039216 0.98431373
0.98431373 0.98031496 0.99606299 0.9921875 ]
mean value: 0.9882151989732901
MCC on Blind test: 0.23
Accuracy on Blind test: 0.66
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.03890681 0.02280307 0.02086091 0.0209384 0.01961708 0.02346158
0.02328086 0.02160168 0.02298188 0.0219512 ]
mean value: 0.02364034652709961
key: score_time
value: [0.01179671 0.00914145 0.0088346 0.00926948 0.00878859 0.00875878
0.00874162 0.00876069 0.00872087 0.00878668]
mean value: 0.00915994644165039
key: test_mcc
value: [0.96547546 0.82512315 0.8953202 0.92980296 0.85714286 0.92857143
0.89342711 0.85933785 0.93094934 0.85933785]
mean value: 0.894448819396392
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98245614 0.9122807 0.94736842 0.96491228 0.92857143 0.96428571
0.94642857 0.92857143 0.96428571 0.92857143]
mean value: 0.9467731829573934
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98181818 0.9122807 0.94736842 0.96551724 0.92857143 0.96428571
0.94736842 0.93103448 0.96296296 0.93103448]
mean value: 0.9472242038394488
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.89655172 0.96428571 0.96551724 0.92857143 0.96428571
0.93103448 0.9 1. 0.9 ]
mean value: 0.9450246305418719
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96428571 0.92857143 0.93103448 0.96551724 0.92857143 0.96428571
0.96428571 0.96428571 0.92857143 0.96428571]
mean value: 0.9503694581280788
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98214286 0.91256158 0.9476601 0.96490148 0.92857143 0.96428571
0.94642857 0.92857143 0.96428571 0.92857143]
mean value: 0.9467980295566503
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96428571 0.83870968 0.9 0.93333333 0.86666667 0.93103448
0.9 0.87096774 0.92857143 0.87096774]
mean value: 0.9004536786906087
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.11
Accuracy on Blind test: 0.47
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.12795711 0.12534237 0.12668014 0.12615395 0.12666821 0.12553144
0.12423468 0.12471223 0.12423205 0.12511349]
mean value: 0.12566256523132324
key: score_time
value: [0.01779485 0.01778674 0.01880646 0.01766038 0.01766086 0.01774502
0.0188868 0.01795578 0.01807523 0.01787972]
mean value: 0.01802518367767334
key: test_mcc
value: [0.92980296 0.64901478 0.82512315 0.75462449 0.78772636 0.85933785
0.85933785 0.71428571 0.71611487 0.82195294]
mean value: 0.791732097228972
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96491228 0.8245614 0.9122807 0.87719298 0.89285714 0.92857143
0.92857143 0.85714286 0.85714286 0.91071429]
mean value: 0.8953947368421052
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96428571 0.82142857 0.9122807 0.88135593 0.89655172 0.92592593
0.93103448 0.85714286 0.86206897 0.90909091]
mean value: 0.8961165784245547
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96428571 0.82142857 0.92857143 0.86666667 0.86666667 0.96153846
0.9 0.85714286 0.83333333 0.92592593]
mean value: 0.8925559625559626
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96428571 0.82142857 0.89655172 0.89655172 0.92857143 0.89285714
0.96428571 0.85714286 0.89285714 0.89285714]
mean value: 0.9007389162561577
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96490148 0.82450739 0.91256158 0.87684729 0.89285714 0.92857143
0.92857143 0.85714286 0.85714286 0.91071429]
mean value: 0.8953817733990148
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.93103448 0.6969697 0.83870968 0.78787879 0.8125 0.86206897
0.87096774 0.75 0.75757576 0.83333333]
mean value: 0.8141038443388277
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.33
Accuracy on Blind test: 0.72
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.0101614 0.01020575 0.01025558 0.01031733 0.0105567 0.01035643
0.01047182 0.01056862 0.01091957 0.01059294]
mean value: 0.010440611839294433
key: score_time
value: [0.00886583 0.00881696 0.00859666 0.00871801 0.00875425 0.0088892
0.00892472 0.00889111 0.00875926 0.00875688]
mean value: 0.008797287940979004
key: test_mcc
value: [0.68736396 0.47413793 0.57973205 0.62473685 0.28644595 0.53605627
0.32163376 0.36084392 0.58501794 0.47187011]
mean value: 0.49278387214833563
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.84210526 0.73684211 0.78947368 0.80701754 0.64285714 0.76785714
0.66071429 0.67857143 0.78571429 0.73214286]
mean value: 0.7443295739348371
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.83018868 0.73684211 0.8 0.79245283 0.65517241 0.77192982
0.65454545 0.7 0.80645161 0.70588235]
mean value: 0.7453465273441484
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.88 0.72413793 0.77419355 0.875 0.63333333 0.75862069
0.66666667 0.65625 0.73529412 0.7826087 ]
mean value: 0.7486104982375985
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.78571429 0.75 0.82758621 0.72413793 0.67857143 0.78571429
0.64285714 0.75 0.89285714 0.64285714]
mean value: 0.7480295566502463
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.841133 0.73706897 0.7887931 0.80849754 0.64285714 0.76785714
0.66071429 0.67857143 0.78571429 0.73214286]
mean value: 0.7443349753694581
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.70967742 0.58333333 0.66666667 0.65625 0.48717949 0.62857143
0.48648649 0.53846154 0.67567568 0.54545455]
mean value: 0.5977756581184
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.14
Accuracy on Blind test: 0.61
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.86948252 1.84095502 1.86450291 1.85053563 1.83855367 1.95213008
1.8649447 1.85131979 1.8796494 1.86519742]
mean value: 1.8677271127700805
key: score_time
value: [0.093261 0.09254313 0.0949862 0.09473634 0.09973621 0.10174298
0.09618306 0.09803486 0.09973669 0.09787321]
mean value: 0.09688336849212646
key: test_mcc
value: [0.96547546 0.8953202 0.92980296 0.85960591 0.85714286 1.
0.96490128 0.89342711 0.93094934 0.89342711]
mean value: 0.9190052220037386
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98245614 0.94736842 0.96491228 0.92982456 0.92857143 1.
0.98214286 0.94642857 0.96428571 0.94642857]
mean value: 0.9592418546365915
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98181818 0.94736842 0.96551724 0.93103448 0.92857143 1.
0.98245614 0.94736842 0.96296296 0.94736842]
mean value: 0.9594465700999276
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.93103448 0.96551724 0.93103448 0.92857143 1.
0.96551724 0.93103448 1. 0.93103448]
mean value: 0.9583743842364532
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96428571 0.96428571 0.96551724 0.93103448 0.92857143 1.
1. 0.96428571 0.92857143 0.96428571]
mean value: 0.9610837438423645
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98214286 0.9476601 0.96490148 0.92980296 0.92857143 1.
0.98214286 0.94642857 0.96428571 0.94642857]
mean value: 0.9592364532019705
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96428571 0.9 0.93333333 0.87096774 0.86666667 1.
0.96551724 0.9 0.92857143 0.9 ]
mean value: 0.9229342126171938
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.26
Accuracy on Blind test: 0.6
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...05', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [1.00257015 0.97476339 0.95755458 1.02234483 0.98715186 0.97197175
0.99527335 1.01324677 0.94979787 1.04136539]
mean value: 0.9916039943695069
key: score_time
value: [0.21403074 0.20837617 0.2156167 0.25913739 0.2587049 0.23948455
0.25020814 0.22535872 0.18812346 0.26523614]
mean value: 0.23242769241333008
key: test_mcc
value: [0.96547546 0.8953202 0.8953202 0.85960591 0.85714286 0.93094934
0.92857143 0.82195294 0.93094934 0.89342711]
mean value: 0.8978714775896186
key: train_mcc
value: [0.95266254 0.96450413 0.94872473 0.96055211 0.9606597 0.94882625
0.94882625 0.95670033 0.94882625 0.95675965]
mean value: 0.9547041949463222
key: test_accuracy
value: [0.98245614 0.94736842 0.94736842 0.92982456 0.92857143 0.96428571
0.96428571 0.91071429 0.96428571 0.94642857]
mean value: 0.9485588972431077
key: train_accuracy
value: [0.97633136 0.98224852 0.97435897 0.98027613 0.98031496 0.97440945
0.97440945 0.97834646 0.97440945 0.97834646]
mean value: 0.9773451210610509
key: test_fscore
value: [0.98181818 0.94736842 0.94736842 0.93103448 0.92857143 0.96551724
0.96428571 0.90909091 0.96296296 0.94736842]
mean value: 0.9485386184025023
key: train_fscore
value: [0.97637795 0.98231827 0.97425743 0.98023715 0.98039216 0.97445972
0.97445972 0.97830375 0.97445972 0.97847358]
mean value: 0.9773739464231741
key: test_precision
value: [1. 0.93103448 0.96428571 0.93103448 0.92857143 0.93333333
0.96428571 0.92592593 1. 0.93103448]
mean value: 0.9509505564677978
key: train_precision
value: [0.97637795 0.98039216 0.97619048 0.98023715 0.9765625 0.97254902
0.97254902 0.98023715 0.97254902 0.97276265]
mean value: 0.9760407098847448
key: test_recall
value: [0.96428571 0.96428571 0.93103448 0.93103448 0.92857143 1.
0.96428571 0.89285714 0.92857143 0.96428571]
mean value: 0.9469211822660099
key: train_recall
value: [0.97637795 0.98425197 0.97233202 0.98023715 0.98425197 0.97637795
0.97637795 0.97637795 0.97637795 0.98425197]
mean value: 0.9787214839251813
key: test_roc_auc
value: [0.98214286 0.9476601 0.9476601 0.92980296 0.92857143 0.96428571
0.96428571 0.91071429 0.96428571 0.94642857]
mean value: 0.9485837438423645
key: train_roc_auc
value: [0.97633127 0.98224456 0.97435498 0.98027606 0.98031496 0.97440945
0.97440945 0.97834646 0.97440945 0.97834646]
mean value: 0.9773443092340731
key: test_jcc
value: [0.96428571 0.9 0.9 0.87096774 0.86666667 0.93333333
0.93103448 0.83333333 0.92857143 0.9 ]
mean value: 0.9028192700884581
key: train_jcc
value: [0.95384615 0.96525097 0.94980695 0.96124031 0.96153846 0.95019157
0.95019157 0.95752896 0.95019157 0.95785441]
mean value: 0.9557640916822954
MCC on Blind test: 0.27
Accuracy on Blind test: 0.62
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02652383 0.0104022 0.01039076 0.01055098 0.01046586 0.01057124
0.01046538 0.010535 0.01041842 0.01369309]
mean value: 0.012401676177978516
key: score_time
value: [0.01325512 0.00898743 0.00907898 0.00889564 0.0088706 0.00887895
0.0090363 0.00889659 0.00893497 0.00981998]
mean value: 0.009465456008911133
key: test_mcc
value: [0.8953202 0.7589669 0.71921182 0.59358067 0.60753044 0.75047877
0.67900461 0.67900461 0.53881591 0.75047877]
mean value: 0.6972392691166773
key: train_mcc
value: [0.75161488 0.72796178 0.73570284 0.72807243 0.74414639 0.74805469
0.75599926 0.75197433 0.74812427 0.76055607]
mean value: 0.7452206943433384
key: test_accuracy
value: [0.94736842 0.87719298 0.85964912 0.78947368 0.80357143 0.875
0.83928571 0.83928571 0.76785714 0.875 ]
mean value: 0.8473684210526315
key: train_accuracy
value: [0.87573964 0.86390533 0.8678501 0.86390533 0.87204724 0.87401575
0.87795276 0.87598425 0.87401575 0.87992126]
mean value: 0.8725337402351333
key: test_fscore
value: [0.94736842 0.88135593 0.86206897 0.76923077 0.8 0.87719298
0.84210526 0.83636364 0.77966102 0.87719298]
mean value: 0.8472539969386996
key: train_fscore
value: [0.87719298 0.86282306 0.86732673 0.86172345 0.87128713 0.8745098
0.87698413 0.8762279 0.87301587 0.88246628]
mean value: 0.8723557335436966
key: test_precision
value: [0.93103448 0.83870968 0.86206897 0.86956522 0.81481481 0.86206897
0.82758621 0.85185185 0.74193548 0.86206897]
mean value: 0.846170463155519
key: train_precision
value: [0.86872587 0.87148594 0.86904762 0.87398374 0.87649402 0.87109375
0.884 0.8745098 0.88 0.86415094]
mean value: 0.8733491692608164
key: test_recall
value: [0.96428571 0.92857143 0.86206897 0.68965517 0.78571429 0.89285714
0.85714286 0.82142857 0.82142857 0.89285714]
mean value: 0.8516009852216748
key: train_recall
value: [0.88582677 0.85433071 0.86561265 0.84980237 0.86614173 0.87795276
0.87007874 0.87795276 0.86614173 0.9015748 ]
mean value: 0.8715415019762845
key: test_roc_auc
value: [0.9476601 0.87807882 0.85960591 0.79125616 0.80357143 0.875
0.83928571 0.83928571 0.76785714 0.875 ]
mean value: 0.8476600985221675
key: train_roc_auc
value: [0.87571971 0.86392425 0.86784569 0.86387756 0.87204724 0.87401575
0.87795276 0.87598425 0.87401575 0.87992126]
mean value: 0.8725304223335719
key: test_jcc
value: [0.9 0.78787879 0.75757576 0.625 0.66666667 0.78125
0.72727273 0.71875 0.63888889 0.78125 ]
mean value: 0.7384532828282828
key: train_jcc
value: [0.78125 0.75874126 0.76573427 0.75704225 0.77192982 0.77700348
0.78091873 0.77972028 0.77464789 0.78965517]
mean value: 0.7736643154251823
MCC on Blind test: 0.28
Accuracy on Blind test: 0.73
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.0821619 0.07773137 0.08024621 0.07795167 0.07078624 0.07733774
0.07203126 0.07329893 0.23815799 0.07500362]
mean value: 0.09247069358825684
key: score_time
value: [0.01104927 0.01100373 0.01096082 0.01094604 0.01078987 0.0109508
0.01069236 0.01092672 0.01182914 0.0112741 ]
mean value: 0.011042284965515136
key: test_mcc
value: [0.96547546 0.92980296 0.8953202 0.8951918 0.82195294 1.
0.96490128 0.89342711 0.93094934 0.92857143]
mean value: 0.922559251374398
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98245614 0.96491228 0.94736842 0.94736842 0.91071429 1.
0.98214286 0.94642857 0.96428571 0.96428571]
mean value: 0.9609962406015038
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98181818 0.96428571 0.94736842 0.94915254 0.9122807 1.
0.98245614 0.94736842 0.96296296 0.96428571]
mean value: 0.9611978799935982
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.96428571 0.96428571 0.93333333 0.89655172 1.
0.96551724 0.93103448 1. 0.96428571]
mean value: 0.9619293924466339
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96428571 0.96428571 0.93103448 0.96551724 0.92857143 1.
1. 0.96428571 0.92857143 0.96428571]
mean value: 0.9610837438423645
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98214286 0.96490148 0.9476601 0.94704433 0.91071429 1.
0.98214286 0.94642857 0.96428571 0.96428571]
mean value: 0.960960591133005
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96428571 0.93103448 0.9 0.90322581 0.83870968 1.
0.96551724 0.9 0.92857143 0.93103448]
mean value: 0.9262378833624663
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.1
Accuracy on Blind test: 0.34
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.04363132 0.07157731 0.04334879 0.08873129 0.07350183 0.06260896
0.08780813 0.07436919 0.07810593 0.07584238]
mean value: 0.06995251178741455
key: score_time
value: [0.01874518 0.0121007 0.01208496 0.01211381 0.01205897 0.01863194
0.01871586 0.0187521 0.01878953 0.01875067]
mean value: 0.016074371337890626
key: test_mcc
value: [0.82942474 0.86189955 0.75492611 0.79110556 0.75047877 0.85933785
0.67900461 0.78571429 0.75047877 0.85933785]
mean value: 0.7921708093166815
key: train_mcc
value: [0.89366043 0.88208839 0.88566582 0.90535473 0.91750062 0.89376313
0.89387399 0.90553988 0.90962508 0.88599845]
mean value: 0.8973070516162945
key: test_accuracy
value: [0.9122807 0.92982456 0.87719298 0.89473684 0.875 0.92857143
0.83928571 0.89285714 0.875 0.92857143]
mean value: 0.8953320802005013
key: train_accuracy
value: [0.94674556 0.9408284 0.94280079 0.95266272 0.95866142 0.94685039
0.94685039 0.95275591 0.95472441 0.94291339]
mean value: 0.9485793380856978
key: test_fscore
value: [0.91525424 0.93103448 0.87719298 0.9 0.87719298 0.92592593
0.84210526 0.89285714 0.87719298 0.92592593]
mean value: 0.8964681925282066
key: train_fscore
value: [0.94736842 0.94186047 0.94302554 0.95275591 0.95906433 0.94716243
0.94736842 0.95294118 0.95516569 0.94346979]
mean value: 0.9490182161161698
key: test_precision
value: [0.87096774 0.9 0.89285714 0.87096774 0.86206897 0.96153846
0.82758621 0.89285714 0.86206897 0.96153846]
mean value: 0.8902450830593212
key: train_precision
value: [0.93822394 0.92748092 0.9375 0.94901961 0.94980695 0.94163424
0.93822394 0.94921875 0.94594595 0.93436293]
mean value: 0.9411417221682514
key: test_recall
value: [0.96428571 0.96428571 0.86206897 0.93103448 0.89285714 0.89285714
0.85714286 0.89285714 0.89285714 0.89285714]
mean value: 0.9043103448275862
key: train_recall
value: [0.95669291 0.95669291 0.9486166 0.95652174 0.96850394 0.95275591
0.95669291 0.95669291 0.96456693 0.95275591]
mean value: 0.957049267062961
key: test_roc_auc
value: [0.91317734 0.93041872 0.87746305 0.89408867 0.875 0.92857143
0.83928571 0.89285714 0.875 0.92857143]
mean value: 0.8954433497536947
key: train_roc_auc
value: [0.9467259 0.94079705 0.94281224 0.95267032 0.95866142 0.94685039
0.94685039 0.95275591 0.95472441 0.94291339]
mean value: 0.9485761414210576
key: test_jcc
value: [0.84375 0.87096774 0.78125 0.81818182 0.78125 0.86206897
0.72727273 0.80645161 0.78125 0.86206897]
mean value: 0.8134511831327738
key: train_jcc
value: [0.9 0.89010989 0.89219331 0.90977444 0.92134831 0.89962825
0.9 0.91011236 0.9141791 0.89298893]
mean value: 0.903033459606262
MCC on Blind test: 0.22
Accuracy on Blind test: 0.67
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01380754 0.01324916 0.01125026 0.01091313 0.01098466 0.01100469
0.0110321 0.01100469 0.01104236 0.01112938]
mean value: 0.011541795730590821
key: score_time
value: [0.01176262 0.00988102 0.00967598 0.00937033 0.00944376 0.00940394
0.00945687 0.00941181 0.00942016 0.00950742]
mean value: 0.00973339080810547
key: test_mcc
value: [0.8953202 0.79161589 0.85960591 0.68850906 0.5728919 0.71428571
0.71428571 0.60753044 0.57142857 0.78772636]
mean value: 0.7203199757321368
key: train_mcc
value: [0.75937568 0.72007098 0.71203374 0.70842038 0.74805469 0.76387425
0.7480315 0.73230617 0.7170914 0.76380321]
mean value: 0.7373061978922988
key: test_accuracy
value: [0.94736842 0.89473684 0.92982456 0.84210526 0.78571429 0.85714286
0.85714286 0.80357143 0.78571429 0.89285714]
mean value: 0.8596177944862156
key: train_accuracy
value: [0.87968442 0.85996055 0.85601578 0.85404339 0.87401575 0.88188976
0.87401575 0.86614173 0.85826772 0.88188976]
mean value: 0.8685924614452779
key: test_fscore
value: [0.94736842 0.89655172 0.93103448 0.83636364 0.77777778 0.85714286
0.85714286 0.8 0.78571429 0.88888889]
mean value: 0.8577984930979486
key: train_fscore
value: [0.87968442 0.85884692 0.85544554 0.85140562 0.87351779 0.88095238
0.87401575 0.86561265 0.85542169 0.88142292]
mean value: 0.8676325679094097
key: test_precision
value: [0.93103448 0.86666667 0.93103448 0.88461538 0.80769231 0.85714286
0.85714286 0.81481481 0.78571429 0.92307692]
mean value: 0.8658935062383338
key: train_precision
value: [0.88142292 0.86746988 0.85714286 0.86530612 0.87698413 0.888
0.87401575 0.86904762 0.87295082 0.88492063]
mean value: 0.8737260732667103
key: test_recall
value: [0.96428571 0.92857143 0.93103448 0.79310345 0.75 0.85714286
0.85714286 0.78571429 0.78571429 0.85714286]
mean value: 0.8509852216748768
key: train_recall
value: [0.87795276 0.8503937 0.85375494 0.83794466 0.87007874 0.87401575
0.87401575 0.86220472 0.83858268 0.87795276]
mean value: 0.8616896455136784
key: test_roc_auc
value: [0.9476601 0.8953202 0.92980296 0.8429803 0.78571429 0.85714286
0.85714286 0.80357143 0.78571429 0.89285714]
mean value: 0.8597906403940887
key: train_roc_auc
value: [0.87968784 0.85997946 0.85601133 0.8540117 0.87401575 0.88188976
0.87401575 0.86614173 0.85826772 0.88188976]
mean value: 0.8685910802651645
key: test_jcc
value: [0.9 0.8125 0.87096774 0.71875 0.63636364 0.75
0.75 0.66666667 0.64705882 0.8 ]
mean value: 0.7552306868495199
key: train_jcc
value: [0.78521127 0.75261324 0.74740484 0.74125874 0.7754386 0.78723404
0.77622378 0.7630662 0.74736842 0.78798587]
mean value: 0.7663804997708952
MCC on Blind test: 0.28
Accuracy on Blind test: 0.73
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01800466 0.02170086 0.01728868 0.0168364 0.01796818 0.02323866
0.02558088 0.0231514 0.02295661 0.01844239]
mean value: 0.02051687240600586
key: score_time
value: [0.01028657 0.01107836 0.014539 0.01169682 0.02031279 0.01512098
0.02078176 0.01514125 0.01528645 0.01752949]
mean value: 0.015177345275878907
key: test_mcc
value: [0.82880708 0.70453109 0.93202124 0.71921182 0.75434227 0.85714286
0.72168784 0.82618439 0.64951905 0.89342711]
mean value: 0.7886874758942892
key: train_mcc
value: [0.81176962 0.86069176 0.77915876 0.81944159 0.84845848 0.86746041
0.90553988 0.81229142 0.89075842 0.85449628]
mean value: 0.8450066627933005
key: test_accuracy
value: [0.9122807 0.84210526 0.96491228 0.85964912 0.875 0.92857143
0.85714286 0.91071429 0.82142857 0.94642857]
mean value: 0.8918233082706767
key: train_accuracy
value: [0.89940828 0.92899408 0.88560158 0.90927022 0.92125984 0.93307087
0.95275591 0.9015748 0.94488189 0.92716535]
mean value: 0.9203982823153023
key: test_fscore
value: [0.90566038 0.81632653 0.96666667 0.86206897 0.86792453 0.92857143
0.86666667 0.91525424 0.83333333 0.94545455]
mean value: 0.890792727977064
key: train_fscore
value: [0.88984881 0.92622951 0.89298893 0.90688259 0.91631799 0.9348659
0.95294118 0.90842491 0.94615385 0.92787524]
mean value: 0.9202528908003171
key: test_precision
value: [0.96 0.95238095 0.93548387 0.86206897 0.92 0.92857143
0.8125 0.87096774 0.78125 0.96296296]
mean value: 0.8986185922335811
key: train_precision
value: [0.98564593 0.96581197 0.83737024 0.92946058 0.97767857 0.91044776
0.94921875 0.84931507 0.92481203 0.91891892]
mean value: 0.9248679822063575
key: test_recall
value: [0.85714286 0.71428571 1. 0.86206897 0.82142857 0.92857143
0.92857143 0.96428571 0.89285714 0.92857143]
mean value: 0.8897783251231527
key: train_recall
value: [0.81102362 0.88976378 0.95652174 0.88537549 0.86220472 0.96062992
0.95669291 0.97637795 0.96850394 0.93700787]
mean value: 0.920410195761103
key: test_roc_auc
value: [0.91133005 0.83990148 0.96428571 0.85960591 0.875 0.92857143
0.85714286 0.91071429 0.82142857 0.94642857]
mean value: 0.8914408866995074
key: train_roc_auc
value: [0.89958296 0.92907161 0.88574118 0.90922318 0.92125984 0.93307087
0.95275591 0.9015748 0.94488189 0.92716535]
mean value: 0.9204327596402229
key: test_jcc
value: [0.82758621 0.68965517 0.93548387 0.75757576 0.76666667 0.86666667
0.76470588 0.84375 0.71428571 0.89655172]
mean value: 0.8062927661963765
key: train_jcc
value: [0.80155642 0.86259542 0.80666667 0.82962963 0.84555985 0.87769784
0.91011236 0.83221477 0.89781022 0.86545455]
mean value: 0.8529297712747432
MCC on Blind test: 0.27
Accuracy on Blind test: 0.65
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.02081895 0.01910305 0.02194667 0.02428484 0.02053165 0.02217913
0.02111697 0.02241921 0.02124858 0.02041411]
mean value: 0.02140631675720215
key: score_time
value: [0.01168537 0.01173663 0.01170278 0.01167035 0.01188517 0.01176786
0.01174521 0.01173615 0.01174688 0.01172304]
mean value: 0.01173994541168213
key: test_mcc
value: [0.8615634 0.6746955 0.79161589 0.64889453 0.43519414 0.43759497
0.74535599 0.82195294 0.72168784 0.82195294]
mean value: 0.6960508151248677
key: train_mcc
value: [0.79945572 0.82446063 0.89465736 0.76270442 0.5287693 0.54398379
0.77497517 0.91002026 0.89833428 0.88188976]
mean value: 0.7819250707589638
key: test_accuracy
value: [0.92982456 0.8245614 0.89473684 0.80701754 0.67857143 0.66071429
0.85714286 0.91071429 0.85714286 0.91071429]
mean value: 0.8331140350877193
key: train_accuracy
value: [0.89151874 0.90729783 0.94674556 0.86982249 0.71850394 0.72834646
0.87795276 0.95472441 0.9488189 0.94094488]
mean value: 0.8784675953967293
key: test_fscore
value: [0.92592593 0.79166667 0.89285714 0.7755102 0.55 0.48648649
0.83333333 0.9122807 0.86666667 0.90909091]
mean value: 0.794381803686315
key: train_fscore
value: [0.87964989 0.89978678 0.94523327 0.85135135 0.60821918 0.62702703
0.86283186 0.95390782 0.94980695 0.94094488]
mean value: 0.8518758998890312
key: test_precision
value: [0.96153846 0.95 0.92592593 0.95 0.91666667 1.
1. 0.89655172 0.8125 0.92592593]
mean value: 0.9339108704194911
key: train_precision
value: [0.99014778 0.98139535 0.97083333 0.9895288 1. 1.
0.98484848 0.97142857 0.93181818 0.94094488]
mean value: 0.9760945381218294
key: test_recall
value: [0.89285714 0.67857143 0.86206897 0.65517241 0.39285714 0.32142857
0.71428571 0.92857143 0.92857143 0.89285714]
mean value: 0.7267241379310345
key: train_recall
value: [0.79133858 0.83070866 0.92094862 0.74703557 0.43700787 0.45669291
0.76771654 0.93700787 0.96850394 0.94094488]
mean value: 0.779790544956584
key: test_roc_auc
value: [0.92918719 0.82204433 0.8953202 0.80972906 0.67857143 0.66071429
0.85714286 0.91071429 0.85714286 0.91071429]
mean value: 0.833128078817734
key: train_roc_auc
value: [0.89171672 0.90744919 0.94669478 0.86958078 0.71850394 0.72834646
0.87795276 0.95472441 0.9488189 0.94094488]
mean value: 0.878473281254863
key: test_jcc
value: [0.86206897 0.65517241 0.80645161 0.63333333 0.37931034 0.32142857
0.71428571 0.83870968 0.76470588 0.83333333]
mean value: 0.6808799849194405
key: train_jcc
value: [0.78515625 0.81782946 0.89615385 0.74117647 0.43700787 0.45669291
0.75875486 0.91187739 0.90441176 0.88847584]
mean value: 0.7597536671094351
MCC on Blind test: 0.35
Accuracy on Blind test: 0.84
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.21202064 0.20090246 0.19995809 0.20182872 0.20169139 0.20114875
0.20171046 0.20118713 0.20071888 0.19601607]
mean value: 0.20171825885772704
key: score_time
value: [0.01640511 0.01657915 0.01657701 0.01667643 0.01658249 0.01643133
0.0165379 0.01658034 0.01661682 0.01556373]
mean value: 0.01645503044128418
key: test_mcc
value: [0.96547546 0.85960591 0.8953202 0.96547546 0.82195294 0.96490128
0.96490128 0.89342711 1. 0.96490128]
mean value: 0.929596092121727
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98245614 0.92982456 0.94736842 0.98245614 0.91071429 0.98214286
0.98214286 0.94642857 1. 0.98214286]
mean value: 0.9645676691729324
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98181818 0.92857143 0.94736842 0.98305085 0.9122807 0.98181818
0.98245614 0.94736842 1. 0.98181818]
mean value: 0.9646550505694127
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.92857143 0.96428571 0.96666667 0.89655172 1.
0.96551724 0.93103448 1. 1. ]
mean value: 0.9652627257799672
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96428571 0.92857143 0.93103448 1. 0.92857143 0.96428571
1. 0.96428571 1. 0.96428571]
mean value: 0.9645320197044335
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98214286 0.92980296 0.9476601 0.98214286 0.91071429 0.98214286
0.98214286 0.94642857 1. 0.98214286]
mean value: 0.9645320197044336
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96428571 0.86666667 0.9 0.96666667 0.83870968 0.96428571
0.96551724 0.9 1. 0.96428571]
mean value: 0.9330417394989141
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.15
Accuracy on Blind test: 0.38
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.06119132 0.09244418 0.08146381 0.08729506 0.08371305 0.07937193
0.07151246 0.07341266 0.07586217 0.08966494]
mean value: 0.07959315776824952
key: score_time
value: [0.02624488 0.02854729 0.03578186 0.03146505 0.0310235 0.02910662
0.03994799 0.02881432 0.02941585 0.0330739 ]
mean value: 0.03134212493896484
key: test_mcc
value: [0.96547546 0.8953202 0.8953202 0.93202124 0.82618439 1.
0.96490128 0.89342711 0.93094934 0.92857143]
mean value: 0.9232170639899975
key: train_mcc
value: [0.99214142 0.99211042 0.98823457 1. 0.99212598 0.98819663
0.98038334 1. 0.99212598 0.98425197]
mean value: 0.990957032924043
key: test_accuracy
value: [0.98245614 0.94736842 0.94736842 0.96491228 0.91071429 1.
0.98214286 0.94642857 0.96428571 0.96428571]
mean value: 0.9609962406015038
key: train_accuracy
value: [0.99605523 0.99605523 0.99408284 1. 0.99606299 0.99409449
0.99015748 1. 0.99606299 0.99212598]
mean value: 0.9954697230893476
key: test_fscore
value: [0.98181818 0.94736842 0.94736842 0.96666667 0.91525424 1.
0.98245614 0.94736842 0.96296296 0.96428571]
mean value: 0.9615549166530434
key: train_fscore
value: [0.99604743 0.99606299 0.99403579 1. 0.99606299 0.99408284
0.99009901 1. 0.99606299 0.99212598]
mean value: 0.9954580026885907
key: test_precision
value: [1. 0.93103448 0.96428571 0.93548387 0.87096774 1.
0.96551724 0.93103448 1. 0.96428571]
mean value: 0.9562609248371206
key: train_precision
value: [1. 0.99606299 1. 1. 0.99606299 0.99604743
0.99601594 1. 0.99606299 0.99212598]
mean value: 0.9972378327714941
key: test_recall
value: [0.96428571 0.96428571 0.93103448 1. 0.96428571 1.
1. 0.96428571 0.92857143 0.96428571]
mean value: 0.968103448275862
key: train_recall
value: [0.99212598 0.99606299 0.98814229 1. 0.99606299 0.99212598
0.98425197 1. 0.99606299 0.99212598]
mean value: 0.9936961190127914
key: test_roc_auc
value: [0.98214286 0.9476601 0.9476601 0.96428571 0.91071429 1.
0.98214286 0.94642857 0.96428571 0.96428571]
mean value: 0.960960591133005
key: train_roc_auc
value: [0.99606299 0.99605521 0.99407115 1. 0.99606299 0.99409449
0.99015748 1. 0.99606299 0.99212598]
mean value: 0.9954693286856929
key: test_jcc
value: [0.96428571 0.9 0.9 0.93548387 0.84375 1.
0.96551724 0.9 0.92857143 0.93103448]
mean value: 0.9268642737962816
key: train_jcc
value: [0.99212598 0.99215686 0.98814229 1. 0.99215686 0.98823529
0.98039216 1. 0.99215686 0.984375 ]
mean value: 0.9909741315957773
MCC on Blind test: 0.08
Accuracy on Blind test: 0.33
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.17389727 0.18126535 0.21880174 0.18185973 0.17694855 0.22035027
0.17903447 0.17763066 0.12950563 0.20781183]
mean value: 0.18471055030822753
key: score_time
value: [0.02546906 0.03429747 0.0409584 0.03237605 0.0260911 0.02633905
0.02989173 0.02581334 0.0195868 0.02531815]
mean value: 0.028614115715026856
key: test_mcc
value: [0.85960591 0.54592083 0.65104858 0.65018988 0.4330127 0.67900461
0.53605627 0.5728919 0.64285714 0.82195294]
mean value: 0.6392540773496393
key: train_mcc
value: [0.98823511 0.98434388 0.98434291 0.98823457 0.98428248 0.98437404
0.98437404 0.98437404 0.98428248 0.98437404]
mean value: 0.9851217587100868
key: test_accuracy
value: [0.92982456 0.77192982 0.8245614 0.8245614 0.71428571 0.83928571
0.76785714 0.78571429 0.82142857 0.91071429]
mean value: 0.819016290726817
key: train_accuracy
value: [0.99408284 0.99211045 0.99211045 0.99408284 0.99212598 0.99212598
0.99212598 0.99212598 0.99212598 0.99212598]
mean value: 0.9925142493283015
key: test_fscore
value: [0.92857143 0.75471698 0.82142857 0.83333333 0.69230769 0.83636364
0.76363636 0.77777778 0.82142857 0.90909091]
mean value: 0.813865526507036
key: train_fscore
value: [0.99405941 0.99206349 0.99203187 0.99403579 0.99209486 0.99206349
0.99206349 0.99206349 0.99209486 0.99206349]
mean value: 0.9924634247376443
key: test_precision
value: [0.92857143 0.8 0.85185185 0.80645161 0.75 0.85185185
0.77777778 0.80769231 0.82142857 0.92592593]
mean value: 0.8321551328002941
key: train_precision
value: [1. 1. 1. 1. 0.99603175 1.
1. 1. 0.99603175 1. ]
mean value: 0.9992063492063492
key: test_recall
value: [0.92857143 0.71428571 0.79310345 0.86206897 0.64285714 0.82142857
0.75 0.75 0.82142857 0.89285714]
mean value: 0.7976600985221675
key: train_recall
value: [0.98818898 0.98425197 0.98418972 0.98814229 0.98818898 0.98425197
0.98425197 0.98425197 0.98818898 0.98425197]
mean value: 0.985815878746382
key: test_roc_auc
value: [0.92980296 0.77093596 0.82512315 0.82389163 0.71428571 0.83928571
0.76785714 0.78571429 0.82142857 0.91071429]
mean value: 0.8189039408866995
key: train_roc_auc
value: [0.99409449 0.99212598 0.99209486 0.99407115 0.99212598 0.99212598
0.99212598 0.99212598 0.99212598 0.99212598]
mean value: 0.9925142385857895
key: test_jcc
value: [0.86666667 0.60606061 0.6969697 0.71428571 0.52941176 0.71875
0.61764706 0.63636364 0.6969697 0.83333333]
mean value: 0.6916458174178762
key: train_jcc
value: [0.98818898 0.98425197 0.98418972 0.98814229 0.98431373 0.98425197
0.98425197 0.98425197 0.98431373 0.98425197]
mean value: 0.9850408285688307
MCC on Blind test: 0.25
Accuracy on Blind test: 0.7
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.79057455 0.77769208 0.76583242 0.77488422 0.77650523 0.7766695
0.78173661 0.77217269 0.76850343 0.77668691]
mean value: 0.7761257648468017
key: score_time
value: [0.0105288 0.00939512 0.00932217 0.0093441 0.00973582 0.00938821
0.00934148 0.00942039 0.00945306 0.00929689]
mean value: 0.009522604942321777
key: test_mcc
value: [0.96547546 0.92980296 0.92980296 0.93202124 0.82195294 1.
0.92857143 0.89342711 0.96490128 0.89342711]
mean value: 0.9259382487307398
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98245614 0.96491228 0.96491228 0.96491228 0.91071429 1.
0.96428571 0.94642857 0.98214286 0.94642857]
mean value: 0.962719298245614
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98181818 0.96428571 0.96551724 0.96666667 0.9122807 1.
0.96428571 0.94736842 0.98181818 0.94736842]
mean value: 0.9631409244113418
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.96428571 0.96551724 0.93548387 0.89655172 1.
0.96428571 0.93103448 1. 0.93103448]
mean value: 0.9588193230573653
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96428571 0.96428571 0.96551724 1. 0.92857143 1.
0.96428571 0.96428571 0.96428571 0.96428571]
mean value: 0.9679802955665024
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98214286 0.96490148 0.96490148 0.96428571 0.91071429 1.
0.96428571 0.94642857 0.98214286 0.94642857]
mean value: 0.9626231527093597
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96428571 0.93103448 0.93333333 0.93548387 0.83870968 1.
0.93103448 0.9 0.96428571 0.9 ]
mean value: 0.92981672758091
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.11
Accuracy on Blind test: 0.29
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.03601861 0.03224778 0.03218007 0.03201056 0.03217316 0.03178501
0.03229713 0.03250289 0.05251861 0.08426547]
mean value: 0.03979992866516113
key: score_time
value: [0.0125165 0.01252484 0.01253128 0.01482654 0.01470113 0.01488376
0.01485562 0.01477528 0.01557207 0.02143931]
mean value: 0.014862632751464844
key: test_mcc
value: [ 0.33525717 -0.06746787 0.38590439 0.26729964 -0.06262243 0.10206207
0.14586499 0.10206207 0.53881591 0.18650096]
mean value: 0.19336769075937818
key: train_mcc
value: [0.67107707 0.44011793 0.61033709 0.4792439 0.40307741 0.37626192
0.35901099 0.52572037 0.9453509 0.35901099]
mean value: 0.516920856933795
key: test_accuracy
value: [0.63157895 0.47368421 0.66666667 0.61403509 0.48214286 0.53571429
0.55357143 0.53571429 0.76785714 0.57142857]
mean value: 0.5832393483709273
key: train_accuracy
value: [0.81065089 0.66272189 0.77120316 0.68639053 0.63976378 0.62401575
0.61417323 0.71653543 0.97244094 0.61417323]
mean value: 0.7112068831632732
key: test_fscore
value: [0.71232877 0.625 0.73972603 0.7027027 0.63291139 0.65789474
0.66666667 0.65789474 0.77966102 0.67567568]
mean value: 0.685046172260402
key: train_fscore
value: [0.8410596 0.74815906 0.81350482 0.76090226 0.73516643 0.7267525
0.72159091 0.7791411 0.97286822 0.72159091]
mean value: 0.7820735807454069
key: test_precision
value: [0.57777778 0.48076923 0.61363636 0.57777778 0.49019608 0.52083333
0.53191489 0.52083333 0.74193548 0.54347826]
mean value: 0.5599152533416744
key: train_precision
value: [0.72571429 0.59764706 0.68563686 0.61407767 0.5812357 0.57078652
0.56444444 0.63819095 0.95801527 0.56444444]
mean value: 0.6500193196442058
key: test_recall
value: [0.92857143 0.89285714 0.93103448 0.89655172 0.89285714 0.89285714
0.89285714 0.89285714 0.82142857 0.89285714]
mean value: 0.893472906403941
key: train_recall
value: [1. 1. 1. 1. 1. 1.
1. 1. 0.98818898 1. ]
mean value: 0.9988188976377953
key: test_roc_auc
value: [0.63669951 0.48091133 0.66194581 0.60899015 0.48214286 0.53571429
0.55357143 0.53571429 0.76785714 0.57142857]
mean value: 0.5834975369458129
key: train_roc_auc
value: [0.81027668 0.66205534 0.77165354 0.68700787 0.63976378 0.62401575
0.61417323 0.71653543 0.97244094 0.61417323]
mean value: 0.7112095795337836
key: test_jcc
value: [0.55319149 0.45454545 0.58695652 0.54166667 0.46296296 0.49019608
0.5 0.49019608 0.63888889 0.51020408]
mean value: 0.5228808222660204
key: train_jcc
value: [0.72571429 0.59764706 0.68563686 0.61407767 0.5812357 0.57078652
0.56444444 0.63819095 0.94716981 0.56444444]
mean value: 0.648934774058724
MCC on Blind test: -0.04
Accuracy on Blind test: 0.19
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02779937 0.02872729 0.01580143 0.01564527 0.01578498 0.04575109
0.04329133 0.04417109 0.03463864 0.02420235]
mean value: 0.029581284523010253
key: score_time
value: [0.02095222 0.01218772 0.01191568 0.01196671 0.01234722 0.01872563
0.02054381 0.02364755 0.02555585 0.0186007 ]
mean value: 0.017644309997558595
key: test_mcc
value: [0.92980296 0.8953202 0.82512315 0.82490815 0.75434227 0.78772636
0.71611487 0.78571429 0.71611487 0.82618439]
mean value: 0.8061351507165199
key: train_mcc
value: [0.86198955 0.86998617 0.85801653 0.86210547 0.88599845 0.85465533
0.86624915 0.87013943 0.87040934 0.86253233]
mean value: 0.8662081754476578
key: test_accuracy
value: [0.96491228 0.94736842 0.9122807 0.9122807 0.875 0.89285714
0.85714286 0.89285714 0.85714286 0.91071429]
mean value: 0.9022556390977443
key: train_accuracy
value: [0.93096647 0.93491124 0.92899408 0.93096647 0.94291339 0.92716535
0.93307087 0.93503937 0.93503937 0.93110236]
mean value: 0.933016897296122
key: test_fscore
value: [0.96428571 0.94736842 0.9122807 0.91525424 0.88135593 0.88888889
0.86206897 0.89285714 0.86206897 0.90566038]
mean value: 0.9032089346723262
key: train_fscore
value: [0.93150685 0.93567251 0.92913386 0.93150685 0.94346979 0.92815534
0.93359375 0.93542074 0.93592233 0.93203883]
mean value: 0.9336420855587076
key: test_precision
value: [0.96428571 0.93103448 0.92857143 0.9 0.83870968 0.92307692
0.83333333 0.89285714 0.83333333 0.96 ]
mean value: 0.9005202035635851
key: train_precision
value: [0.92607004 0.92664093 0.9254902 0.92248062 0.93436293 0.91570881
0.92635659 0.92996109 0.92337165 0.91954023]
mean value: 0.924998308444446
key: test_recall
value: [0.96428571 0.96428571 0.89655172 0.93103448 0.92857143 0.85714286
0.89285714 0.89285714 0.89285714 0.85714286]
mean value: 0.9077586206896552
key: train_recall
value: [0.93700787 0.94488189 0.93280632 0.94071146 0.95275591 0.94094488
0.94094488 0.94094488 0.9488189 0.94488189]
mean value: 0.942469888892347
key: test_roc_auc
value: [0.96490148 0.9476601 0.91256158 0.91194581 0.875 0.89285714
0.85714286 0.89285714 0.85714286 0.91071429]
mean value: 0.9022783251231528
key: train_roc_auc
value: [0.93095453 0.93489154 0.92900159 0.93098565 0.94291339 0.92716535
0.93307087 0.93503937 0.93503937 0.93110236]
mean value: 0.9330164016059258
key: test_jcc
value: [0.93103448 0.9 0.83870968 0.84375 0.78787879 0.8
0.75757576 0.80645161 0.75757576 0.82758621]
mean value: 0.8250562283008056
key: train_jcc
value: [0.87179487 0.87912088 0.86764706 0.87179487 0.89298893 0.86594203
0.87545788 0.87867647 0.87956204 0.87272727]
mean value: 0.8755712302977963
MCC on Blind test: 0.25
Accuracy on Blind test: 0.7
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.18849206 0.29984403 0.31634831 0.33595371 0.3800807 0.31235695
0.30970716 0.2922895 0.28889799 0.29170918]
mean value: 0.3015679597854614
key: score_time
value: [0.01897597 0.01867318 0.02272344 0.01876187 0.02406073 0.01955128
0.01868868 0.01871037 0.01880717 0.01869726]
mean value: 0.01976499557495117
key: test_mcc
value: [0.92980296 0.8953202 0.85960591 0.82490815 0.75434227 0.78772636
0.71611487 0.78571429 0.71611487 0.82618439]
mean value: 0.8095834265785888
key: train_mcc
value: [0.86198955 0.86998617 0.88168563 0.86210547 0.90174953 0.85465533
0.86624915 0.87013943 0.89020543 0.86253233]
mean value: 0.8721298027814565
key: test_accuracy
value: [0.96491228 0.94736842 0.92982456 0.9122807 0.875 0.89285714
0.85714286 0.89285714 0.85714286 0.91071429]
mean value: 0.9040100250626566
key: train_accuracy
value: [0.93096647 0.93491124 0.9408284 0.93096647 0.9507874 0.92716535
0.93307087 0.93503937 0.94488189 0.93110236]
mean value: 0.9359719827920918
key: test_fscore
value: [0.96428571 0.94736842 0.93103448 0.91525424 0.88135593 0.88888889
0.86206897 0.89285714 0.86206897 0.90566038]
mean value: 0.9050843127727497
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_orig.py:135: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_orig.py:138: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
key: train_fscore
value: [0.93150685 0.93567251 0.94094488 0.93150685 0.95126706 0.92815534
0.93359375 0.93542074 0.94573643 0.93203883]
mean value: 0.9365843254175729
key: test_precision
value: [0.96428571 0.93103448 0.93103448 0.9 0.83870968 0.92307692
0.83333333 0.89285714 0.83333333 0.96 ]
mean value: 0.9007665089823044
key: train_precision
value: [0.92607004 0.92664093 0.9372549 0.92248062 0.94208494 0.91570881
0.92635659 0.92996109 0.93129771 0.91954023]
mean value: 0.9277395860462906
key: test_recall
value: [0.96428571 0.96428571 0.93103448 0.93103448 0.92857143 0.85714286
0.89285714 0.89285714 0.89285714 0.85714286]
mean value: 0.9112068965517242
key: train_recall
value: [0.93700787 0.94488189 0.94466403 0.94071146 0.96062992 0.94094488
0.94094488 0.94094488 0.96062992 0.94488189]
mean value: 0.945624163580343
key: test_roc_auc
value: [0.96490148 0.9476601 0.92980296 0.91194581 0.875 0.89285714
0.85714286 0.89285714 0.85714286 0.91071429]
mean value: 0.9040024630541872
key: train_roc_auc
value: [0.93095453 0.93489154 0.94083595 0.93098565 0.9507874 0.92716535
0.93307087 0.93503937 0.94488189 0.93110236]
mean value: 0.9359714917058293
key: test_jcc
value: [0.93103448 0.9 0.87096774 0.84375 0.78787879 0.8
0.75757576 0.80645161 0.75757576 0.82758621]
mean value: 0.8282820347524185
key: train_jcc
value: [0.87179487 0.87912088 0.88847584 0.87179487 0.9070632 0.86594203
0.87545788 0.87867647 0.89705882 0.87272727]
mean value: 0.8808112127456175
MCC on Blind test: 0.22
Accuracy on Blind test: 0.68
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.0371778 0.03735137 0.03989172 0.03771639 0.0379281 0.03412724
0.0368371 0.03751278 0.03696871 0.0387764 ]
mean value: 0.037428760528564455
key: score_time
value: [0.01196814 0.01196027 0.01489449 0.01493311 0.01487374 0.01206851
0.01477408 0.01476097 0.01480389 0.01474452]
mean value: 0.013978171348571777
key: test_mcc
value: [0.92980296 0.86189955 0.79161589 0.79110556 0.71611487 0.8660254
0.64450339 0.82195294 0.75047877 0.85933785]
mean value: 0.8032837186777093
key: train_mcc
value: [0.84631191 0.86587719 0.86598917 0.87771934 0.87412415 0.84262418
0.85042006 0.86614173 0.87828635 0.86237183]
mean value: 0.8629865906620574
key: test_accuracy
value: [0.96491228 0.92982456 0.89473684 0.89473684 0.85714286 0.92857143
0.82142857 0.91071429 0.875 0.92857143]
mean value: 0.9005639097744361
key: train_accuracy
value: [0.92307692 0.93293886 0.93293886 0.93885602 0.93700787 0.92125984
0.92519685 0.93307087 0.93897638 0.93110236]
mean value: 0.9314424824115921
key: test_fscore
value: [0.96428571 0.93103448 0.89285714 0.9 0.86206897 0.92307692
0.82758621 0.9122807 0.87719298 0.92592593]
mean value: 0.9016309045528647
key: train_fscore
value: [0.92397661 0.93307087 0.93333333 0.93885602 0.9375 0.921875
0.9254902 0.93307087 0.93980583 0.93177388]
mean value: 0.9318752590046475
key: test_precision
value: [0.96428571 0.9 0.92592593 0.87096774 0.83333333 1.
0.8 0.89655172 0.86206897 0.96153846]
mean value: 0.9014671866674091
key: train_precision
value: [0.91505792 0.93307087 0.92607004 0.93700787 0.93023256 0.91472868
0.921875 0.93307087 0.92720307 0.92277992]
mean value: 0.9261096788491734
key: test_recall
value: [0.96428571 0.96428571 0.86206897 0.93103448 0.89285714 0.85714286
0.85714286 0.92857143 0.89285714 0.89285714]
mean value: 0.9043103448275862
key: train_recall
value: [0.93307087 0.93307087 0.94071146 0.94071146 0.94488189 0.92913386
0.92913386 0.93307087 0.95275591 0.94094488]
mean value: 0.937748591702717
key: test_roc_auc
value: [0.96490148 0.93041872 0.8953202 0.89408867 0.85714286 0.92857143
0.82142857 0.91071429 0.875 0.92857143]
mean value: 0.9006157635467981
key: train_roc_auc
value: [0.92305717 0.9329386 0.93295416 0.93885967 0.93700787 0.92125984
0.92519685 0.93307087 0.93897638 0.93110236]
mean value: 0.9314423765211167
key: test_jcc
value: [0.93103448 0.87096774 0.80645161 0.81818182 0.75757576 0.85714286
0.70588235 0.83870968 0.78125 0.86206897]
mean value: 0.8229265266375536
key: train_jcc
value: [0.85869565 0.87453875 0.875 0.88475836 0.88235294 0.85507246
0.86131387 0.87453875 0.88644689 0.87226277]
mean value: 0.8724980440988328
MCC on Blind test: 0.31
Accuracy on Blind test: 0.7
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.89304805 1.03817892 0.89372754 1.10189581 0.90773821 0.89544559
1.06514525 0.90930581 1.09472632 0.9275372 ]
mean value: 0.9726748704910279
key: score_time
value: [0.01456618 0.01514888 0.01538634 0.01519942 0.01540756 0.0152154
0.01212883 0.01528525 0.01524878 0.01212168]
mean value: 0.014570832252502441
key: test_mcc
value: [0.92980296 0.82512315 0.82512315 0.75462449 0.71611487 0.89802651
0.60753044 0.82195294 0.75047877 0.85933785]
mean value: 0.7988115143025961
key: train_mcc
value: [0.88954592 0.8974355 0.90138653 0.94089268 0.90158179 0.94112724
0.8307151 0.89766562 0.90191737 0.81892302]
mean value: 0.8921190789888661
key: test_accuracy
value: [0.96491228 0.9122807 0.9122807 0.87719298 0.85714286 0.94642857
0.80357143 0.91071429 0.875 0.92857143]
mean value: 0.8988095238095238
key: train_accuracy
value: [0.94477318 0.94871795 0.95069034 0.9704142 0.9507874 0.97047244
0.91535433 0.9488189 0.9507874 0.90944882]
mean value: 0.946026495208809
key: test_fscore
value: [0.96428571 0.9122807 0.9122807 0.88135593 0.86206897 0.94339623
0.80701754 0.9122807 0.87719298 0.92592593]
mean value: 0.8998085395926314
key: train_fscore
value: [0.94488189 0.9488189 0.95049505 0.97017893 0.95088409 0.97017893
0.91552063 0.94901961 0.95145631 0.90980392]
mean value: 0.9461238245008307
key: test_precision
value: [0.96428571 0.89655172 0.92857143 0.86666667 0.83333333 1.
0.79310345 0.89655172 0.86206897 0.96153846]
mean value: 0.900267146646457
key: train_precision
value: [0.94488189 0.9488189 0.95238095 0.976 0.94901961 0.97991968
0.91372549 0.9453125 0.93869732 0.90625 ]
mean value: 0.9455006334544265
key: test_recall
value: [0.96428571 0.92857143 0.89655172 0.89655172 0.89285714 0.89285714
0.82142857 0.92857143 0.89285714 0.89285714]
mean value: 0.9007389162561577
key: train_recall
value: [0.94488189 0.9488189 0.9486166 0.96442688 0.95275591 0.96062992
0.91732283 0.95275591 0.96456693 0.91338583]
mean value: 0.946816158849709
key: test_roc_auc
value: [0.96490148 0.91256158 0.91256158 0.87684729 0.85714286 0.94642857
0.80357143 0.91071429 0.875 0.92857143]
mean value: 0.8988300492610838
key: train_roc_auc
value: [0.94477296 0.94871775 0.95068625 0.97040242 0.9507874 0.97047244
0.91535433 0.9488189 0.9507874 0.90944882]
mean value: 0.9460248669509197
key: test_jcc
value: [0.93103448 0.83870968 0.83870968 0.78787879 0.75757576 0.89285714
0.67647059 0.83870968 0.78125 0.86206897]
mean value: 0.8205264757080909
key: train_jcc
value: [0.89552239 0.90262172 0.90566038 0.94208494 0.90636704 0.94208494
0.8442029 0.90298507 0.90740741 0.83453237]
mean value: 0.8983469168318737
MCC on Blind test: 0.24
Accuracy on Blind test: 0.64
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01492214 0.010638 0.01045823 0.01021338 0.01002026 0.01002669
0.01005268 0.01020908 0.00999951 0.01019502]
mean value: 0.01067349910736084
key: score_time
value: [0.01205015 0.00935721 0.0090673 0.00890112 0.00881839 0.00884724
0.00882244 0.00877309 0.00878334 0.00903559]
mean value: 0.009245586395263673
key: test_mcc
value: [0.8615634 0.55091314 0.72706729 0.62473685 0.4330127 0.47951222
0.79385662 0.50128041 0.54446551 0.75434227]
mean value: 0.6270750410241596
key: train_mcc
value: [0.66657128 0.67011432 0.64146859 0.66742134 0.63711603 0.67419557
0.66977469 0.70816293 0.65740182 0.67253825]
mean value: 0.6664764817127335
key: test_accuracy
value: [0.92982456 0.77192982 0.85964912 0.80701754 0.71428571 0.73214286
0.89285714 0.75 0.76785714 0.875 ]
mean value: 0.8100563909774436
key: train_accuracy
value: [0.82840237 0.83037475 0.81656805 0.83037475 0.80511811 0.83267717
0.83070866 0.8523622 0.82480315 0.83070866]
mean value: 0.8282097873860442
key: test_fscore
value: [0.92592593 0.74509804 0.85185185 0.79245283 0.69230769 0.69387755
0.88461538 0.74074074 0.74509804 0.86792453]
mean value: 0.7939892583383943
key: train_fscore
value: [0.81290323 0.81545064 0.8 0.81702128 0.77241379 0.81798715
0.81623932 0.8447205 0.81023454 0.81385281]
mean value: 0.8120823259881095
key: test_precision
value: [0.96153846 0.82608696 0.92 0.875 0.75 0.80952381
0.95833333 0.76923077 0.82608696 0.92 ]
mean value: 0.8615800286669852
key: train_precision
value: [0.8957346 0.89622642 0.87735849 0.88479263 0.9281768 0.89671362
0.89252336 0.89082969 0.88372093 0.90384615]
mean value: 0.8949922683036309
key: test_recall
value: [0.89285714 0.67857143 0.79310345 0.72413793 0.64285714 0.60714286
0.82142857 0.71428571 0.67857143 0.82142857]
mean value: 0.7374384236453202
key: train_recall
value: [0.74409449 0.7480315 0.73517787 0.75889328 0.66141732 0.7519685
0.7519685 0.80314961 0.7480315 0.74015748]
mean value: 0.7442890043882855
key: test_roc_auc
value: [0.92918719 0.7703202 0.86083744 0.80849754 0.71428571 0.73214286
0.89285714 0.75 0.76785714 0.875 ]
mean value: 0.8100985221674877
key: train_roc_auc
value: [0.82856898 0.83053749 0.81640783 0.83023404 0.80511811 0.83267717
0.83070866 0.8523622 0.82480315 0.83070866]
mean value: 0.8282126295477887
key: test_jcc
value: [0.86206897 0.59375 0.74193548 0.65625 0.52941176 0.53125
0.79310345 0.58823529 0.59375 0.76666667]
mean value: 0.6656421623154267
key: train_jcc
value: [0.68478261 0.6884058 0.66666667 0.69064748 0.62921348 0.69202899
0.68953069 0.7311828 0.68100358 0.68613139]
mean value: 0.6839593475841678
MCC on Blind test: 0.33
Accuracy on Blind test: 0.75
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01052785 0.01058602 0.01057577 0.01063657 0.01053834 0.01070213
0.01048851 0.01046777 0.01058173 0.0106101 ]
mean value: 0.010571479797363281
key: score_time
value: [0.0089097 0.00898433 0.00901771 0.00898981 0.0089891 0.00899673
0.00903559 0.00891829 0.00887704 0.00920439]
mean value: 0.008992266654968262
key: test_mcc
value: [0.8953202 0.7589669 0.68472906 0.58562417 0.64450339 0.75047877
0.64285714 0.71428571 0.53881591 0.75047877]
mean value: 0.6966060026275183
key: train_mcc
value: [0.74761876 0.73570695 0.73177298 0.73972796 0.74414639 0.74805469
0.75211424 0.7480315 0.74812427 0.7486119 ]
mean value: 0.7443909629951391
key: test_accuracy
value: [0.94736842 0.87719298 0.84210526 0.78947368 0.82142857 0.875
0.82142857 0.85714286 0.76785714 0.875 ]
mean value: 0.8473997493734335
key: train_accuracy
value: [0.87376726 0.8678501 0.86587771 0.86982249 0.87204724 0.87401575
0.87598425 0.87401575 0.87401575 0.87401575]
mean value: 0.8721412042429607
key: test_fscore
value: [0.94736842 0.88135593 0.84210526 0.77777778 0.82758621 0.87719298
0.82142857 0.85714286 0.77966102 0.87719298]
mean value: 0.8488812011521107
key: train_fscore
value: [0.875 0.8678501 0.86507937 0.8685259 0.87128713 0.87351779
0.87475149 0.87401575 0.87301587 0.87644788]
mean value: 0.8719491263936097
key: test_precision
value: [0.93103448 0.83870968 0.85714286 0.84 0.8 0.86206897
0.82142857 0.85714286 0.74193548 0.86206897]
mean value: 0.8411531860797712
key: train_precision
value: [0.86821705 0.86956522 0.8685259 0.87550201 0.87649402 0.87698413
0.88353414 0.87401575 0.88 0.85984848]
mean value: 0.8732686696416017
key: test_recall
value: [0.96428571 0.92857143 0.82758621 0.72413793 0.85714286 0.89285714
0.82142857 0.85714286 0.82142857 0.89285714]
mean value: 0.858743842364532
key: train_recall
value: [0.88188976 0.86614173 0.86166008 0.86166008 0.86614173 0.87007874
0.86614173 0.87401575 0.86614173 0.89370079]
mean value: 0.8707572126606704
key: test_roc_auc
value: [0.9476601 0.87807882 0.84236453 0.79064039 0.82142857 0.875
0.82142857 0.85714286 0.76785714 0.875 ]
mean value: 0.8476600985221675
key: train_roc_auc
value: [0.87375121 0.86785347 0.86586941 0.86980642 0.87204724 0.87401575
0.87598425 0.87401575 0.87401575 0.87401575]
mean value: 0.8721374996109676
key: test_jcc
value: [0.9 0.78787879 0.72727273 0.63636364 0.70588235 0.78125
0.6969697 0.75 0.63888889 0.78125 ]
mean value: 0.7405756090314914
key: train_jcc
value: [0.77777778 0.76655052 0.76223776 0.76760563 0.77192982 0.7754386
0.77738516 0.77622378 0.77464789 0.78006873]
mean value: 0.7729865668599729
MCC on Blind test: 0.28
Accuracy on Blind test: 0.73
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00995111 0.0098927 0.00990295 0.01049066 0.01074862 0.01070476
0.01121283 0.01361084 0.01100898 0.01114106]
mean value: 0.010866451263427734
key: score_time
value: [0.0171814 0.01235557 0.0125854 0.01293373 0.01254654 0.01322293
0.01275516 0.01698422 0.01724505 0.01798248]
mean value: 0.014579248428344727
key: test_mcc
value: [0.69397486 0.40447771 0.34042547 0.54592083 0.42966892 0.57735027
0.42857143 0.50518149 0.57142857 0.67900461]
mean value: 0.5176004157137253
key: train_mcc
value: [0.68074909 0.71607321 0.66913289 0.70418327 0.70921127 0.68153033
0.71326761 0.68114987 0.67411185 0.67887215]
mean value: 0.6908281541769783
key: test_accuracy
value: [0.84210526 0.70175439 0.66666667 0.77192982 0.71428571 0.78571429
0.71428571 0.75 0.78571429 0.83928571]
mean value: 0.7571741854636591
key: train_accuracy
value: [0.84023669 0.85798817 0.83431953 0.85207101 0.85433071 0.84055118
0.85629921 0.84055118 0.83661417 0.83858268]
mean value: 0.8451544518473653
key: test_fscore
value: [0.82352941 0.67924528 0.64150943 0.78688525 0.7037037 0.76923077
0.71428571 0.73076923 0.78571429 0.83636364]
mean value: 0.7471236714714817
key: train_fscore
value: [0.83832335 0.85714286 0.83064516 0.85089463 0.85140562 0.83767535
0.85311871 0.83960396 0.83232323 0.83265306]
mean value: 0.8423785943342119
key: test_precision
value: [0.91304348 0.72 0.70833333 0.75 0.73076923 0.83333333
0.71428571 0.79166667 0.78571429 0.85185185]
mean value: 0.7798997894215285
key: train_precision
value: [0.85020243 0.864 0.84773663 0.856 0.86885246 0.85306122
0.87242798 0.84462151 0.85477178 0.86440678]
mean value: 0.857608079954709
key: test_recall
value: [0.75 0.64285714 0.5862069 0.82758621 0.67857143 0.71428571
0.71428571 0.67857143 0.78571429 0.82142857]
mean value: 0.7199507389162562
key: train_recall
value: [0.82677165 0.8503937 0.81422925 0.8458498 0.83464567 0.82283465
0.83464567 0.83464567 0.81102362 0.80314961]
mean value: 0.8278189287603872
key: test_roc_auc
value: [0.84051724 0.70073892 0.66810345 0.77093596 0.71428571 0.78571429
0.71428571 0.75 0.78571429 0.83928571]
mean value: 0.7569581280788177
key: train_roc_auc
value: [0.8402633 0.85800317 0.83427998 0.85205876 0.85433071 0.84055118
0.85629921 0.84055118 0.83661417 0.83858268]
mean value: 0.8451534343780149
key: test_jcc
value: [0.7 0.51428571 0.47222222 0.64864865 0.54285714 0.625
0.55555556 0.57575758 0.64705882 0.71875 ]
mean value: 0.6000135682856271
key: train_jcc
value: [0.72164948 0.75 0.71034483 0.74048443 0.74125874 0.72068966
0.74385965 0.72354949 0.71280277 0.71328671]
mean value: 0.7277925756249406
MCC on Blind test: 0.24
Accuracy on Blind test: 0.67
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.02651811 0.02431774 0.02231216 0.02248645 0.02474499 0.02260637
0.02188492 0.02576017 0.02266097 0.02236414]
mean value: 0.02356560230255127
key: score_time
value: [0.01297498 0.01320481 0.01189494 0.01260996 0.0117631 0.01185036
0.01195765 0.01315618 0.01193786 0.01203465]
mean value: 0.01233844757080078
key: test_mcc
value: [0.8953202 0.8953202 0.85960591 0.75462449 0.75047877 0.78772636
0.64285714 0.71611487 0.67900461 0.85933785]
mean value: 0.7840390407141125
key: train_mcc
value: [0.77909184 0.79489255 0.78304441 0.80278863 0.80714291 0.79139378
0.80709287 0.79926835 0.81501748 0.79149195]
mean value: 0.7971224773953642
key: test_accuracy
value: [0.94736842 0.94736842 0.92982456 0.87719298 0.875 0.89285714
0.82142857 0.85714286 0.83928571 0.92857143]
mean value: 0.8916040100250626
key: train_accuracy
value: [0.88954635 0.8974359 0.89151874 0.90138067 0.90354331 0.89566929
0.90354331 0.8996063 0.90748031 0.89566929]
mean value: 0.8985393467828355
key: test_fscore
value: [0.94736842 0.94736842 0.93103448 0.88135593 0.87719298 0.88888889
0.82142857 0.85185185 0.84210526 0.92592593]
mean value: 0.8914520740776547
key: train_fscore
value: [0.88976378 0.89803922 0.89151874 0.9015748 0.90410959 0.8962818
0.90373281 0.90019569 0.90802348 0.89668616]
mean value: 0.8989926072825011
key: test_precision
value: [0.93103448 0.93103448 0.93103448 0.86666667 0.86206897 0.92307692
0.82142857 0.88461538 0.82758621 0.96153846]
mean value: 0.8940084628015662
key: train_precision
value: [0.88976378 0.89453125 0.88976378 0.89803922 0.89883268 0.89105058
0.90196078 0.89494163 0.90272374 0.88803089]
mean value: 0.8949638335218302
key: test_recall
value: [0.96428571 0.96428571 0.93103448 0.89655172 0.89285714 0.85714286
0.82142857 0.82142857 0.85714286 0.89285714]
mean value: 0.8899014778325123
key: train_recall
value: [0.88976378 0.9015748 0.89328063 0.90513834 0.90944882 0.9015748
0.90551181 0.90551181 0.91338583 0.90551181]
mean value: 0.9030702436898945
key: test_roc_auc
value: [0.9476601 0.9476601 0.92980296 0.87684729 0.875 0.89285714
0.82142857 0.85714286 0.83928571 0.92857143]
mean value: 0.8916256157635468
key: train_roc_auc
value: [0.88954592 0.89742772 0.89152221 0.90138807 0.90354331 0.89566929
0.90354331 0.8996063 0.90748031 0.89566929]
mean value: 0.8985395723755875
key: test_jcc
value: [0.9 0.9 0.87096774 0.78787879 0.78125 0.8
0.6969697 0.74193548 0.72727273 0.86206897]
mean value: 0.8068343403444905
key: train_jcc
value: [0.80141844 0.81494662 0.80427046 0.82078853 0.825 0.81205674
0.82437276 0.81850534 0.83154122 0.81272085]
mean value: 0.8165620954250901
MCC on Blind test: 0.23
Accuracy on Blind test: 0.71
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.59886169 1.92170119 2.23752618 2.16796374 2.11638856 2.47973561
2.14198256 3.86806488 3.29336166 2.04994941]
mean value: 2.3875535488128663
key: score_time
value: [0.01230478 0.01873446 0.01443458 0.02749872 0.01352763 0.01245713
0.0131278 0.01426792 0.01233697 0.01281023]
mean value: 0.015150022506713868
key: test_mcc
value: [0.82512315 0.72064772 0.82490815 0.7257422 0.67900461 0.85933785
0.75434227 0.78571429 0.68250015 0.78772636]
mean value: 0.7645046746713943
key: train_mcc
value: [0.98425172 0.99214142 0.99211042 0.99606299 0.96853396 0.99215674
0.98819663 0.99607071 0.98425197 0.98819663]
mean value: 0.9881973215872323
key: test_accuracy
value: [0.9122807 0.85964912 0.9122807 0.85964912 0.83928571 0.92857143
0.875 0.89285714 0.83928571 0.89285714]
mean value: 0.881171679197995
key: train_accuracy
value: [0.99211045 0.99605523 0.99605523 0.99802761 0.98425197 0.99606299
0.99409449 0.9980315 0.99212598 0.99409449]
mean value: 0.9940909938032894
key: test_fscore
value: [0.9122807 0.85185185 0.91525424 0.87096774 0.84210526 0.92592593
0.88135593 0.89285714 0.84745763 0.88888889]
mean value: 0.8828945312981744
key: train_fscore
value: [0.99209486 0.99604743 0.99604743 0.99802761 0.98418972 0.99604743
0.99408284 0.99803536 0.99212598 0.99410609]
mean value: 0.994080476920228
key: test_precision
value: [0.89655172 0.88461538 0.9 0.81818182 0.82758621 0.96153846
0.83870968 0.89285714 0.80645161 0.92307692]
mean value: 0.8749568951626794
key: train_precision
value: [0.99603175 1. 0.99604743 0.99606299 0.98809524 1.
0.99604743 0.99607843 0.99212598 0.99215686]
mean value: 0.9952646116282663
key: test_recall
value: [0.92857143 0.82142857 0.93103448 0.93103448 0.85714286 0.89285714
0.92857143 0.89285714 0.89285714 0.85714286]
mean value: 0.8933497536945813
key: train_recall
value: [0.98818898 0.99212598 0.99604743 1. 0.98031496 0.99212598
0.99212598 1. 0.99212598 0.99606299]
mean value: 0.9929118296971772
key: test_roc_auc
value: [0.91256158 0.85899015 0.91194581 0.85837438 0.83928571 0.92857143
0.875 0.89285714 0.83928571 0.89285714]
mean value: 0.8809729064039409
key: train_roc_auc
value: [0.9921182 0.99606299 0.99605521 0.9980315 0.98425197 0.99606299
0.99409449 0.9980315 0.99212598 0.99409449]
mean value: 0.9940929320593819
key: test_jcc
value: [0.83870968 0.74193548 0.84375 0.77142857 0.72727273 0.86206897
0.78787879 0.80645161 0.73529412 0.8 ]
mean value: 0.7914789943937935
key: train_jcc
value: [0.98431373 0.99212598 0.99212598 0.99606299 0.9688716 0.99212598
0.98823529 0.99607843 0.984375 0.98828125]
mean value: 0.9882596241193021
MCC on Blind test: 0.25
Accuracy on Blind test: 0.67
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.03135872 0.02370524 0.02240443 0.02254367 0.02118802 0.02509332
0.02626276 0.02403593 0.02359962 0.02351856]
mean value: 0.024371027946472168
key: score_time
value: [0.01250744 0.01007581 0.00941324 0.00972056 0.00988364 0.00994086
0.00985217 0.00987315 0.0091753 0.00908613]
mean value: 0.009952831268310546
key: test_mcc
value: [0.96547546 0.82512315 0.86189955 0.92980296 0.85714286 0.92857143
0.82195294 0.85933785 0.93094934 0.85933785]
mean value: 0.883959337340627
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98245614 0.9122807 0.92982456 0.96491228 0.92857143 0.96428571
0.91071429 0.92857143 0.96428571 0.92857143]
mean value: 0.9414473684210526
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98181818 0.9122807 0.92857143 0.96551724 0.92857143 0.96428571
0.90909091 0.93103448 0.96296296 0.93103448]
mean value: 0.9415167533951563
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.89655172 0.96296296 0.96551724 0.92857143 0.96428571
0.92592593 0.9 1. 0.9 ]
mean value: 0.9443814997263273
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96428571 0.92857143 0.89655172 0.96551724 0.92857143 0.96428571
0.89285714 0.96428571 0.92857143 0.96428571]
mean value: 0.9397783251231527
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98214286 0.91256158 0.93041872 0.96490148 0.92857143 0.96428571
0.91071429 0.92857143 0.96428571 0.92857143]
mean value: 0.9415024630541873
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96428571 0.83870968 0.86666667 0.93333333 0.86666667 0.93103448
0.83333333 0.87096774 0.92857143 0.87096774]
mean value: 0.8904536786906087
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.13
Accuracy on Blind test: 0.47
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.14206529 0.13289452 0.14049268 0.12361646 0.12473321 0.12784219
0.12505698 0.12953353 0.14142609 0.14261723]
mean value: 0.13302781581878662
key: score_time
value: [0.01912665 0.01963639 0.01777506 0.01807332 0.01823997 0.01803923
0.01796126 0.01868534 0.02039599 0.01962233]
mean value: 0.018755555152893066
key: test_mcc
value: [0.92980296 0.64901478 0.82490815 0.75462449 0.82195294 0.89342711
0.85933785 0.82195294 0.78571429 0.82195294]
mean value: 0.8162688454469251
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96491228 0.8245614 0.9122807 0.87719298 0.91071429 0.94642857
0.92857143 0.91071429 0.89285714 0.91071429]
mean value: 0.9078947368421053
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96428571 0.82142857 0.91525424 0.88135593 0.9122807 0.94736842
0.93103448 0.9122807 0.89285714 0.90909091]
mean value: 0.9087236814473888
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96428571 0.82142857 0.9 0.86666667 0.89655172 0.93103448
0.9 0.89655172 0.89285714 0.92592593]
mean value: 0.8995301952198504
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96428571 0.82142857 0.93103448 0.89655172 0.92857143 0.96428571
0.96428571 0.92857143 0.89285714 0.89285714]
mean value: 0.9184729064039409
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96490148 0.82450739 0.91194581 0.87684729 0.91071429 0.94642857
0.92857143 0.91071429 0.89285714 0.91071429]
mean value: 0.907820197044335
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.93103448 0.6969697 0.84375 0.78787879 0.83870968 0.9
0.87096774 0.83870968 0.80645161 0.83333333]
mean value: 0.8347805010617858
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.31
Accuracy on Blind test: 0.72
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01085973 0.01103044 0.01346946 0.01193619 0.01305199 0.01162624
0.01204753 0.01230168 0.01173186 0.01169991]
mean value: 0.011975502967834473
key: score_time
value: [0.01012802 0.00980639 0.01152587 0.00962973 0.00939035 0.00966001
0.00961351 0.01386833 0.0105443 0.00907779]
mean value: 0.010324430465698243
key: test_mcc
value: [0.57881773 0.54759338 0.75462449 0.50927421 0.4645821 0.71611487
0.50128041 0.57142857 0.53605627 0.39310793]
mean value: 0.5572879969415938
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.78947368 0.77192982 0.87719298 0.75438596 0.73214286 0.85714286
0.75 0.78571429 0.76785714 0.69642857]
mean value: 0.7782268170426065
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.78571429 0.77966102 0.88135593 0.76666667 0.73684211 0.86206897
0.75862069 0.78571429 0.76363636 0.69090909]
mean value: 0.7811189402228806
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.78571429 0.74193548 0.86666667 0.74193548 0.72413793 0.83333333
0.73333333 0.78571429 0.77777778 0.7037037 ]
mean value: 0.7694252285019805
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.78571429 0.82142857 0.89655172 0.79310345 0.75 0.89285714
0.78571429 0.78571429 0.75 0.67857143]
mean value: 0.7939655172413793
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.78940887 0.77278325 0.87684729 0.75369458 0.73214286 0.85714286
0.75 0.78571429 0.76785714 0.69642857]
mean value: 0.7782019704433497
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.64705882 0.63888889 0.78787879 0.62162162 0.58333333 0.75757576
0.61111111 0.64705882 0.61764706 0.52777778]
mean value: 0.6439951984069632
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.2
Accuracy on Blind test: 0.68
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.97127247 1.87376833 1.90832615 1.96241379 1.86430383 1.85158372
1.85320497 1.85398388 1.91005635 1.90927792]
mean value: 1.8958191394805908
key: score_time
value: [0.0927887 0.09511352 0.10197377 0.101542 0.09229207 0.09235573
0.09270048 0.09604359 0.09203935 0.09217215]
mean value: 0.0949021339416504
key: test_mcc
value: [0.96547546 0.8953202 0.8953202 0.82512315 0.85714286 1.
0.96490128 0.89342711 0.93094934 0.85933785]
mean value: 0.9086997439278497
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98245614 0.94736842 0.94736842 0.9122807 0.92857143 1.
0.98214286 0.94642857 0.96428571 0.92857143]
mean value: 0.9539473684210527
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98181818 0.94736842 0.94736842 0.9122807 0.92857143 1.
0.98245614 0.94736842 0.96296296 0.93103448]
mean value: 0.9541229161374352
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.93103448 0.96428571 0.92857143 0.92857143 1.
0.96551724 0.93103448 1. 0.9 ]
mean value: 0.9549014778325123
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96428571 0.96428571 0.93103448 0.89655172 0.92857143 1.
1. 0.96428571 0.92857143 0.96428571]
mean value: 0.9541871921182266
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98214286 0.9476601 0.9476601 0.91256158 0.92857143 1.
0.98214286 0.94642857 0.96428571 0.92857143]
mean value: 0.9540024630541872
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96428571 0.9 0.9 0.83870968 0.86666667 1.
0.96551724 0.9 0.92857143 0.87096774]
mean value: 0.9134718470257959
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.26
Accuracy on Blind test: 0.6
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...05', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.93197846 0.98866272 1.00972748 0.9597013 0.97821736 0.96020889
0.97179317 1.03200746 0.92861557 1.0140574 ]
mean value: 0.9774969816207886
key: score_time
value: [0.16036201 0.25168538 0.22374582 0.18213248 0.22768426 0.23600006
0.21665478 0.25347424 0.24868369 0.23307705]
mean value: 0.2233499765396118
key: test_mcc
value: [0.96547546 0.86189955 0.8953202 0.82512315 0.85714286 0.93094934
0.92857143 0.85714286 0.93094934 0.89342711]
mean value: 0.8946001279474248
key: train_mcc
value: [0.94477296 0.96055211 0.94872473 0.96055211 0.9606597 0.95675965
0.94882625 0.95670033 0.94882625 0.95675965]
mean value: 0.9543133754242162
key: test_accuracy
value: [0.98245614 0.92982456 0.94736842 0.9122807 0.92857143 0.96428571
0.96428571 0.92857143 0.96428571 0.94642857]
mean value: 0.9468358395989975
key: train_accuracy
value: [0.97238659 0.98027613 0.97435897 0.98027613 0.98031496 0.97834646
0.97440945 0.97834646 0.97440945 0.97834646]
mean value: 0.9771471058721211
key: test_fscore
value: [0.98181818 0.93103448 0.94736842 0.9122807 0.92857143 0.96551724
0.96428571 0.92857143 0.96296296 0.94736842]
mean value: 0.9469778984207297
key: train_fscore
value: [0.97244094 0.98031496 0.97425743 0.98023715 0.98039216 0.97847358
0.97445972 0.97830375 0.97445972 0.97847358]
mean value: 0.9771813002130227
key: test_precision
value: [1. 0.9 0.96428571 0.92857143 0.92857143 0.93333333
0.96428571 0.92857143 1. 0.93103448]
mean value: 0.9478653530377669
key: train_precision
value: [0.97244094 0.98031496 0.97619048 0.98023715 0.9765625 0.97276265
0.97254902 0.98023715 0.97254902 0.97276265]
mean value: 0.9756606521047162
key: test_recall
value: [0.96428571 0.96428571 0.93103448 0.89655172 0.92857143 1.
0.96428571 0.92857143 0.92857143 0.96428571]
mean value: 0.9470443349753694
key: train_recall
value: [0.97244094 0.98031496 0.97233202 0.98023715 0.98425197 0.98425197
0.97637795 0.97637795 0.97637795 0.98425197]
mean value: 0.9787214839251813
key: test_roc_auc
value: [0.98214286 0.93041872 0.9476601 0.91256158 0.92857143 0.96428571
0.96428571 0.92857143 0.96428571 0.94642857]
mean value: 0.9469211822660099
key: train_roc_auc
value: [0.97238648 0.98027606 0.97435498 0.98027606 0.98031496 0.97834646
0.97440945 0.97834646 0.97440945 0.97834646]
mean value: 0.9771466807755751
key: test_jcc
value: [0.96428571 0.87096774 0.9 0.83870968 0.86666667 0.93333333
0.93103448 0.86666667 0.92857143 0.9 ]
mean value: 0.9000235711637269
key: train_jcc
value: [0.94636015 0.96138996 0.94980695 0.96124031 0.96153846 0.95785441
0.95019157 0.95752896 0.95019157 0.95785441]
mean value: 0.9553956747621544
MCC on Blind test: 0.25
Accuracy on Blind test: 0.59
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.0273037 0.01028109 0.01023984 0.01031065 0.01022434 0.01032615
0.01025367 0.01148105 0.01173735 0.01176667]
mean value: 0.012392449378967284
key: score_time
value: [0.00985813 0.00880551 0.00889325 0.00906706 0.00880718 0.00885367
0.009377 0.00965214 0.0097661 0.00972247]
mean value: 0.009280252456665038
key: test_mcc
value: [0.8953202 0.7589669 0.68472906 0.58562417 0.64450339 0.75047877
0.64285714 0.71428571 0.53881591 0.75047877]
mean value: 0.6966060026275183
key: train_mcc
value: [0.74761876 0.73570695 0.73177298 0.73972796 0.74414639 0.74805469
0.75211424 0.7480315 0.74812427 0.7486119 ]
mean value: 0.7443909629951391
key: test_accuracy
value: [0.94736842 0.87719298 0.84210526 0.78947368 0.82142857 0.875
0.82142857 0.85714286 0.76785714 0.875 ]
mean value: 0.8473997493734335
key: train_accuracy
value: [0.87376726 0.8678501 0.86587771 0.86982249 0.87204724 0.87401575
0.87598425 0.87401575 0.87401575 0.87401575]
mean value: 0.8721412042429607
key: test_fscore
value: [0.94736842 0.88135593 0.84210526 0.77777778 0.82758621 0.87719298
0.82142857 0.85714286 0.77966102 0.87719298]
mean value: 0.8488812011521107
key: train_fscore
value: [0.875 0.8678501 0.86507937 0.8685259 0.87128713 0.87351779
0.87475149 0.87401575 0.87301587 0.87644788]
mean value: 0.8719491263936097
key: test_precision
value: [0.93103448 0.83870968 0.85714286 0.84 0.8 0.86206897
0.82142857 0.85714286 0.74193548 0.86206897]
mean value: 0.8411531860797712
key: train_precision
value: [0.86821705 0.86956522 0.8685259 0.87550201 0.87649402 0.87698413
0.88353414 0.87401575 0.88 0.85984848]
mean value: 0.8732686696416017
key: test_recall
value: [0.96428571 0.92857143 0.82758621 0.72413793 0.85714286 0.89285714
0.82142857 0.85714286 0.82142857 0.89285714]
mean value: 0.858743842364532
key: train_recall
value: [0.88188976 0.86614173 0.86166008 0.86166008 0.86614173 0.87007874
0.86614173 0.87401575 0.86614173 0.89370079]
mean value: 0.8707572126606704
key: test_roc_auc
value: [0.9476601 0.87807882 0.84236453 0.79064039 0.82142857 0.875
0.82142857 0.85714286 0.76785714 0.875 ]
mean value: 0.8476600985221675
key: train_roc_auc
value: [0.87375121 0.86785347 0.86586941 0.86980642 0.87204724 0.87401575
0.87598425 0.87401575 0.87401575 0.87401575]
mean value: 0.8721374996109676
key: test_jcc
value: [0.9 0.78787879 0.72727273 0.63636364 0.70588235 0.78125
0.6969697 0.75 0.63888889 0.78125 ]
mean value: 0.7405756090314914
key: train_jcc
value: [0.77777778 0.76655052 0.76223776 0.76760563 0.77192982 0.7754386
0.77738516 0.77622378 0.77464789 0.78006873]
mean value: 0.7729865668599729
MCC on Blind test: 0.28
Accuracy on Blind test: 0.73
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.08733106 0.0808742 0.07191372 0.07638454 0.07736945 0.07642531
0.08206582 0.07636857 0.07126498 0.07357764]
mean value: 0.07735753059387207
key: score_time
value: [0.01076412 0.01088524 0.01092649 0.01091504 0.01091599 0.01097631
0.01111364 0.0110476 0.01095295 0.01090598]
mean value: 0.010940337181091308
key: test_mcc
value: [0.96547546 0.8953202 0.92980296 0.93202124 0.85714286 1.
0.96490128 0.89342711 0.93094934 0.92857143]
mean value: 0.9297611866340357
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98245614 0.94736842 0.96491228 0.96491228 0.92857143 1.
0.98214286 0.94642857 0.96428571 0.96428571]
mean value: 0.9645363408521304
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98181818 0.94736842 0.96551724 0.96666667 0.92857143 1.
0.98245614 0.94736842 0.96296296 0.96428571]
mean value: 0.9647015178140406
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.93103448 0.96551724 0.93548387 0.92857143 1.
0.96551724 0.93103448 1. 0.96428571]
mean value: 0.9621444462100747
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96428571 0.96428571 0.96551724 1. 0.92857143 1.
1. 0.96428571 0.92857143 0.96428571]
mean value: 0.9679802955665024
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98214286 0.9476601 0.96490148 0.96428571 0.92857143 1.
0.98214286 0.94642857 0.96428571 0.96428571]
mean value: 0.9644704433497537
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96428571 0.9 0.93333333 0.93548387 0.86666667 1.
0.96551724 0.9 0.92857143 0.93103448]
mean value: 0.9324892737962815
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.12
Accuracy on Blind test: 0.34
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.044734 0.06494427 0.04393315 0.06427455 0.0506022 0.07494164
0.09088469 0.05393553 0.10816693 0.06829667]
mean value: 0.06647136211395263
key: score_time
value: [0.02533603 0.01248574 0.01268339 0.01244736 0.0195601 0.02371478
0.01961422 0.01231694 0.04197431 0.01899767]
mean value: 0.01991305351257324
key: test_mcc
value: [0.82942474 0.86189955 0.75492611 0.79110556 0.75047877 0.85933785
0.67900461 0.78571429 0.75047877 0.85933785]
mean value: 0.7921708093166815
key: train_mcc
value: [0.89366043 0.88593277 0.88566582 0.90532508 0.91750062 0.89774912
0.89376313 0.90553988 0.90962508 0.89001213]
mean value: 0.8984774052532084
key: test_accuracy
value: [0.9122807 0.92982456 0.87719298 0.89473684 0.875 0.92857143
0.83928571 0.89285714 0.875 0.92857143]
mean value: 0.8953320802005013
key: train_accuracy
value: [0.94674556 0.94280079 0.94280079 0.95266272 0.95866142 0.9488189
0.94685039 0.95275591 0.95472441 0.94488189]
mean value: 0.9491702775318765
key: test_fscore
value: [0.91525424 0.93103448 0.87719298 0.9 0.87719298 0.92592593
0.84210526 0.89285714 0.87719298 0.92592593]
mean value: 0.8964681925282066
key: train_fscore
value: [0.94736842 0.94368932 0.94302554 0.95256917 0.95906433 0.94921875
0.94716243 0.95294118 0.95516569 0.94552529]
mean value: 0.9495730116083545
key: test_precision
value: [0.87096774 0.9 0.89285714 0.87096774 0.86206897 0.96153846
0.82758621 0.89285714 0.86206897 0.96153846]
mean value: 0.8902450830593212
key: train_precision
value: [0.93822394 0.93103448 0.9375 0.95256917 0.94980695 0.94186047
0.94163424 0.94921875 0.94594595 0.93461538]
mean value: 0.9422409327672729
key: test_recall
value: [0.96428571 0.96428571 0.86206897 0.93103448 0.89285714 0.89285714
0.85714286 0.89285714 0.89285714 0.89285714]
mean value: 0.9043103448275862
key: train_recall
value: [0.95669291 0.95669291 0.9486166 0.95256917 0.96850394 0.95669291
0.95275591 0.95669291 0.96456693 0.95669291]
mean value: 0.9570477109333665
key: test_roc_auc
value: [0.91317734 0.93041872 0.87746305 0.89408867 0.875 0.92857143
0.83928571 0.89285714 0.875 0.92857143]
mean value: 0.8954433497536947
key: train_roc_auc
value: [0.9467259 0.94277333 0.94281224 0.95266254 0.95866142 0.9488189
0.94685039 0.95275591 0.95472441 0.94488189]
mean value: 0.9491666926021599
key: test_jcc
value: [0.84375 0.87096774 0.78125 0.81818182 0.78125 0.86206897
0.72727273 0.80645161 0.78125 0.86206897]
mean value: 0.8134511831327738
key: train_jcc
value: [0.9 0.89338235 0.89219331 0.90943396 0.92134831 0.90334572
0.89962825 0.91011236 0.9141791 0.89667897]
mean value: 0.9040302346875264
MCC on Blind test: 0.21
Accuracy on Blind test: 0.67
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01414633 0.01014614 0.00987864 0.00988293 0.00992298 0.00999713
0.00989652 0.01112843 0.01074934 0.01004362]
mean value: 0.010579204559326172
key: score_time
value: [0.00973105 0.00881386 0.00866055 0.00865984 0.00862098 0.0086832
0.0088861 0.00939202 0.0096097 0.00943041]
mean value: 0.009048771858215333
key: test_mcc
value: [0.8953202 0.79161589 0.82512315 0.72133224 0.5728919 0.71428571
0.71611487 0.60753044 0.60753044 0.82195294]
mean value: 0.7273697782685168
key: train_mcc
value: [0.75148224 0.73599419 0.72785421 0.71616261 0.74805469 0.77167747
0.76772249 0.76772249 0.73248786 0.75592895]
mean value: 0.7475087174015772
key: test_accuracy
value: [0.94736842 0.89473684 0.9122807 0.85964912 0.78571429 0.85714286
0.85714286 0.80357143 0.80357143 0.91071429]
mean value: 0.8631892230576441
key: train_accuracy
value: [0.87573964 0.8678501 0.86390533 0.85798817 0.87401575 0.88582677
0.88385827 0.88385827 0.86614173 0.87795276]
mean value: 0.8737136778021091
key: test_fscore
value: [0.94736842 0.89655172 0.9122807 0.85714286 0.77777778 0.85714286
0.85185185 0.8 0.80701754 0.90909091]
mean value: 0.8616224643810851
key: train_fscore
value: [0.8762279 0.86626747 0.86282306 0.856 0.87351779 0.88537549
0.88408644 0.88408644 0.86454183 0.87843137]
mean value: 0.8731357798405449
key: test_precision
value: [0.93103448 0.86666667 0.92857143 0.88888889 0.80769231 0.85714286
0.88461538 0.81481481 0.79310345 0.92592593]
mean value: 0.8698456205352757
key: train_precision
value: [0.8745098 0.87854251 0.868 0.86639676 0.87698413 0.88888889
0.88235294 0.88235294 0.875 0.875 ]
mean value: 0.8768027973402586
key: test_recall
value: [0.96428571 0.92857143 0.89655172 0.82758621 0.75 0.85714286
0.82142857 0.78571429 0.82142857 0.89285714]
mean value: 0.8545566502463054
key: train_recall
value: [0.87795276 0.85433071 0.85770751 0.8458498 0.87007874 0.88188976
0.88582677 0.88582677 0.85433071 0.88188976]
mean value: 0.8695683296504932
key: test_roc_auc
value: [0.9476601 0.8953202 0.91256158 0.86022167 0.78571429 0.85714286
0.85714286 0.80357143 0.80357143 0.91071429]
mean value: 0.8633620689655173
key: train_roc_auc
value: [0.87573527 0.86787682 0.86389313 0.85796427 0.87401575 0.88582677
0.88385827 0.88385827 0.86614173 0.87795276]
mean value: 0.8737123027605739
key: test_jcc
value: [0.9 0.8125 0.83870968 0.75 0.63636364 0.75
0.74193548 0.66666667 0.67647059 0.83333333]
mean value: 0.7605979385889253
key: train_jcc
value: [0.77972028 0.76408451 0.75874126 0.74825175 0.7754386 0.79432624
0.79225352 0.79225352 0.76140351 0.78321678]
mean value: 0.7749689965623754
MCC on Blind test: 0.3
Accuracy on Blind test: 0.73
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01739788 0.0220623 0.01760864 0.02229691 0.01804233 0.0232923
0.02561951 0.01866484 0.0224731 0.01789403]
mean value: 0.02053518295288086
key: score_time
value: [0.01050401 0.01128793 0.01169872 0.01175117 0.01168776 0.0117321
0.0117178 0.01158166 0.01171303 0.01155519]
mean value: 0.011522936820983886
key: test_mcc
value: [0.82880708 0.70453109 0.89952865 0.7257422 0.72168784 0.89342711
0.72168784 0.78571429 0.64951905 0.89342711]
mean value: 0.7824072264760507
key: train_mcc
value: [0.80155032 0.8649269 0.77915876 0.88999604 0.84134934 0.86746041
0.90163769 0.86616858 0.88257403 0.85049917]
mean value: 0.8545321231378529
key: test_accuracy
value: [0.9122807 0.84210526 0.94736842 0.85964912 0.85714286 0.94642857
0.85714286 0.89285714 0.82142857 0.94642857]
mean value: 0.8882832080200501
key: train_accuracy
value: [0.89349112 0.93096647 0.88560158 0.94477318 0.91732283 0.93307087
0.9507874 0.93307087 0.94094488 0.92519685]
mean value: 0.9255226047927441
key: test_fscore
value: [0.90566038 0.81632653 0.95081967 0.87096774 0.84615385 0.94736842
0.86666667 0.89285714 0.83333333 0.94545455]
mean value: 0.8875608277555532
key: train_fscore
value: [0.8826087 0.92813142 0.89298893 0.94552529 0.91176471 0.9348659
0.95107632 0.93280632 0.94208494 0.92578125]
mean value: 0.9247633777608493
key: test_precision
value: [0.96 0.95238095 0.90625 0.81818182 0.91666667 0.93103448
0.8125 0.89285714 0.78125 0.96296296]
mean value: 0.8934084025808163
key: train_precision
value: [0.98543689 0.96995708 0.83737024 0.93103448 0.97747748 0.91044776
0.94552529 0.93650794 0.92424242 0.91860465]
mean value: 0.9336604242135553
key: test_recall
value: [0.85714286 0.71428571 1. 0.93103448 0.78571429 0.96428571
0.92857143 0.89285714 0.89285714 0.92857143]
mean value: 0.8895320197044335
key: train_recall
value: [0.7992126 0.88976378 0.95652174 0.96047431 0.85433071 0.96062992
0.95669291 0.92913386 0.96062992 0.93307087]
mean value: 0.9200460614359964
key: test_roc_auc
value: [0.91133005 0.83990148 0.94642857 0.85837438 0.85714286 0.94642857
0.85714286 0.89285714 0.82142857 0.94642857]
mean value: 0.8877463054187192
key: train_roc_auc
value: [0.89367745 0.9310479 0.88574118 0.94480408 0.91732283 0.93307087
0.9507874 0.93307087 0.94094488 0.92519685]
mean value: 0.925566431172388
key: test_jcc
value: [0.82758621 0.68965517 0.90625 0.77142857 0.73333333 0.9
0.76470588 0.80645161 0.71428571 0.89655172]
mean value: 0.8010248217752062
key: train_jcc
value: [0.78988327 0.86590038 0.80666667 0.89667897 0.83783784 0.87769784
0.90671642 0.87407407 0.89051095 0.86181818]
mean value: 0.8607784587352857
MCC on Blind test: 0.26
Accuracy on Blind test: 0.63
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.02330208 0.02178788 0.02393198 0.02130818 0.02126288 0.02093101
0.02174282 0.02573943 0.02018857 0.0217464 ]
mean value: 0.02219412326812744
key: score_time
value: [0.01169729 0.01158571 0.01161098 0.02670074 0.01293182 0.0131402
0.01216984 0.01427531 0.01238012 0.01276994]
mean value: 0.013926196098327636
key: test_mcc
value: [0.86189955 0.72064772 0.79161589 0.7257422 0.5242106 0.89802651
0.62705445 0.78571429 0.71428571 0.85714286]
mean value: 0.7506339773602754
key: train_mcc
value: [0.85755494 0.88979006 0.88714258 0.88343206 0.68829839 0.82903225
0.88411257 0.92959505 0.89020543 0.87910492]
mean value: 0.8618268250222627
key: test_accuracy
value: [0.92982456 0.85964912 0.89473684 0.85964912 0.73214286 0.94642857
0.80357143 0.89285714 0.85714286 0.92857143]
mean value: 0.8704573934837093
key: train_accuracy
value: [0.9270217 0.94477318 0.94280079 0.9408284 0.82283465 0.91141732
0.94094488 0.96456693 0.94488189 0.93897638]
mean value: 0.9279046110360465
key: test_fscore
value: [0.93103448 0.85185185 0.89285714 0.87096774 0.65116279 0.94915254
0.82539683 0.89285714 0.85714286 0.92857143]
mean value: 0.865099480644191
key: train_fscore
value: [0.93032015 0.94552529 0.94093686 0.94252874 0.78571429 0.91651206
0.94296578 0.964 0.944 0.94049904]
mean value: 0.9253002206522171
key: test_precision
value: [0.9 0.88461538 0.92592593 0.81818182 0.93333333 0.90322581
0.74285714 0.89285714 0.85714286 0.92857143]
mean value: 0.8786710839936647
key: train_precision
value: [0.89169675 0.93461538 0.97058824 0.91449814 0.9939759 0.86666667
0.91176471 0.9796748 0.95934959 0.917603 ]
mean value: 0.9340433174738031
key: test_recall
value: [0.96428571 0.82142857 0.86206897 0.93103448 0.5 1.
0.92857143 0.89285714 0.85714286 0.92857143]
mean value: 0.8685960591133005
key: train_recall
value: [0.97244094 0.95669291 0.91304348 0.97233202 0.6496063 0.97244094
0.97637795 0.9488189 0.92913386 0.96456693]
mean value: 0.9255454234228626
key: test_roc_auc
value: [0.93041872 0.85899015 0.8953202 0.85837438 0.73214286 0.94642857
0.80357143 0.89285714 0.85714286 0.92857143]
mean value: 0.8703817733990148
key: train_roc_auc
value: [0.92693193 0.94474962 0.94274221 0.94089042 0.82283465 0.91141732
0.94094488 0.96456693 0.94488189 0.93897638]
mean value: 0.9278936229809218
key: test_jcc
value: [0.87096774 0.74193548 0.80645161 0.77142857 0.48275862 0.90322581
0.7027027 0.80645161 0.75 0.86666667]
mean value: 0.7702588819552112
key: train_jcc
value: [0.86971831 0.89667897 0.88846154 0.89130435 0.64705882 0.84589041
0.89208633 0.93050193 0.89393939 0.88768116]
mean value: 0.864332121222163
MCC on Blind test: 0.1
Accuracy on Blind test: 0.21
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.20345497 0.19475746 0.18883538 0.18976021 0.18943191 0.18772388
0.19340324 0.19115758 0.19200993 0.20393801]
mean value: 0.19344725608825683
key: score_time
value: [0.01576757 0.01642036 0.01555204 0.01572323 0.01514602 0.01556444
0.01558781 0.01524043 0.01602459 0.01608181]
mean value: 0.015710830688476562
key: test_mcc
value: [0.96547546 0.8951918 0.96547546 0.93202124 0.82195294 0.96490128
0.96490128 0.89342711 1. 0.92857143]
mean value: 0.9331918004083978
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98245614 0.94736842 0.98245614 0.96491228 0.91071429 0.98214286
0.98214286 0.94642857 1. 0.96428571]
mean value: 0.9662907268170425
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98181818 0.94545455 0.98305085 0.96666667 0.9122807 0.98181818
0.98245614 0.94736842 1. 0.96428571]
mean value: 0.9665199400658812
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.96296296 0.96666667 0.93548387 0.89655172 1.
0.96551724 0.93103448 1. 0.96428571]
mean value: 0.9622502663158948
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96428571 0.92857143 1. 1. 0.92857143 0.96428571
1. 0.96428571 1. 0.96428571]
mean value: 0.9714285714285714
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98214286 0.94704433 0.98214286 0.96428571 0.91071429 0.98214286
0.98214286 0.94642857 1. 0.96428571]
mean value: 0.9661330049261084
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96428571 0.89655172 0.96666667 0.93548387 0.83870968 0.96428571
0.96551724 0.9 1. 0.93103448]
mean value: 0.9362535091901054
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.15
Accuracy on Blind test: 0.39
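Note: the keys reported above (fit_time, score_time, test_*, train_*) match the dictionary layout that scikit-learn's cross_validate returns when several scorers are passed with return_train_score=True. The block below is a minimal sketch of how output in this shape could be produced. The toy data frame, its column names (num_a, num_b, cat_a), the two scorers and the fold settings are assumptions for illustration, not values taken from the original script.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy data: two numerical columns and one categorical column.
rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame({
    "num_a": rng.normal(size=n),
    "num_b": rng.normal(size=n),
    "cat_a": rng.choice(["x", "y", "z"], size=n),
})
y = (X["num_a"] + 0.5 * X["num_b"] > 0).astype(int)

# Same preprocessing layout as the logged pipeline: scale numerical columns,
# one-hot encode categorical ones. handle_unknown="ignore" is an assumption
# added here so that a category missing from a training fold cannot raise.
prep = ColumnTransformer(
    remainder="passthrough",
    transformers=[
        ("num", MinMaxScaler(), ["num_a", "num_b"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["cat_a"]),
    ],
)
pipe = Pipeline(steps=[("prep", prep),
                       ("model", AdaBoostClassifier(random_state=42))])

# Scorer names become the suffixes of the test_* / train_* keys.
scoring = {"mcc": make_scorer(matthews_corrcoef), "accuracy": "accuracy"}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

# Print in the same key / value / mean value layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```

A train score of 1.0 in every fold next to a much lower blind-test score, as logged above, is the pattern such a comparison is meant to expose.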
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.06323409 0.08535194 0.07566977 0.07401085 0.06657791 0.08859229
0.07154846 0.0926826 0.07722116 0.06817889]
mean value: 0.07630679607391358
key: score_time
value: [0.01864672 0.04004669 0.0241456 0.03599811 0.02662635 0.0338707
0.04093671 0.02519202 0.03072882 0.02305079]
mean value: 0.02992424964904785
key: test_mcc
value: [0.96547546 0.8953202 0.8953202 0.93202124 0.82618439 1.
0.96490128 0.89342711 0.93094934 0.92857143]
mean value: 0.9232170639899975
key: train_mcc
value: [0.99214142 0.99211042 0.98823457 1. 0.98819663 0.98819663
0.98428248 1. 0.99212598 0.98819663]
mean value: 0.9913484783283143
key: test_accuracy
value: [0.98245614 0.94736842 0.94736842 0.96491228 0.91071429 1.
0.98214286 0.94642857 0.96428571 0.96428571]
mean value: 0.9609962406015038
key: train_accuracy
value: [0.99605523 0.99605523 0.99408284 1. 0.99409449 0.99409449
0.99212598 1. 0.99606299 0.99409449]
mean value: 0.9956665734830483
key: test_fscore
value: [0.98181818 0.94736842 0.94736842 0.96666667 0.91525424 1.
0.98245614 0.94736842 0.96296296 0.96428571]
mean value: 0.9615549166530434
key: train_fscore
value: [0.99604743 0.99606299 0.99403579 1. 0.99408284 0.99408284
0.99209486 1. 0.99606299 0.99410609]
mean value: 0.9956575832877012
key: test_precision
value: [1. 0.93103448 0.96428571 0.93548387 0.87096774 1.
0.96551724 0.93103448 1. 0.96428571]
mean value: 0.9562609248371206
key: train_precision
value: [1. 0.99606299 1. 1. 0.99604743 0.99604743
0.99603175 1. 0.99606299 0.99215686]
mean value: 0.9972409454688892
key: test_recall
value: [0.96428571 0.96428571 0.93103448 1. 0.96428571 1.
1. 0.96428571 0.92857143 0.96428571]
mean value: 0.968103448275862
key: train_recall
value: [0.99212598 0.99606299 0.98814229 1. 0.99212598 0.99212598
0.98818898 1. 0.99606299 0.99606299]
mean value: 0.994089819800193
key: test_roc_auc
value: [0.98214286 0.9476601 0.9476601 0.96428571 0.91071429 1.
0.98214286 0.94642857 0.96428571 0.96428571]
mean value: 0.960960591133005
key: train_roc_auc
value: [0.99606299 0.99605521 0.99407115 1. 0.99409449 0.99409449
0.99212598 1. 0.99606299 0.99409449]
mean value: 0.9956661790793937
key: test_jcc
value: [0.96428571 0.9 0.9 0.93548387 0.84375 1.
0.96551724 0.9 0.92857143 0.93103448]
mean value: 0.9268642737962816
key: train_jcc
value: [0.99212598 0.99215686 0.98814229 1. 0.98823529 0.98823529
0.98431373 1. 0.99215686 0.98828125]
mean value: 0.9913647565957774
MCC on Blind test: 0.09
Accuracy on Blind test: 0.32
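Note: the "Some inputs do not have OOB scores" UserWarning logged for this Bagging Classifier run is expected when only a few estimators are used (scikit-learn's BaggingClassifier defaults to n_estimators=10). A sample gets an out-of-bag prediction only if at least one bootstrap draw leaves it out. The sketch below estimates how often that fails, about 1% of samples for 10 estimators. The function name and the training-set size of 500 are hypothetical, and the sketch assumes each estimator draws n_samples rows with replacement.

```python
def prob_never_oob(n_samples: int, n_estimators: int) -> float:
    """Probability that one sample is in-bag for every bootstrap draw.

    Assumes each estimator draws n_samples rows with replacement, so a
    given row is in-bag for one estimator with probability
    1 - (1 - 1/n_samples) ** n_samples (about 0.632 for large n_samples).
    """
    p_in_bag = 1.0 - (1.0 - 1.0 / n_samples) ** n_samples
    return p_in_bag ** n_estimators


n_train = 500  # hypothetical training-fold size, not taken from the log

for n_estimators in (10, 100):
    p = prob_never_oob(n_train, n_estimators)
    print(f"n_estimators={n_estimators}: "
          f"P(no OOB score)={p:.3g}, "
          f"expected rows without OOB score={p * n_train:.3g}")
```

Under these assumptions, raising n_estimators makes the warning, and the division by zero that follows it, very unlikely.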
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.16241002 0.13786769 0.23004556 0.15227127 0.16346526 0.1342628
0.15097213 0.17534328 0.17445135 0.1752553 ]
mean value: 0.16563446521759034
key: score_time
value: [0.02536011 0.02174473 0.02583504 0.01519656 0.02511072 0.01517773
0.02864337 0.02516484 0.02513885 0.02539349]
mean value: 0.023276543617248534
key: test_mcc
value: [0.85960591 0.54592083 0.65104858 0.61405719 0.3992747 0.71428571
0.53605627 0.60753044 0.64285714 0.85714286]
mean value: 0.6427779638491756
key: train_mcc
value: [0.98823511 0.98434388 0.98434291 0.98823457 0.98428248 0.98437404
0.98437404 0.98437404 0.98428248 0.98437404]
mean value: 0.9851217587100868
key: test_accuracy
value: [0.92982456 0.77192982 0.8245614 0.80701754 0.69642857 0.85714286
0.76785714 0.80357143 0.82142857 0.92857143]
mean value: 0.8208333333333333
key: train_accuracy
value: [0.99408284 0.99211045 0.99211045 0.99408284 0.99212598 0.99212598
0.99212598 0.99212598 0.99212598 0.99212598]
mean value: 0.9925142493283015
key: test_fscore
value: [0.92857143 0.75471698 0.82142857 0.81355932 0.66666667 0.85714286
0.76363636 0.8 0.82142857 0.92857143]
mean value: 0.8155722190611862
key: train_fscore
value: [0.99405941 0.99206349 0.99203187 0.99403579 0.99209486 0.99206349
0.99206349 0.99206349 0.99209486 0.99206349]
mean value: 0.9924634247376443
key: test_precision
value: [0.92857143 0.8 0.85185185 0.8 0.73913043 0.85714286
0.77777778 0.81481481 0.82142857 0.92857143]
mean value: 0.8319289164941339
key: train_precision
value: [1. 1. 1. 1. 0.99603175 1.
1. 1. 0.99603175 1. ]
mean value: 0.9992063492063492
key: test_recall
value: [0.92857143 0.71428571 0.79310345 0.82758621 0.60714286 0.85714286
0.75 0.78571429 0.82142857 0.92857143]
mean value: 0.8013546798029556
key: train_recall
value: [0.98818898 0.98425197 0.98418972 0.98814229 0.98818898 0.98425197
0.98425197 0.98425197 0.98818898 0.98425197]
mean value: 0.985815878746382
key: test_roc_auc
value: [0.92980296 0.77093596 0.82512315 0.80665025 0.69642857 0.85714286
0.76785714 0.80357143 0.82142857 0.92857143]
mean value: 0.8207512315270936
key: train_roc_auc
value: [0.99409449 0.99212598 0.99209486 0.99407115 0.99212598 0.99212598
0.99212598 0.99212598 0.99212598 0.99212598]
mean value: 0.9925142385857895
key: test_jcc
value: [0.86666667 0.60606061 0.6969697 0.68571429 0.5 0.75
0.61764706 0.66666667 0.6969697 0.86666667]
mean value: 0.6953361344537815
key: train_jcc
value: [0.98818898 0.98425197 0.98418972 0.98814229 0.98431373 0.98425197
0.98425197 0.98425197 0.98431373 0.98425197]
mean value: 0.9850408285688307
MCC on Blind test: 0.25
Accuracy on Blind test: 0.7
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.7869184 0.77875566 0.78472948 0.7729466 0.76530933 0.76920581
0.77312517 0.7658267 0.76497579 0.7665813 ]
mean value: 0.7728374242782593
key: score_time
value: [0.00956082 0.010185 0.01012588 0.00918317 0.00925159 0.00918603
0.00912333 0.00915837 0.00957084 0.00959945]
mean value: 0.009494447708129882
key: test_mcc
value: [0.96547546 0.92980296 0.92980296 0.93202124 0.82195294 1.
0.92857143 0.89342711 0.96490128 0.92857143]
mean value: 0.9294526803513998
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98245614 0.96491228 0.96491228 0.96491228 0.91071429 1.
0.96428571 0.94642857 0.98214286 0.96428571]
mean value: 0.9645050125313284
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98181818 0.96428571 0.96551724 0.96666667 0.9122807 1.
0.96428571 0.94736842 0.98181818 0.96428571]
mean value: 0.9648326537346501
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.96428571 0.96551724 0.93548387 0.89655172 1.
0.96428571 0.93103448 1. 0.96428571]
mean value: 0.9621444462100747
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96428571 0.96428571 0.96551724 1. 0.92857143 1.
0.96428571 0.96428571 0.96428571 0.96428571]
mean value: 0.9679802955665024
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98214286 0.96490148 0.96490148 0.96428571 0.91071429 1.
0.96428571 0.94642857 0.98214286 0.96428571]
mean value: 0.964408866995074
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96428571 0.93103448 0.93333333 0.93548387 0.83870968 1.
0.93103448 0.9 0.96428571 0.93103448]
mean value: 0.9329201758567721
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.11
Accuracy on Blind test: 0.29
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.03118086 0.03124666 0.03096771 0.03132343 0.03108811 0.03085017
0.03105617 0.03075123 0.03121066 0.03153753]
mean value: 0.031121253967285156
key: score_time
value: [0.01248264 0.01258159 0.01248908 0.01303363 0.01314592 0.01312351
0.01312232 0.01303387 0.01307392 0.0132153 ]
mean value: 0.012930178642272949
key: test_mcc
value: [ 0.30265542 -0.06746787 0.35337918 0.15195767 -0.06262243 0.05399492
0.11547005 0.05399492 0.57735027 0.18650096]
mean value: 0.16652131068133877
key: train_mcc
value: [0.56403512 0.41093503 0.5405667 0.41258679 0.36596253 0.32302914
0.31554255 0.4796084 0.91257312 0.33769082]
mean value: 0.4662530185948502
key: test_accuracy
value: [0.61403509 0.47368421 0.64912281 0.56140351 0.48214286 0.51785714
0.53571429 0.51785714 0.78571429 0.57142857]
mean value: 0.5708959899749373
key: train_accuracy
value: [0.74161736 0.64497041 0.72583826 0.64497041 0.61811024 0.59448819
0.59055118 0.68700787 0.95472441 0.6023622 ]
mean value: 0.6804640544192331
key: test_fscore
value: [0.7027027 0.625 0.72972973 0.67532468 0.63291139 0.64935065
0.66666667 0.64935065 0.8 0.67567568]
mean value: 0.6806712141205812
key: train_fscore
value: [0.79499218 0.73837209 0.78449612 0.73760933 0.72364672 0.71148459
0.70949721 0.76161919 0.95652174 0.71549296]
mean value: 0.7633732133244073
key: test_precision
value: [0.56521739 0.48076923 0.6 0.54166667 0.49019608 0.51020408
0.52 0.51020408 0.75 0.54347826]
mean value: 0.5511735791306489
key: train_precision
value: [0.65974026 0.58525346 0.64540816 0.58429561 0.56696429 0.55217391
0.54978355 0.61501211 0.92 0.55701754]
mean value: 0.6235648890174496
key: test_recall
value: [0.92857143 0.89285714 0.93103448 0.89655172 0.89285714 0.89285714
0.92857143 0.89285714 0.85714286 0.89285714]
mean value: 0.9006157635467981
key: train_recall
value: [1. 1. 1. 1. 1. 1.
1. 1. 0.99606299 1. ]
mean value: 0.9996062992125985
key: test_roc_auc
value: [0.61945813 0.48091133 0.64408867 0.55541872 0.48214286 0.51785714
0.53571429 0.51785714 0.78571429 0.57142857]
mean value: 0.5710591133004926
key: train_roc_auc
value: [0.74110672 0.64426877 0.72637795 0.64566929 0.61811024 0.59448819
0.59055118 0.68700787 0.95472441 0.6023622 ]
mean value: 0.6804666832653824
key: test_jcc
value: [0.54166667 0.45454545 0.57446809 0.50980392 0.46296296 0.48076923
0.5 0.48076923 0.66666667 0.51020408]
mean value: 0.5181856300687876
key: train_jcc
value: [0.65974026 0.58525346 0.64540816 0.58429561 0.56696429 0.55217391
0.54978355 0.61501211 0.91666667 0.55701754]
mean value: 0.6232315556841161
MCC on Blind test: -0.06
Accuracy on Blind test: 0.18
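Note: one plausible contributor to the "Variables are collinear" warning in this QDA run is the OneHotEncoder() step, which keeps every category level. The indicator columns of a single feature always sum to 1, so once they are centered (as happens when a covariance is estimated) they become linearly dependent. The sketch below shows that rank deficiency on made-up category codes. It is an illustration under that assumption, not a diagnosis of the logged data, which may have other collinear columns as well.

```python
import numpy as np

# Hypothetical categorical feature with three levels, coded 0, 1, 2.
rng = np.random.default_rng(0)
codes = rng.integers(0, 3, size=50)

# Full one-hot encoding: one indicator column per level, no level dropped.
onehot = np.eye(3)[codes]

# Centering each column mimics what a covariance estimate does.
centered = onehot - onehot.mean(axis=0)

# Every centered row sums to zero, so the columns are linearly dependent
# and the rank is strictly below the number of columns.
rank = np.linalg.matrix_rank(centered)
print("columns:", onehot.shape[1], "rank after centering:", rank)
```

Dropping one level per categorical feature (for example OneHotEncoder(drop='first')) removes this particular dependency.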
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02687025 0.03934693 0.03566217 0.03851652 0.04220271 0.03784752
0.03835869 0.03784919 0.03812003 0.03818917]
mean value: 0.03729631900787354
key: score_time
value: [0.01908278 0.01851916 0.02274084 0.01836586 0.01849699 0.01842642
0.01837707 0.01831055 0.01833153 0.01832366]
mean value: 0.018897485733032227
key: test_mcc
value: [0.92980296 0.8953202 0.82512315 0.82490815 0.75434227 0.82195294
0.67900461 0.78571429 0.71611487 0.82618439]
mean value: 0.8058467820455303
key: train_mcc
value: [0.86198955 0.86998617 0.86194018 0.86999628 0.88599845 0.85850727
0.86624915 0.878014 0.87040934 0.86237183]
mean value: 0.8685462219654635
key: test_accuracy
value: [0.96491228 0.94736842 0.9122807 0.9122807 0.875 0.91071429
0.83928571 0.89285714 0.85714286 0.91071429]
mean value: 0.9022556390977443
key: train_accuracy
value: [0.93096647 0.93491124 0.93096647 0.93491124 0.94291339 0.92913386
0.93307087 0.93897638 0.93503937 0.93110236]
mean value: 0.9341991644535558
key: test_fscore
value: [0.96428571 0.94736842 0.9122807 0.91525424 0.88135593 0.90909091
0.84210526 0.89285714 0.86206897 0.90566038]
mean value: 0.9032327664565936
key: train_fscore
value: [0.93150685 0.93567251 0.93096647 0.93542074 0.94346979 0.92996109
0.93359375 0.93933464 0.93592233 0.93177388]
mean value: 0.9347622049276256
key: test_precision
value: [0.96428571 0.93103448 0.92857143 0.9 0.83870968 0.92592593
0.82758621 0.89285714 0.83333333 0.96 ]
mean value: 0.9002303912048072
key: train_precision
value: [0.92607004 0.92664093 0.92913386 0.92635659 0.93436293 0.91923077
0.92635659 0.93385214 0.92337165 0.92277992]
mean value: 0.9268155416074748
key: test_recall
value: [0.96428571 0.96428571 0.89655172 0.93103448 0.92857143 0.89285714
0.85714286 0.89285714 0.89285714 0.85714286]
mean value: 0.9077586206896552
key: train_recall
value: [0.93700787 0.94488189 0.93280632 0.94466403 0.95275591 0.94094488
0.94094488 0.94488189 0.9488189 0.94094488]
mean value: 0.942865145809343
key: test_roc_auc
value: [0.96490148 0.9476601 0.91256158 0.91194581 0.875 0.91071429
0.83928571 0.89285714 0.85714286 0.91071429]
mean value: 0.9022783251231528
key: train_roc_auc
value: [0.93095453 0.93489154 0.93097009 0.93493044 0.94291339 0.92913386
0.93307087 0.93897638 0.93503937 0.93110236]
mean value: 0.9341982820329278
key: test_jcc
value: [0.93103448 0.9 0.83870968 0.84375 0.78787879 0.83333333
0.72727273 0.80645161 0.75757576 0.82758621]
mean value: 0.8253592586038359
key: train_jcc
value: [0.87179487 0.87912088 0.87084871 0.87867647 0.89298893 0.86909091
0.87545788 0.88560886 0.87956204 0.87226277]
mean value: 0.8775412318035963
MCC on Blind test: 0.25
Accuracy on Blind test: 0.7
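Note: the per-fold metric dumps above follow a fixed "key / value / mean value" layout, so they can be checked mechanically. The following is a minimal sketch, not part of the original scripts. The names parse_metric_blocks and LOG_EXCERPT are hypothetical, and the excerpt is copied from the test_mcc entry of the Ridge Classifier run. It parses each block and recomputes the mean from the ten fold values.

```python
import re
from statistics import fmean

# Excerpt copied from the Ridge Classifier run in this log (test_mcc entry).
LOG_EXCERPT = """\
key: test_mcc
value: [0.92980296 0.8953202 0.82512315 0.82490815 0.75434227 0.82195294
0.67900461 0.78571429 0.71611487 0.82618439]
mean value: 0.8058467820455303
"""


def parse_metric_blocks(text):
    """Return {key: (per_fold_values, logged_mean)} for every metric block.

    Hypothetical helper: assumes each block is written as three fields,
    'key:', 'value: [...]' (possibly wrapped over lines) and 'mean value:'.
    """
    pattern = re.compile(
        r"key:\s*(?P<key>\S+)\s*\n"
        r"value:\s*\[(?P<vals>[^\]]*)\]\s*\n"
        r"mean value:\s*(?P<mean>\S+)"
    )
    blocks = {}
    for match in pattern.finditer(text):
        # Fold values are whitespace-separated and may span several lines.
        values = [float(v) for v in match.group("vals").split()]
        blocks[match.group("key")] = (values, float(match.group("mean")))
    return blocks


metrics = parse_metric_blocks(LOG_EXCERPT)
folds, logged_mean = metrics["test_mcc"]
recomputed_mean = fmean(folds)
print(f"folds={len(folds)} logged={logged_mean:.6f} recomputed={recomputed_mean:.6f}")
```

The printed fold values are rounded to eight decimals, so the recomputed mean agrees with the logged one only to within that rounding.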
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.27225637 0.29213047 0.28225875 0.29532313 0.28092909 0.27798748
0.30030775 0.27729392 0.29576445 0.31901908]
mean value: 0.2893270492553711
key: score_time
value: [0.01860166 0.01854444 0.01844335 0.01861453 0.01857495 0.01867414
0.0185442 0.01859784 0.01850152 0.01851773]
mean value: 0.018561434745788575
key: test_mcc
value: [0.92980296 0.8953202 0.85960591 0.82490815 0.75434227 0.82195294
0.67900461 0.78571429 0.71611487 0.82618439]
mean value: 0.8092950579075993
key: train_mcc
value: [0.86198955 0.86998617 0.88168563 0.86999628 0.90174953 0.85850727
0.86624915 0.878014 0.87040934 0.86237183]
mean value: 0.8720958746760563
key: test_accuracy
value: [0.96491228 0.94736842 0.92982456 0.9122807 0.875 0.91071429
0.83928571 0.89285714 0.85714286 0.91071429]
mean value: 0.9040100250626566
key: train_accuracy
value: [0.93096647 0.93491124 0.9408284 0.93491124 0.9507874 0.92913386
0.93307087 0.93897638 0.93503937 0.93110236]
mean value: 0.9359727593222444
key: test_fscore
value: [0.96428571 0.94736842 0.93103448 0.91525424 0.88135593 0.90909091
0.84210526 0.89285714 0.86206897 0.90566038]
mean value: 0.905108144557017
key: train_fscore
value: [0.93150685 0.93567251 0.94094488 0.93542074 0.95126706 0.92996109
0.93359375 0.93933464 0.93592233 0.93177388]
mean value: 0.9365397732693178
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_orig.py:155: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_orig.py:158: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
mean value: 0.9365397732693178
key: test_precision
value: [0.96428571 0.93103448 0.93103448 0.9 0.83870968 0.92592593
0.82758621 0.89285714 0.83333333 0.96 ]
mean value: 0.9004766966235265
key: train_precision
value: [0.92607004 0.92664093 0.9372549 0.92635659 0.94208494 0.91923077
0.92635659 0.93385214 0.92337165 0.92277992]
mean value: 0.9283998467489825
key: test_recall
value: [0.96428571 0.96428571 0.93103448 0.93103448 0.92857143 0.89285714
0.85714286 0.89285714 0.89285714 0.85714286]
mean value: 0.9112068965517242
key: train_recall
value: [0.93700787 0.94488189 0.94466403 0.94466403 0.96062992 0.94094488
0.94094488 0.94488189 0.9488189 0.94094488]
mean value: 0.9448383181351343
key: test_roc_auc
value: [0.96490148 0.9476601 0.92980296 0.91194581 0.875 0.91071429
0.83928571 0.89285714 0.85714286 0.91071429]
mean value: 0.9040024630541872
key: train_roc_auc
value: [0.93095453 0.93489154 0.94083595 0.93493044 0.9507874 0.92913386
0.93307087 0.93897638 0.93503937 0.93110236]
mean value: 0.9359722697706265
key: test_jcc
value: [0.93103448 0.9 0.87096774 0.84375 0.78787879 0.83333333
0.72727273 0.80645161 0.75757576 0.82758621]
mean value: 0.8285850650554488
key: train_jcc
value: [0.87179487 0.87912088 0.88847584 0.87867647 0.9070632 0.86909091
0.87545788 0.88560886 0.87956204 0.87226277]
mean value: 0.8807113713116829
MCC on Blind test: 0.25
Accuracy on Blind test: 0.7
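Note: the blind-test lines report accuracy 0.7 alongside MCC 0.25. The log does not include the confusion matrix, so the counts below are hypothetical, chosen only to reproduce the same accuracy. They illustrate how MCC, which uses all four confusion-matrix cells, can sit well below accuracy when one class is predicted poorly. The helper name mcc_from_counts is also an assumption, not a function from the original scripts.

```python
from math import sqrt


def mcc_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


# Hypothetical blind-test counts (NOT taken from the log): 100 samples,
# of which 30 are positive and 70 are negative.
tp, tn, fp, fn = 10, 60, 10, 20

accuracy = (tp + tn) / (tp + tn + fp + fn)
mcc_value = mcc_from_counts(tp, tn, fp, fn)
print(f"accuracy={accuracy:.2f} mcc={mcc_value:.2f}")
```

With these assumed counts only 10 of the 30 positives are recovered, which accuracy largely hides because the negative class dominates.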
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.03605223 0.03439736 0.03540349 0.03663993 0.03655434 0.03590679
0.03478384 0.03613806 0.03570008 0.03664112]
mean value: 0.035821723937988284
key: score_time
value: [0.01276493 0.01178789 0.01285243 0.01279426 0.01268625 0.01275706
0.0128274 0.01273704 0.01279736 0.01278663]
mean value: 0.012679123878479004
key: test_mcc
value: [0.74935731 0.75033796 0.63745526 0.81878307 0.96423926 0.89139151
0.81878307 0.96423926 0.75724019 0.81878307]
mean value: 0.8170609946666495
key: train_mcc
value: [0.86672653 0.87081606 0.88732456 0.87070654 0.85465174 0.862886
0.86274286 0.85865182 0.87881806 0.86667428]
mean value: 0.8679998468826917
key: test_accuracy
value: [0.87272727 0.87272727 0.81818182 0.90909091 0.98181818 0.94545455
0.90909091 0.98181818 0.87272727 0.90909091]
mean value: 0.9072727272727272
key: train_accuracy
value: [0.93333333 0.93535354 0.94343434 0.93535354 0.92727273 0.93131313
0.93131313 0.92929293 0.93939394 0.93333333]
mean value: 0.933939393939394
key: test_fscore
value: [0.8627451 0.87719298 0.80769231 0.90909091 0.98113208 0.94736842
0.90909091 0.98245614 0.8852459 0.90909091]
mean value: 0.9071105653974942
key: train_fscore
value: [0.93386774 0.936 0.94444444 0.93548387 0.928 0.932
0.93172691 0.92957746 0.93951613 0.93333333]
mean value: 0.9343949885667974
key: test_precision
value: [0.91666667 0.83333333 0.84 0.89285714 1. 0.93103448
0.92592593 0.96551724 0.81818182 0.92592593]
mean value: 0.9049442537028745
key: train_precision
value: [0.92828685 0.92857143 0.9296875 0.93548387 0.92063492 0.92094862
0.92430279 0.924 0.93574297 0.93145161]
mean value: 0.927911056299992
key: test_recall
value: [0.81481481 0.92592593 0.77777778 0.92592593 0.96296296 0.96428571
0.89285714 1. 0.96428571 0.89285714]
mean value: 0.9121693121693122
key: train_recall
value: [0.93951613 0.94354839 0.95967742 0.93548387 0.93548387 0.94331984
0.93927126 0.93522267 0.94331984 0.93522267]
mean value: 0.9410065952722999
key: test_roc_auc
value: [0.87169312 0.87367725 0.81746032 0.90939153 0.98148148 0.94510582
0.90939153 0.98148148 0.87103175 0.90939153]
mean value: 0.907010582010582
key: train_roc_auc
value: [0.93332082 0.93533695 0.94340146 0.93535327 0.92725611 0.93133734
0.93132918 0.92930488 0.93940185 0.93333714]
mean value: 0.9339378999608202
key: test_jcc
value: [0.75862069 0.78125 0.67741935 0.83333333 0.96296296 0.9
0.83333333 0.96551724 0.79411765 0.83333333]
mean value: 0.8339887895894978
key: train_jcc
value: [0.87593985 0.87969925 0.89473684 0.87878788 0.86567164 0.87265918
0.87218045 0.86842105 0.88593156 0.875 ]
mean value: 0.876902769915327
MCC on Blind test: 0.28
Accuracy on Blind test: 0.7
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.83404994 0.86711574 0.86436319 0.86811352 0.81349635 0.76629972
0.91054893 0.83312821 0.85565472 0.7771697 ]
mean value: 0.8389940023422241
key: score_time
value: [0.01317811 0.01305819 0.01305723 0.01334643 0.01304197 0.01310301
0.01307392 0.01307011 0.0126493 0.0120368 ]
mean value: 0.012961506843566895
key: test_mcc
value: [0.74935731 0.79069197 0.67602163 0.81878307 0.92962225 0.74569602
0.85449735 0.96423926 0.75724019 0.78410665]
mean value: 0.8070255698567865
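Note: each "mean value" line appears to be the arithmetic mean of the 10 per-fold values printed just above it. A minimal check for test_mcc, using the fold values as printed (rounded to 8 decimals, so agreement is only expected to roughly that precision); the variable names are illustrative:

```python
from statistics import fmean

# Per-fold test_mcc values copied from the log lines above.
test_mcc = [
    0.74935731, 0.79069197, 0.67602163, 0.81878307, 0.92962225,
    0.74569602, 0.85449735, 0.96423926, 0.75724019, 0.78410665,
]

mean_mcc = fmean(test_mcc)
print(f"folds: {len(test_mcc)}")
print(f"mean value: {mean_mcc:.6f}")
```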
key: train_mcc
value: [0.90329065 0.89091503 0.91922028 0.89905801 0.94355919 0.95574863
0.898996 0.89091503 0.84242919 0.82225124]
mean value: 0.8966383251997698
key: test_accuracy
value: [0.87272727 0.89090909 0.83636364 0.90909091 0.96363636 0.87272727
0.92727273 0.98181818 0.87272727 0.89090909]
mean value: 0.9018181818181817
key: train_accuracy
value: [0.95151515 0.94545455 0.95959596 0.94949495 0.97171717 0.97777778
0.94949495 0.94545455 0.92121212 0.91111111]
mean value: 0.9482828282828283
key: test_fscore
value: [0.8627451 0.89655172 0.82352941 0.90909091 0.96153846 0.87719298
0.92857143 0.98245614 0.8852459 0.88888889]
mean value: 0.9015810946477902
key: train_fscore
value: [0.95219124 0.94567404 0.95983936 0.94929006 0.97154472 0.97750511
0.94929006 0.94523327 0.92089249 0.91129032]
mean value: 0.9482750669610251
key: test_precision
value: [0.91666667 0.83870968 0.875 0.89285714 1. 0.86206897
0.92857143 0.96551724 0.81818182 0.92307692]
mean value: 0.9020649863669886
key: train_precision
value: [0.94094488 0.9437751 0.956 0.95510204 0.9795082 0.98760331
0.95121951 0.94715447 0.92276423 0.90763052]
mean value: 0.9491702259084599
key: test_recall
value: [0.81481481 0.96296296 0.77777778 0.92592593 0.92592593 0.89285714
0.92857143 1. 0.96428571 0.85714286]
mean value: 0.9050264550264551
key: train_recall
value: [0.96370968 0.94758065 0.96370968 0.94354839 0.96370968 0.96761134
0.94736842 0.94331984 0.91902834 0.91497976]
mean value: 0.9474565756823822
key: test_roc_auc
value: [0.87169312 0.89219577 0.83531746 0.90939153 0.96296296 0.8723545
0.92724868 0.98148148 0.87103175 0.89153439]
mean value: 0.901521164021164
key: train_roc_auc
value: [0.95149047 0.94545024 0.95958763 0.94950699 0.97173338 0.97775728
0.94949066 0.94545024 0.92120772 0.91111891]
mean value: 0.9482793522267207
key: test_jcc
value: [0.75862069 0.8125 0.7 0.83333333 0.92592593 0.78125
0.86666667 0.96551724 0.79411765 0.8 ]
mean value: 0.8237931504019232
key: train_jcc
value: [0.90874525 0.89694656 0.92277992 0.9034749 0.94466403 0.956
0.9034749 0.89615385 0.85338346 0.83703704]
mean value: 0.9022659915221568
MCC on Blind test: 0.26
Accuracy on Blind test: 0.65
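Note: MCC and accuracy are both derived from the confusion matrix, but MCC also penalises an imbalance between the two error types, which is why it can sit well below accuracy. The sketch below uses the standard definitions; the confusion counts are hypothetical and are not the blind-test counts of this run, which the log does not print.

```python
from math import sqrt


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention: return 0 when any marginal total is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of correct predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts, chosen only to illustrate the calculation.
counts = dict(tp=30, tn=35, fp=15, fn=20)
example_mcc = mcc(**counts)
example_acc = accuracy(**counts)
print(f"MCC: {example_mcc:.2f}")
print(f"Accuracy: {example_acc:.2f}")
```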
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01397133 0.01245117 0.01030517 0.00996041 0.01001978 0.00983381
0.00986528 0.00995183 0.00990844 0.01001573]
mean value: 0.010628294944763184
key: score_time
value: [0.01229095 0.0092237 0.00907922 0.00885463 0.00884295 0.00878549
0.00874043 0.00866818 0.00869727 0.00873613]
mean value: 0.00919189453125
key: test_mcc
value: [0.49137176 0.63624339 0.56841568 0.63745526 0.80032673 0.57068493
0.75878131 0.79069197 0.52777778 0.68504815]
mean value: 0.646679695350377
key: train_mcc
value: [0.69712309 0.67315631 0.74497319 0.66701918 0.66780236 0.6884516
0.6712994 0.65867563 0.6757794 0.66760924]
mean value: 0.6811889391928496
key: test_accuracy
value: [0.74545455 0.81818182 0.78181818 0.81818182 0.89090909 0.78181818
0.87272727 0.89090909 0.76363636 0.83636364]
mean value: 0.8200000000000001
key: train_accuracy
value: [0.84848485 0.83232323 0.87070707 0.83030303 0.83030303 0.84040404
0.83232323 0.82626263 0.83434343 0.83030303]
mean value: 0.8375757575757576
key: test_fscore
value: [0.73076923 0.81481481 0.76 0.80769231 0.875 0.76923077
0.8627451 0.88461538 0.76363636 0.82352941]
mean value: 0.8092033380562792
key: train_fscore
value: [0.84725051 0.81838074 0.86440678 0.81818182 0.8173913 0.82713348
0.81917211 0.81304348 0.8209607 0.81659389]
mean value: 0.8262514811253847
key: test_precision
value: [0.76 0.81481481 0.82608696 0.84 1. 0.83333333
0.95652174 0.95833333 0.77777778 0.91304348]
mean value: 0.8679911433172303
key: train_precision
value: [0.85596708 0.89473684 0.91071429 0.88317757 0.88679245 0.9
0.88679245 0.87793427 0.89099526 0.88625592]
mean value: 0.8873366138897277
key: test_recall
value: [0.7037037 0.81481481 0.7037037 0.77777778 0.77777778 0.71428571
0.78571429 0.82142857 0.75 0.75 ]
mean value: 0.7599206349206349
key: train_recall
value: [0.83870968 0.75403226 0.82258065 0.76209677 0.75806452 0.76518219
0.7611336 0.75708502 0.7611336 0.75708502]
mean value: 0.7737103304166123
key: test_roc_auc
value: [0.74470899 0.81812169 0.78042328 0.81746032 0.88888889 0.78306878
0.87433862 0.89219577 0.76388889 0.83796296]
mean value: 0.8201058201058201
key: train_roc_auc
value: [0.84850464 0.83248172 0.87080449 0.8304411 0.83044926 0.84025238
0.8321797 0.82612316 0.83419583 0.83015541]
mean value: 0.837558769753167
key: test_jcc
value: [0.57575758 0.6875 0.61290323 0.67741935 0.77777778 0.625
0.75862069 0.79310345 0.61764706 0.7 ]
mean value: 0.6825729130935079
key: train_jcc
value: [0.73498233 0.69259259 0.76119403 0.69230769 0.69117647 0.70522388
0.69372694 0.68498168 0.6962963 0.6900369 ]
mean value: 0.7042518817008117
MCC on Blind test: 0.31
Accuracy on Blind test: 0.73
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01024437 0.0100975 0.01008296 0.01021576 0.01011634 0.01013732
0.01008844 0.01011109 0.01027632 0.01016641]
mean value: 0.010153651237487793
key: score_time
value: [0.00875211 0.00881267 0.00876999 0.00872159 0.0086987 0.00877166
0.00879073 0.00876713 0.0087781 0.00872397]
mean value: 0.00875866413116455
key: test_mcc
value: [0.56841568 0.65330526 0.60000053 0.75033796 0.89139151 0.63841116
0.78174603 0.75878131 0.52935027 0.6005291 ]
mean value: 0.6772268811002418
key: train_mcc
value: [0.7134319 0.76975822 0.75369821 0.71362312 0.74951431 0.76567678
0.77794469 0.71736756 0.70631188 0.72166787]
mean value: 0.738899455109274
key: test_accuracy
value: [0.78181818 0.81818182 0.8 0.87272727 0.94545455 0.81818182
0.89090909 0.87272727 0.76363636 0.8 ]
mean value: 0.8363636363636364
key: train_accuracy
value: [0.85656566 0.88484848 0.87676768 0.85656566 0.87474747 0.88282828
0.88888889 0.85858586 0.85252525 0.86060606]
mean value: 0.8692929292929292
key: test_fscore
value: [0.76 0.83333333 0.79245283 0.87719298 0.94339623 0.81481481
0.89285714 0.8627451 0.77966102 0.8 ]
mean value: 0.8356453445053573
key: train_fscore
value: [0.85480573 0.88438134 0.87576375 0.85420945 0.87550201 0.88211382
0.88977956 0.85655738 0.84759916 0.85773196]
mean value: 0.8678444146780728
key: test_precision
value: [0.82608696 0.75757576 0.80769231 0.83333333 0.96153846 0.84615385
0.89285714 0.95652174 0.74193548 0.81481481]
mean value: 0.8438509843488806
key: train_precision
value: [0.86721992 0.88979592 0.88477366 0.87029289 0.872 0.88571429
0.88095238 0.86721992 0.875 0.87394958]
mean value: 0.8766918548471572
key: test_recall
value: [0.7037037 0.92592593 0.77777778 0.92592593 0.92592593 0.78571429
0.89285714 0.78571429 0.82142857 0.78571429]
mean value: 0.8330687830687831
key: train_recall
value: [0.84274194 0.87903226 0.86693548 0.83870968 0.87903226 0.87854251
0.89878543 0.84615385 0.82186235 0.84210526]
mean value: 0.8593901005615776
key: test_roc_auc
value: [0.78042328 0.82010582 0.79960317 0.87367725 0.94510582 0.81878307
0.89087302 0.87433862 0.76256614 0.80026455]
mean value: 0.8365740740740741
key: train_roc_auc
value: [0.85659364 0.88486026 0.87678758 0.8566018 0.8747388 0.88281964
0.88890884 0.85856079 0.85246343 0.86056876]
mean value: 0.869290355230508
key: test_jcc
value: [0.61290323 0.71428571 0.65625 0.78125 0.89285714 0.6875
0.80645161 0.75862069 0.63888889 0.66666667]
mean value: 0.7215673941063262
key: train_jcc
value: [0.74642857 0.79272727 0.77898551 0.74551971 0.77857143 0.78909091
0.80144404 0.74910394 0.73550725 0.75090253]
mean value: 0.766828116175246
MCC on Blind test: 0.28
Accuracy on Blind test: 0.73
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00968528 0.01099563 0.01088142 0.01142859 0.01120138 0.01143241
0.01128864 0.01136971 0.01154232 0.01143885]
mean value: 0.011126422882080078
key: score_time
value: [0.01223397 0.01313019 0.01311469 0.01409554 0.01288962 0.01380253
0.01319766 0.01341391 0.01384592 0.01357269]
mean value: 0.013329672813415527
key: test_mcc
value: [0.50088476 0.45502646 0.30952381 0.52715278 0.56441351 0.28210909
0.68504815 0.68504815 0.42602426 0.53758181]
mean value: 0.4972812768008551
key: train_mcc
value: [0.69697662 0.69314677 0.68904093 0.64067939 0.6782269 0.7057203
0.69371399 0.66901351 0.75103262 0.68538123]
mean value: 0.6902932259968104
key: test_accuracy
value: [0.74545455 0.72727273 0.65454545 0.76363636 0.78181818 0.63636364
0.83636364 0.83636364 0.70909091 0.76363636]
mean value: 0.7454545454545455
key: train_accuracy
value: [0.84848485 0.84646465 0.84444444 0.82020202 0.83838384 0.85252525
0.84646465 0.83434343 0.87474747 0.84242424]
mean value: 0.8448484848484848
key: test_fscore
value: [0.70833333 0.72727273 0.65454545 0.75471698 0.76923077 0.6
0.82352941 0.82352941 0.74193548 0.74509804]
mean value: 0.7348191612130426
key: train_fscore
value: [0.84848485 0.84489796 0.84317719 0.81799591 0.83333333 0.84886128
0.84232365 0.83127572 0.87029289 0.83884298]
mean value: 0.8419485757928358
key: test_precision
value: [0.80952381 0.71428571 0.64285714 0.76923077 0.8 0.68181818
0.91304348 0.91304348 0.67647059 0.82608696]
mean value: 0.774636011899439
key: train_precision
value: [0.85020243 0.8553719 0.85185185 0.82987552 0.86206897 0.86864407
0.86382979 0.84518828 0.9004329 0.85654008]
mean value: 0.8584005790388104
key: test_recall
value: [0.62962963 0.74074074 0.66666667 0.74074074 0.74074074 0.53571429
0.75 0.75 0.82142857 0.67857143]
mean value: 0.7054232804232804
key: train_recall
value: [0.84677419 0.83467742 0.83467742 0.80645161 0.80645161 0.82995951
0.82186235 0.81781377 0.84210526 0.82186235]
mean value: 0.8262635496930912
key: test_roc_auc
value: [0.74338624 0.72751323 0.6547619 0.76322751 0.78108466 0.63822751
0.83796296 0.83796296 0.70701058 0.76521164]
mean value: 0.7456349206349207
key: train_roc_auc
value: [0.84848831 0.84648851 0.84446422 0.82022986 0.83844848 0.85247976
0.84641505 0.83431011 0.87468166 0.84238279]
mean value: 0.8448388729267338
key: test_jcc
value: [0.5483871 0.57142857 0.48648649 0.60606061 0.625 0.42857143
0.7 0.7 0.58974359 0.59375 ]
mean value: 0.5849427779064875
key: train_jcc
value: [0.73684211 0.73144876 0.72887324 0.69204152 0.71428571 0.73741007
0.72759857 0.71126761 0.77037037 0.72241993]
mean value: 0.7272557887808211
MCC on Blind test: 0.26
Accuracy on Blind test: 0.68
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.02347016 0.02215576 0.0214541 0.02343726 0.021842 0.02241516
0.02220798 0.02189755 0.0212822 0.02229095]
mean value: 0.022245311737060548
key: score_time
value: [0.01336884 0.01203799 0.01242447 0.01285195 0.01233602 0.01203322
0.01214242 0.0121491 0.0117538 0.01247334]
mean value: 0.012357115745544434
key: test_mcc
value: [0.60876172 0.78410665 0.60000053 0.85695439 0.92962225 0.81854376
0.81878307 0.92724868 0.82269299 0.81878307]
mean value: 0.7985497110226856
key: train_mcc
value: [0.81823674 0.8101084 0.8222215 0.79800833 0.78586238 0.79800173
0.79797897 0.79394672 0.80207347 0.79797897]
mean value: 0.8024417204598691
key: test_accuracy
value: [0.8 0.89090909 0.8 0.92727273 0.96363636 0.90909091
0.90909091 0.96363636 0.90909091 0.90909091]
mean value: 0.8981818181818182
key: train_accuracy
value: [0.90909091 0.90505051 0.91111111 0.8989899 0.89292929 0.8989899
0.8989899 0.8969697 0.9010101 0.8989899 ]
mean value: 0.9012121212121212
key: test_fscore
value: [0.7755102 0.89285714 0.79245283 0.92857143 0.96153846 0.9122807
0.90909091 0.96428571 0.91525424 0.90909091]
mean value: 0.8960932538747399
key: train_fscore
value: [0.90981964 0.90505051 0.91129032 0.89878543 0.89336016 0.89837398
0.89878543 0.8969697 0.90020367 0.89878543]
mean value: 0.9011424249876461
key: test_precision
value: [0.86363636 0.86206897 0.80769231 0.89655172 1. 0.89655172
0.92592593 0.96428571 0.87096774 0.92592593]
mean value: 0.9013606393194825
key: train_precision
value: [0.90438247 0.90688259 0.91129032 0.90243902 0.89156627 0.90204082
0.89878543 0.89516129 0.9057377 0.89878543]
mean value: 0.9017071335013342
key: test_recall
value: [0.7037037 0.92592593 0.77777778 0.96296296 0.92592593 0.92857143
0.89285714 0.96428571 0.96428571 0.89285714]
mean value: 0.8939153439153439
key: train_recall
value: [0.91532258 0.90322581 0.91129032 0.89516129 0.89516129 0.89473684
0.89878543 0.89878543 0.89473684 0.89878543]
mean value: 0.9005991249836751
key: test_roc_auc
value: [0.79828042 0.89153439 0.79960317 0.92791005 0.96296296 0.90873016
0.90939153 0.96362434 0.90806878 0.90939153]
mean value: 0.8979497354497354
key: train_roc_auc
value: [0.90907829 0.9050542 0.91111075 0.89899765 0.89292477 0.89898132
0.89898949 0.89697336 0.90099745 0.89898949]
mean value: 0.9012096774193549
key: test_jcc
value: [0.63333333 0.80645161 0.65625 0.86666667 0.92592593 0.83870968
0.83333333 0.93103448 0.84375 0.83333333]
mean value: 0.8168788365673794
key: train_jcc
value: [0.83455882 0.82656827 0.83703704 0.81617647 0.80727273 0.81549815
0.81617647 0.81318681 0.81851852 0.81617647]
mean value: 0.8201169751973421
MCC on Blind test: 0.25
Accuracy on Blind test: 0.72
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.93131471 1.97601247 1.99174571 2.14477992 2.12438631 1.95171905
1.95633554 2.3193562 1.9155972 2.00993013]
mean value: 2.03211772441864
key: score_time
value: [0.01248431 0.01242208 0.01508594 0.01516485 0.01528311 0.01378775
0.01732278 0.01664519 0.01326823 0.01835799]
mean value: 0.014982223510742188
key: test_mcc
value: [0.74935731 0.75878131 0.56841568 0.81878307 0.92962225 0.78174603
0.82337971 0.92724868 0.86334835 0.81878307]
mean value: 0.8039465463217836
key: train_mcc
value: [0.99195142 0.9878869 0.99596768 0.99195168 1. 0.99195142
0.99596768 0.9878869 0.99596768 1. ]
mean value: 0.9939531339647318
key: test_accuracy
value: [0.87272727 0.87272727 0.78181818 0.90909091 0.96363636 0.89090909
0.90909091 0.96363636 0.92727273 0.90909091]
mean value: 0.9
key: train_accuracy
value: [0.9959596 0.99393939 0.9979798 0.9959596 1. 0.9959596
0.9979798 0.99393939 0.9979798 1. ]
mean value: 0.996969696969697
key: test_fscore
value: [0.8627451 0.88135593 0.76 0.90909091 0.96153846 0.89285714
0.90566038 0.96428571 0.93333333 0.90909091]
mean value: 0.8979957877797566
key: train_fscore
value: [0.99598394 0.99393939 0.99798793 0.99595142 1. 0.99593496
0.9979716 0.99393939 0.9979716 1. ]
mean value: 0.9969680232408948
key: test_precision
value: [0.91666667 0.8125 0.82608696 0.89285714 1. 0.89285714
0.96 0.96428571 0.875 0.92592593]
mean value: 0.9066179549114332
key: train_precision
value: [0.992 0.99595142 0.99598394 1. 1. 1.
1. 0.99193548 1. 1. ]
mean value: 0.9975870836617988
key: test_recall
value: [0.81481481 0.96296296 0.7037037 0.92592593 0.92592593 0.89285714
0.85714286 0.96428571 1. 0.89285714]
mean value: 0.8940476190476191
key: train_recall
value: [1. 0.99193548 1. 0.99193548 1. 0.99190283
0.99595142 0.99595142 0.99595142 1. ]
mean value: 0.9963628052762178
key: test_roc_auc
value: [0.87169312 0.87433862 0.78042328 0.90939153 0.96296296 0.89087302
0.91005291 0.96362434 0.92592593 0.90939153]
mean value: 0.8998677248677249
key: train_roc_auc
value: [0.99595142 0.99394345 0.99797571 0.99596774 1. 0.99595142
0.99797571 0.99394345 0.99797571 1. ]
mean value: 0.996968460232467
key: test_jcc
value: [0.75862069 0.78787879 0.61290323 0.83333333 0.92592593 0.80645161
0.82758621 0.93103448 0.875 0.83333333]
mean value: 0.8192067598491403
key: train_jcc
value: [0.992 0.98795181 0.99598394 0.99193548 1. 0.99190283
0.99595142 0.98795181 0.99595142 1. ]
mean value: 0.9939628702087965
MCC on Blind test: 0.24
Accuracy on Blind test: 0.62
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.02939534 0.02539611 0.02019739 0.02184558 0.02218199 0.02318048
0.02253628 0.02189136 0.02258277 0.02346587]
mean value: 0.023267316818237304
key: score_time
value: [0.01231742 0.0091126 0.00892138 0.00938058 0.00929308 0.00980258
0.0096314 0.00899553 0.00903773 0.00899863]
mean value: 0.009549093246459962
key: test_mcc
value: [0.8565805 0.78353876 0.78174603 0.89139151 0.96423926 0.78353876
0.92724868 0.96428571 0.89139151 0.89139151]
mean value: 0.8735352224131899
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.92727273 0.89090909 0.89090909 0.94545455 0.98181818 0.89090909
0.96363636 0.98181818 0.94545455 0.94545455]
mean value: 0.9363636363636363
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.92307692 0.88461538 0.88888889 0.94339623 0.98113208 0.89655172
0.96428571 0.98181818 0.94736842 0.94736842]
mean value: 0.935850196081508
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96 0.92 0.88888889 0.96153846 1. 0.86666667
0.96428571 1. 0.93103448 0.93103448]
mean value: 0.9423448696896972
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.88888889 0.85185185 0.88888889 0.92592593 0.96296296 0.92857143
0.96428571 0.96428571 0.96428571 0.96428571]
mean value: 0.9304232804232804
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.9265873 0.89021164 0.89087302 0.94510582 0.98148148 0.89021164
0.96362434 0.98214286 0.94510582 0.94510582]
mean value: 0.9360449735449736
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.85714286 0.79310345 0.8 0.89285714 0.96296296 0.8125
0.93103448 0.96428571 0.9 0.9 ]
mean value: 0.881388660828316
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.14
Accuracy on Blind test: 0.54
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.12487578 0.12705588 0.13158202 0.12779379 0.12834001 0.12796426
0.12580824 0.12442112 0.12178755 0.12430549]
mean value: 0.12639341354370118
key: score_time
value: [0.01797676 0.01992846 0.01879454 0.01852512 0.01992798 0.01827836
0.01831627 0.01806569 0.01809311 0.0183239 ]
mean value: 0.01862301826477051
key: test_mcc
value: [0.82269299 0.79069197 0.67284827 0.85695439 0.92962225 0.78353876
0.78410665 0.89139151 0.75724019 0.78410665]
mean value: 0.8073193627447607
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.90909091 0.89090909 0.83636364 0.92727273 0.96363636 0.89090909
0.89090909 0.94545455 0.87272727 0.89090909]
mean value: 0.9018181818181817
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.90196078 0.89655172 0.83018868 0.92857143 0.96153846 0.89655172
0.88888889 0.94736842 0.8852459 0.88888889]
mean value: 0.9025754902414515
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.95833333 0.83870968 0.84615385 0.89655172 1. 0.86666667
0.92307692 0.93103448 0.81818182 0.92307692]
mean value: 0.9001785394805417
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.85185185 0.96296296 0.81481481 0.96296296 0.92592593 0.92857143
0.85714286 0.96428571 0.96428571 0.85714286]
mean value: 0.908994708994709
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.90806878 0.89219577 0.83597884 0.92791005 0.96296296 0.89021164
0.89153439 0.94510582 0.87103175 0.89153439]
mean value: 0.9016534391534392
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.82142857 0.8125 0.70967742 0.86666667 0.92592593 0.8125
0.8 0.9 0.79411765 0.8 ]
mean value: 0.8242816230434826
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.31
Accuracy on Blind test: 0.71
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01087308 0.01074958 0.010571 0.01048541 0.0106163 0.01041913
0.01044679 0.01051927 0.01077652 0.01043153]
mean value: 0.010588860511779786
key: score_time
value: [0.00972962 0.0091846 0.00901866 0.00877166 0.00883484 0.00898385
0.00970221 0.00887561 0.00894332 0.00887108]
mean value: 0.009091544151306152
key: test_mcc
value: [0.27734221 0.62202265 0.45601459 0.53121272 0.56841568 0.67328042
0.78410665 0.38267891 0.42602426 0.56841568]
mean value: 0.5289513774861617
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.63636364 0.8 0.72727273 0.76363636 0.78181818 0.83636364
0.89090909 0.69090909 0.70909091 0.78181818]
mean value: 0.7618181818181818
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.65517241 0.81967213 0.70588235 0.77192982 0.76 0.83636364
0.88888889 0.71186441 0.74193548 0.8 ]
mean value: 0.7691709138346379
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.61290323 0.73529412 0.75 0.73333333 0.82608696 0.85185185
0.92307692 0.67741935 0.67647059 0.75 ]
mean value: 0.7536436351311362
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.7037037 0.92592593 0.66666667 0.81481481 0.7037037 0.82142857
0.85714286 0.75 0.82142857 0.85714286]
mean value: 0.7921957671957671
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.63756614 0.80224868 0.72619048 0.76455026 0.78042328 0.83664021
0.89153439 0.68981481 0.70701058 0.78042328]
mean value: 0.7616402116402117
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.48717949 0.69444444 0.54545455 0.62857143 0.61290323 0.71875
0.8 0.55263158 0.58974359 0.66666667]
mean value: 0.6296344966813983
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.19
Accuracy on Blind test: 0.65
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.8867898 1.81904387 1.8012898 1.79828095 1.82392001 1.80473995
1.81928992 1.82994795 1.81334281 1.79255438]
mean value: 1.8189199447631836
key: score_time
value: [0.102422 0.09241891 0.094944 0.09771085 0.09159255 0.09206796
0.14494443 0.09352231 0.09137368 0.09122992]
mean value: 0.09922266006469727
key: test_mcc
value: [0.8565805 0.92980214 0.82269299 0.89139151 1. 0.89139151
0.96423926 1. 0.96423926 0.89139151]
mean value: 0.9211728680565975
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.92727273 0.96363636 0.90909091 0.94545455 1. 0.94545455
0.98181818 1. 0.98181818 0.94545455]
mean value: 0.96
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.92307692 0.96428571 0.90196078 0.94339623 1. 0.94736842
0.98245614 1. 0.98245614 0.94736842]
mean value: 0.9592368770898475
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96 0.93103448 0.95833333 0.96153846 1. 0.93103448
0.96551724 1. 0.96551724 0.93103448]
mean value: 0.9604009725906277
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.88888889 1. 0.85185185 0.92592593 1. 0.96428571
1. 1. 1. 0.96428571]
mean value: 0.9595238095238096
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.9265873 0.96428571 0.90806878 0.94510582 1. 0.94510582
0.98148148 1. 0.98148148 0.94510582]
mean value: 0.9597222222222223
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.85714286 0.93103448 0.82142857 0.89285714 1. 0.9
0.96551724 1. 0.96551724 0.9 ]
mean value: 0.9233497536945813
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.26
Accuracy on Blind test: 0.6
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
 warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...05', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.93221402 0.93724418 0.96414828 1.00427461 0.99941468 1.05338717
0.99944305 1.02059126 0.98394442 0.99001122]
mean value: 0.9884672880172729
key: score_time
value: [0.27788615 0.26738977 0.26741028 0.21771598 0.24798799 0.282516
0.21504283 0.24273372 0.21704197 0.2816062 ]
mean value: 0.2517330884933472
key: test_mcc
value: [0.81854376 0.92980214 0.82269299 0.89139151 0.96423926 0.89153439
0.92724868 1. 0.92962225 0.89139151]
mean value: 0.9066466495296084
key: train_mcc
value: [0.95556281 0.95151495 0.95971983 0.95556354 0.94748184 0.95556354
0.95154681 0.95151495 0.95151495 0.95962779]
mean value: 0.9539611013324247
key: test_accuracy
value: [0.90909091 0.96363636 0.90909091 0.94545455 0.98181818 0.94545455
0.96363636 1. 0.96363636 0.94545455]
mean value: 0.9527272727272728
key: train_accuracy
value: [0.97777778 0.97575758 0.97979798 0.97777778 0.97373737 0.97777778
0.97575758 0.97575758 0.97575758 0.97979798]
mean value: 0.9769696969696969
key: test_fscore
value: [0.90566038 0.96428571 0.90196078 0.94339623 0.98113208 0.94545455
0.96428571 1. 0.96551724 0.94736842]
mean value: 0.9519061100016925
key: train_fscore
value: [0.9778672 0.97580645 0.98 0.97777778 0.97384306 0.97777778
0.97580645 0.9757085 0.9757085 0.97983871]
mean value: 0.9770134434076782
key: test_precision
value: [0.92307692 0.93103448 0.95833333 0.96153846 1. 0.96296296
0.96428571 1. 0.93333333 0.93103448]
mean value: 0.956559969404797
key: train_precision
value: [0.97590361 0.97580645 0.97222222 0.97975709 0.97188755 0.97580645
0.97188755 0.9757085 0.9757085 0.97590361]
mean value: 0.9750591543834124
key: test_recall
value: [0.88888889 1. 0.85185185 0.92592593 0.96296296 0.92857143
0.96428571 1. 1. 0.96428571]
mean value: 0.9486772486772487
key: train_recall
value: [0.97983871 0.97580645 0.98790323 0.97580645 0.97580645 0.97975709
0.97975709 0.9757085 0.9757085 0.98380567]
mean value: 0.9789898132427843
key: test_roc_auc
value: [0.90873016 0.96428571 0.90806878 0.94510582 0.98148148 0.9457672
0.96362434 1. 0.96296296 0.94510582]
mean value: 0.9525132275132275
key: train_roc_auc
value: [0.97777361 0.97575748 0.97978157 0.97778177 0.97373319 0.97778177
0.97576564 0.97575748 0.97575748 0.97980606]
mean value: 0.9769696029776674
key: test_jcc
value: [0.82758621 0.93103448 0.82142857 0.89285714 0.96296296 0.89655172
0.93103448 1. 0.93333333 0.9 ]
mean value: 0.9096788907133735
key: train_jcc
value: [0.95669291 0.95275591 0.96078431 0.95652174 0.94901961 0.95652174
0.95275591 0.95256917 0.95256917 0.96047431]
mean value: 0.955066477246029
MCC on Blind test: 0.25
Accuracy on Blind test: 0.59
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02516007 0.01044226 0.01031733 0.01038718 0.01034856 0.0114646
0.01037955 0.01027536 0.01097727 0.01110005]
mean value: 0.012085223197937011
key: score_time
value: [0.01297903 0.00912786 0.00893593 0.00900245 0.00891113 0.00889015
0.00875902 0.00933337 0.00927258 0.00886822]
mean value: 0.009407973289489746
key: test_mcc
value: [0.56841568 0.65330526 0.60000053 0.75033796 0.89139151 0.63841116
0.78174603 0.75878131 0.52935027 0.6005291 ]
mean value: 0.6772268811002418
key: train_mcc
value: [0.7134319 0.76975822 0.75369821 0.71362312 0.74951431 0.76567678
0.77794469 0.71736756 0.70631188 0.72166787]
mean value: 0.738899455109274
key: test_accuracy
value: [0.78181818 0.81818182 0.8 0.87272727 0.94545455 0.81818182
0.89090909 0.87272727 0.76363636 0.8 ]
mean value: 0.8363636363636364
key: train_accuracy
value: [0.85656566 0.88484848 0.87676768 0.85656566 0.87474747 0.88282828
0.88888889 0.85858586 0.85252525 0.86060606]
mean value: 0.8692929292929292
key: test_fscore
value: [0.76 0.83333333 0.79245283 0.87719298 0.94339623 0.81481481
0.89285714 0.8627451 0.77966102 0.8 ]
mean value: 0.8356453445053573
key: train_fscore
value: [0.85480573 0.88438134 0.87576375 0.85420945 0.87550201 0.88211382
0.88977956 0.85655738 0.84759916 0.85773196]
mean value: 0.8678444146780728
key: test_precision
value: [0.82608696 0.75757576 0.80769231 0.83333333 0.96153846 0.84615385
0.89285714 0.95652174 0.74193548 0.81481481]
mean value: 0.8438509843488806
key: train_precision
value: [0.86721992 0.88979592 0.88477366 0.87029289 0.872 0.88571429
0.88095238 0.86721992 0.875 0.87394958]
mean value: 0.8766918548471572
key: test_recall
value: [0.7037037 0.92592593 0.77777778 0.92592593 0.92592593 0.78571429
0.89285714 0.78571429 0.82142857 0.78571429]
mean value: 0.8330687830687831
key: train_recall
value: [0.84274194 0.87903226 0.86693548 0.83870968 0.87903226 0.87854251
0.89878543 0.84615385 0.82186235 0.84210526]
mean value: 0.8593901005615776
key: test_roc_auc
value: [0.78042328 0.82010582 0.79960317 0.87367725 0.94510582 0.81878307
0.89087302 0.87433862 0.76256614 0.80026455]
mean value: 0.8365740740740741
key: train_roc_auc
value: [0.85659364 0.88486026 0.87678758 0.8566018 0.8747388 0.88281964
0.88890884 0.85856079 0.85246343 0.86056876]
mean value: 0.869290355230508
key: test_jcc
value: [0.61290323 0.71428571 0.65625 0.78125 0.89285714 0.6875
0.80645161 0.75862069 0.63888889 0.66666667]
mean value: 0.7215673941063262
key: train_jcc
value: [0.74642857 0.79272727 0.77898551 0.74551971 0.77857143 0.78909091
0.80144404 0.74910394 0.73550725 0.75090253]
mean value: 0.766828116175246
MCC on Blind test: 0.28
Accuracy on Blind test: 0.73
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.13260365 0.07910442 0.07843709 0.08254647 0.08715963 0.08041072
0.08045745 0.08163977 0.08075643 0.08242631]
mean value: 0.0865541934967041
key: score_time
value: [0.01190138 0.01126909 0.01197362 0.01131248 0.01109767 0.01134634
0.01114702 0.01117444 0.01196313 0.01197171]
mean value: 0.0115156888961792
key: test_mcc
value: [0.92724868 1. 0.85449735 0.89139151 1. 0.8565805
0.96423926 1. 0.96423926 0.89139151]
mean value: 0.9349588068599265
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96363636 1. 0.92727273 0.94545455 1. 0.92727273
0.98181818 1. 0.98181818 0.94545455]
mean value: 0.9672727272727273
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96296296 1. 0.92592593 0.94339623 1. 0.93103448
0.98245614 1. 0.98245614 0.94736842]
mean value: 0.967560029981699
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96296296 1. 0.92592593 0.96153846 1. 0.9
0.96551724 1. 0.96551724 0.93103448]
mean value: 0.9612496315944592
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96296296 1. 0.92592593 0.92592593 1. 0.96428571
1. 1. 1. 0.96428571]
mean value: 0.9743386243386243
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96362434 1. 0.92724868 0.94510582 1. 0.9265873
0.98148148 1. 0.98148148 0.94510582]
mean value: 0.9670634920634921
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.92857143 1. 0.86206897 0.89285714 1. 0.87096774
0.96551724 1. 0.96551724 0.9 ]
mean value: 0.9385499761639917
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.12
Accuracy on Blind test: 0.35
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.04187107 0.0752697 0.04425359 0.06683969 0.07338858 0.07722521
0.04798603 0.05801558 0.04432964 0.10096288]
mean value: 0.06301419734954834
key: score_time
value: [0.01879668 0.01285887 0.02497196 0.01250243 0.02248597 0.01239944
0.02228522 0.01246166 0.02562261 0.01915598]
mean value: 0.018354082107543947
key: test_mcc
value: [0.64214885 0.7112589 0.57574525 0.82337971 0.92962225 0.89139151
0.74569602 0.89153439 0.80032673 0.74569602]
mean value: 0.7756799630794246
key: train_mcc
value: [0.9071347 0.89097143 0.91551261 0.90707697 0.88686823 0.88698776
0.89498 0.89498 0.89091681 0.89091681]
mean value: 0.8966345315165652
key: test_accuracy
value: [0.81818182 0.85454545 0.78181818 0.90909091 0.96363636 0.94545455
0.87272727 0.94545455 0.89090909 0.87272727]
mean value: 0.8854545454545455
key: train_accuracy
value: [0.95353535 0.94545455 0.95757576 0.95353535 0.94343434 0.94343434
0.94747475 0.94747475 0.94545455 0.94545455]
mean value: 0.9482828282828283
key: test_fscore
value: [0.8 0.85714286 0.75 0.9122807 0.96153846 0.94736842
0.87719298 0.94545455 0.90322581 0.87719298]
mean value: 0.8831396758306775
key: train_fscore
value: [0.95390782 0.94589178 0.9582505 0.95372233 0.94354839 0.9437751
0.94758065 0.94758065 0.94545455 0.94545455]
mean value: 0.9485166298950366
key: test_precision
value: [0.86956522 0.82758621 0.85714286 0.86666667 1. 0.93103448
0.86206897 0.96296296 0.82352941 0.86206897]
mean value: 0.8862625736618152
key: train_precision
value: [0.94820717 0.94023904 0.94509804 0.95180723 0.94354839 0.93625498
0.9437751 0.9437751 0.94354839 0.94354839]
mean value: 0.9439801825444007
key: test_recall
value: [0.74074074 0.88888889 0.66666667 0.96296296 0.92592593 0.96428571
0.89285714 0.92857143 1. 0.89285714]
mean value: 0.8863756613756614
key: train_recall
value: [0.95967742 0.9516129 0.97177419 0.95564516 0.94354839 0.951417
0.951417 0.951417 0.94736842 0.94736842]
mean value: 0.9531245918767142
key: test_roc_auc
value: [0.81679894 0.85515873 0.7797619 0.91005291 0.96296296 0.94510582
0.8723545 0.9457672 0.88888889 0.8723545 ]
mean value: 0.8849206349206349
key: train_roc_auc
value: [0.95352292 0.94544208 0.95754702 0.95353108 0.94343411 0.94345044
0.9474827 0.9474827 0.9454584 0.9454584 ]
mean value: 0.9482809847198642
key: test_jcc
value: [0.66666667 0.75 0.6 0.83870968 0.92592593 0.9
0.78125 0.89655172 0.82352941 0.78125 ]
mean value: 0.7963883405914585
key: train_jcc
value: [0.91187739 0.8973384 0.91984733 0.91153846 0.89312977 0.89353612
0.90038314 0.90038314 0.89655172 0.89655172]
mean value: 0.9021137211926713
MCC on Blind test: 0.2
Accuracy on Blind test: 0.66
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.02523851 0.01029706 0.01058316 0.01143408 0.01133108 0.01030755
0.01012635 0.01112747 0.00999737 0.0097971 ]
mean value: 0.01202397346496582
key: score_time
value: [0.01250792 0.00929117 0.00969481 0.00978851 0.00898647 0.00870633
0.00878239 0.00980616 0.00981569 0.0087533 ]
mean value: 0.009613275527954102
key: test_mcc
value: [0.60876172 0.79069197 0.52715278 0.81878307 0.92962225 0.78353876
0.89153439 0.67729621 0.68300095 0.67729621]
mean value: 0.7387678309675754
key: train_mcc
value: [0.73356087 0.76975822 0.7738449 0.72535511 0.75760346 0.76166531
0.76567678 0.71326401 0.73390987 0.72956825]
mean value: 0.74642067826384
key: test_accuracy
value: [0.8 0.89090909 0.76363636 0.90909091 0.96363636 0.89090909
0.94545455 0.83636364 0.83636364 0.83636364]
mean value: 0.8672727272727272
key: train_accuracy
value: [0.86666667 0.88484848 0.88686869 0.86262626 0.87878788 0.88080808
0.88282828 0.85656566 0.86666667 0.86464646]
mean value: 0.8731313131313132
key: test_fscore
value: [0.7755102 0.89655172 0.75471698 0.90909091 0.96153846 0.89655172
0.94545455 0.83018868 0.85245902 0.83018868]
mean value: 0.8652250924457494
key: train_fscore
value: [0.86530612 0.88438134 0.88617886 0.86178862 0.87854251 0.87983707
0.88211382 0.85480573 0.86363636 0.862423 ]
mean value: 0.8719013426889961
key: test_precision
value: [0.86363636 0.83870968 0.76923077 0.89285714 1. 0.86666667
0.96296296 0.88 0.78787879 0.88 ]
mean value: 0.8741942370652048
key: train_precision
value: [0.87603306 0.88979592 0.89344262 0.86885246 0.88211382 0.8852459
0.88571429 0.86363636 0.88185654 0.875 ]
mean value: 0.8801690970398393
key: test_recall
value: [0.7037037 0.96296296 0.74074074 0.92592593 0.92592593 0.92857143
0.92857143 0.78571429 0.92857143 0.78571429]
mean value: 0.8616402116402117
key: train_recall
value: [0.85483871 0.87903226 0.87903226 0.85483871 0.875 0.87449393
0.87854251 0.84615385 0.84615385 0.85020243]
mean value: 0.8638288494188324
key: test_roc_auc
value: [0.79828042 0.89219577 0.76322751 0.90939153 0.96296296 0.89021164
0.9457672 0.83730159 0.83465608 0.83730159]
mean value: 0.8671296296296296
key: train_roc_auc
value: [0.86669061 0.88486026 0.88688455 0.86264203 0.87879555 0.88079535
0.88281964 0.85654467 0.86662531 0.86461734]
mean value: 0.8731275303643725
key: test_jcc
value: [0.63333333 0.8125 0.60606061 0.83333333 0.92592593 0.8125
0.89655172 0.70967742 0.74285714 0.70967742]
mean value: 0.768241690435795
key: train_jcc
value: [0.76258993 0.79272727 0.79562044 0.75714286 0.7833935 0.78545455
0.78909091 0.74642857 0.76 0.75812274]
mean value: 0.7730570767345278
MCC on Blind test: 0.29
Accuracy on Blind test: 0.72
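The `test_jcc` and `test_fscore` values in each fold are not independent. Assuming `jcc` is the Jaccard index of the positive class and `fscore` is F1, both are functions of the same counts: F1 = 2TP / (2TP + FP + FN) and J = TP / (TP + FP + FN), which gives J = F1 / (2 - F1). A minimal sketch, with a hypothetical helper name, checked against the first fold logged above (F1 0.7755102, Jaccard 0.63333333):

```python
def jaccard_from_f1(f1):
    """Convert a binary F1 score to the Jaccard index of the same class."""
    return f1 / (2.0 - f1)

first_fold_f1 = 0.7755102          # first test_fscore value logged above
first_fold_jcc = jaccard_from_f1(first_fold_f1)
print(round(first_fold_jcc, 6))
```

The identity holds fold by fold. It does not hold between the two `mean value:` lines, because the transformation is non-linear.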
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01875067 0.01982927 0.02383804 0.01903248 0.01827717 0.02330303
0.02174616 0.01853609 0.02163601 0.02033186]
mean value: 0.020528078079223633
key: score_time
value: [0.01003098 0.01140165 0.01187396 0.01201582 0.01195359 0.01196742
0.01209927 0.01293659 0.01179004 0.01193452]
mean value: 0.011800384521484375
key: test_mcc
value: [0.75724019 0.70899471 0.57068493 0.78353876 1. 0.76980036
0.8565805 0.3105295 0.76980036 0.85695439]
mean value: 0.7384123695263163
key: train_mcc
value: [0.87485317 0.85422763 0.8581304 0.83110746 0.80103924 0.78322194
0.86171594 0.40327111 0.87905164 0.87531676]
mean value: 0.8021935299759908
key: test_accuracy
value: [0.87272727 0.85454545 0.78181818 0.89090909 1. 0.87272727
0.92727273 0.58181818 0.87272727 0.92727273]
mean value: 0.8581818181818182
key: train_accuracy
value: [0.93535354 0.92525253 0.92525253 0.91111111 0.8969697 0.88282828
0.92929293 0.64040404 0.93939394 0.93737374]
mean value: 0.8923232323232323
key: test_fscore
value: [0.85714286 0.85185185 0.79310345 0.88461538 1. 0.88888889
0.93103448 0.3030303 0.88888889 0.92592593]
mean value: 0.8324482031378583
key: train_fscore
value: [0.93220339 0.9217759 0.93005671 0.90434783 0.90359168 0.89377289
0.93203883 0.43670886 0.94 0.93608247]
mean value: 0.8730578571342905
key: test_precision
value: [0.95454545 0.85185185 0.74193548 0.92 1. 0.8
0.9 1. 0.8 0.96153846]
mean value: 0.8929871251806736
key: train_precision
value: [0.98214286 0.96888889 0.87544484 0.98113208 0.85053381 0.81605351
0.89552239 1. 0.92885375 0.95378151]
mean value: 0.9252353636501418
key: test_recall
value: [0.77777778 0.85185185 0.85185185 0.85185185 1. 1.
0.96428571 0.17857143 1. 0.89285714]
mean value: 0.8369047619047619
key: train_recall
value: [0.88709677 0.87903226 0.99193548 0.83870968 0.96370968 0.98785425
0.97165992 0.27935223 0.951417 0.91902834]
mean value: 0.8669795611858431
key: test_roc_auc
value: [0.87103175 0.85449735 0.78306878 0.89021164 1. 0.87037037
0.9265873 0.58928571 0.87037037 0.92791005]
mean value: 0.8583333333333333
key: train_roc_auc
value: [0.93545122 0.92534609 0.92511754 0.91125767 0.8968346 0.88304003
0.92937835 0.63967611 0.93941818 0.93733675]
mean value: 0.8922856536502547
key: test_jcc
value: [0.75 0.74193548 0.65714286 0.79310345 1. 0.8
0.87096774 0.17857143 0.8 0.86206897]
mean value: 0.7453789925313841
key: train_jcc
value: [0.87301587 0.85490196 0.86925795 0.82539683 0.82413793 0.80794702
0.87272727 0.27935223 0.88679245 0.87984496]
mean value: 0.7973374474147499
MCC on Blind test: 0.21
Accuracy on Blind test: 0.49
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01975441 0.01947474 0.01754856 0.01995182 0.02250051 0.01994252
0.02293873 0.02241492 0.02045321 0.02638674]
mean value: 0.02113661766052246
key: score_time
value: [0.01200104 0.01194429 0.01183605 0.01182842 0.01182103 0.01209521
0.01194048 0.01198792 0.01181817 0.01200318]
mean value: 0.011927580833435059
key: test_mcc
value: [0.74569602 0.68491749 0.60268595 0.81878307 0.92962225 0.75724019
0.72754449 0.89153439 0.86334835 0.85695439]
mean value: 0.7878326593584918
key: train_mcc
value: [0.87731732 0.79931631 0.8834206 0.87890613 0.88545816 0.83650879
0.86174381 0.9071347 0.88686823 0.90920534]
mean value: 0.8725879390008385
key: test_accuracy
value: [0.87272727 0.81818182 0.8 0.90909091 0.96363636 0.87272727
0.85454545 0.94545455 0.92727273 0.92727273]
mean value: 0.889090909090909
key: train_accuracy
value: [0.93737374 0.89494949 0.94141414 0.93939394 0.94141414 0.91515152
0.92727273 0.95353535 0.94343434 0.95353535]
mean value: 0.9347474747474748
key: test_fscore
value: [0.86792453 0.84375 0.78431373 0.90909091 0.96153846 0.8852459
0.84 0.94545455 0.93333333 0.92592593]
mean value: 0.8896577330774602
key: train_fscore
value: [0.93980583 0.90262172 0.94045175 0.93902439 0.93920335 0.91984733
0.92207792 0.95315682 0.94331984 0.95178197]
mean value: 0.9351290919849996
key: test_precision
value: [0.88461538 0.72972973 0.83333333 0.89285714 1. 0.81818182
0.95454545 0.96296296 0.875 0.96153846]
mean value: 0.8912764287764288
key: train_precision
value: [0.90636704 0.84265734 0.958159 0.94672131 0.97816594 0.8700361
0.99069767 0.95901639 0.94331984 0.98695652]
mean value: 0.9382097158751853
key: test_recall
value: [0.85185185 1. 0.74074074 0.92592593 0.92592593 0.96428571
0.75 0.92857143 1. 0.89285714]
mean value: 0.898015873015873
key: train_recall
value: [0.97580645 0.97177419 0.9233871 0.93145161 0.90322581 0.9757085
0.86234818 0.94736842 0.94331984 0.91902834]
mean value: 0.9353418440642549
key: test_roc_auc
value: [0.8723545 0.82142857 0.7989418 0.90939153 0.96296296 0.87103175
0.85648148 0.9457672 0.92592593 0.92791005]
mean value: 0.8892195767195767
key: train_roc_auc
value: [0.93729594 0.89479398 0.94145063 0.93941002 0.94149145 0.91527361
0.92714183 0.95352292 0.94343411 0.95346578]
mean value: 0.9347280266422882
key: test_jcc
value: [0.76666667 0.72972973 0.64516129 0.83333333 0.92592593 0.79411765
0.72413793 0.89655172 0.875 0.86206897]
mean value: 0.8052693213726715
key: train_jcc
value: [0.88644689 0.8225256 0.8875969 0.88505747 0.88537549 0.85159011
0.85542169 0.91050584 0.89272031 0.908 ]
mean value: 0.8785240284120172
MCC on Blind test: 0.4
Accuracy on Blind test: 0.94
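A blind-test accuracy of 0.94 alongside an MCC of 0.4 shows that the two metrics can diverge widely. One situation in which this can happen is a blind set dominated by a single class; the log does not state the blind-set class counts, so the counts below are hypothetical and only illustrate the effect. The sketch computes MCC from confusion-matrix counts and returns 0 when the denominator is zero, a convention assumed here.

```python
from math import sqrt

def accuracy(tp, tn, fp, fn):
    """Fraction of correct predictions."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention for an undefined MCC
    return (tp * tn - fp * fn) / denom

# Hypothetical counts: 10 positives, 90 negatives, everything predicted negative.
always_negative = dict(tp=0, tn=90, fp=0, fn=10)
print(accuracy(**always_negative), mcc(**always_negative))
```

With these counts the classifier never finds a positive example, yet accuracy stays high while MCC drops to zero, which is why both numbers are worth reading together.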
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.20455599 0.1857903 0.19286776 0.19176698 0.18908429 0.19593883
0.19246387 0.19231415 0.18916607 0.18785882]
mean value: 0.19218070507049562
key: score_time
value: [0.01529026 0.01592922 0.01679468 0.01615262 0.01680255 0.0168097
0.01640892 0.01586556 0.01548862 0.01537132]
mean value: 0.016091346740722656
key: test_mcc
value: [0.92724868 0.96428571 0.85449735 0.89139151 0.92962225 0.89602867
0.92724868 0.96428571 0.96423926 0.89153439]
mean value: 0.9210382220929846
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96363636 0.98181818 0.92727273 0.94545455 0.96363636 0.94545455
0.96363636 0.98181818 0.98181818 0.94545455]
mean value: 0.96
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96296296 0.98181818 0.92592593 0.94339623 0.96153846 0.94915254
0.96428571 0.98181818 0.98245614 0.94545455]
mean value: 0.9598808882942826
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96296296 0.96428571 0.92592593 0.96153846 1. 0.90322581
0.96428571 1. 0.96551724 0.96296296]
mean value: 0.9610704789792666
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96296296 1. 0.92592593 0.92592593 0.92592593 1.
0.96428571 0.96428571 1. 0.92857143]
mean value: 0.9597883597883597
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96362434 0.98214286 0.92724868 0.94510582 0.96296296 0.94444444
0.96362434 0.98214286 0.98148148 0.9457672 ]
mean value: 0.9598544973544973
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.92857143 0.96428571 0.86206897 0.89285714 0.92592593 0.90322581
0.93103448 0.96428571 0.96551724 0.89655172]
mean value: 0.9234324146170643
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.11
Accuracy on Blind test: 0.36
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.07000065 0.07286143 0.07021427 0.07388616 0.08600783 0.09426713
0.07593942 0.0796895 0.08389831 0.09681273]
mean value: 0.08035774230957031
key: score_time
value: [0.02999687 0.02562594 0.02819896 0.02181721 0.03845429 0.03784633
0.02426839 0.0302527 0.02698541 0.0410769 ]
mean value: 0.03045229911804199
key: test_mcc
value: [0.92724868 0.89642146 0.85449735 0.8565805 0.96428571 0.92962225
0.96423926 1. 0.89139151 0.85449735]
mean value: 0.9138784079339198
key: train_mcc
value: [0.9838707 0.9838707 0.9878867 0.97980573 0.98383832 0.97172522
0.98795103 0.98387018 0.97980573 0.98387018]
mean value: 0.9826494492650794
key: test_accuracy
value: [0.96363636 0.94545455 0.92727273 0.92727273 0.98181818 0.96363636
0.98181818 1. 0.94545455 0.92727273]
mean value: 0.9563636363636363
key: train_accuracy
value: [0.99191919 0.99191919 0.99393939 0.98989899 0.99191919 0.98585859
0.99393939 0.99191919 0.98989899 0.99191919]
mean value: 0.9913131313131313
key: test_fscore
value: [0.96296296 0.94736842 0.92592593 0.92307692 0.98181818 0.96551724
0.98245614 1. 0.94736842 0.92857143]
mean value: 0.9565065646190873
key: train_fscore
value: [0.99190283 0.99190283 0.99396378 0.98993964 0.99193548 0.98585859
0.99389002 0.99186992 0.98985801 0.99186992]
mean value: 0.9912991028204244
key: test_precision
value: [0.96296296 0.9 0.92592593 0.96 0.96428571 0.93333333
0.96551724 1. 0.93103448 0.92857143]
mean value: 0.9471631089217296
key: train_precision
value: [0.99593496 0.99593496 0.99196787 0.98795181 0.99193548 0.98387097
1. 0.99591837 0.99186992 0.99591837]
mean value: 0.9931302702420014
key: test_recall
value: [0.96296296 1. 0.92592593 0.88888889 1. 1.
1. 1. 0.96428571 0.92857143]
mean value: 0.9670634920634921
key: train_recall
value: [0.98790323 0.98790323 0.99596774 0.99193548 0.99193548 0.98785425
0.98785425 0.98785425 0.98785425 0.98785425]
mean value: 0.9894916416351052
key: test_roc_auc
value: [0.96362434 0.94642857 0.92724868 0.9265873 0.98214286 0.96296296
0.98148148 1. 0.94510582 0.92724868]
mean value: 0.9562830687830688
key: train_roc_auc
value: [0.99192732 0.99192732 0.99393529 0.98989487 0.99191916 0.98586261
0.99392713 0.991911 0.98989487 0.991911 ]
mean value: 0.991311055243568
key: test_jcc
value: [0.92857143 0.9 0.86206897 0.85714286 0.96428571 0.93333333
0.96551724 1. 0.9 0.86666667]
mean value: 0.9177586206896552
key: train_jcc
value: [0.98393574 0.98393574 0.988 0.98007968 0.984 0.97211155
0.98785425 0.98387097 0.97991968 0.98387097]
mean value: 0.9827578586214413
MCC on Blind test: 0.09
Accuracy on Blind test: 0.32
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.17707753 0.29563475 0.24107099 0.21363139 0.16412997 0.16844749
0.17108274 0.17586136 0.11147547 0.2053895 ]
mean value: 0.19238011837005614
key: score_time
value: [0.02453613 0.02744293 0.04643822 0.02506137 0.02915144 0.03036785
0.03013134 0.0295155 0.03101945 0.03005862]
mean value: 0.03037228584289551
key: test_mcc
value: [0.57574525 0.67729621 0.45601459 0.67602163 0.89153439 0.56556341
0.75033796 0.85449735 0.61858957 0.64402061]
mean value: 0.6709620987267373
key: train_mcc
value: [0.98795161 0.98396735 0.99195168 0.98396735 0.98396735 0.98795103
0.98396631 0.98396631 0.98396631 0.98396631]
mean value: 0.9855621621215631
key: test_accuracy
value: [0.78181818 0.83636364 0.72727273 0.83636364 0.94545455 0.78181818
0.87272727 0.92727273 0.8 0.81818182]
mean value: 0.8327272727272728
key: train_accuracy
value: [0.99393939 0.99191919 0.9959596 0.99191919 0.99191919 0.99393939
0.99191919 0.99191919 0.99191919 0.99191919]
mean value: 0.9927272727272727
key: test_fscore
value: [0.75 0.84210526 0.70588235 0.82352941 0.94545455 0.77777778
0.86792453 0.92857143 0.82539683 0.80769231]
mean value: 0.827433444105855
key: train_fscore
value: [0.99391481 0.99186992 0.99595142 0.99186992 0.99186992 0.99389002
0.99183673 0.99183673 0.99183673 0.99183673]
mean value: 0.992671293954595
key: test_precision
value: [0.85714286 0.8 0.75 0.875 0.92857143 0.80769231
0.92 0.92857143 0.74285714 0.875 ]
mean value: 0.8484835164835165
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 0.88888889 0.66666667 0.77777778 0.96296296 0.75
0.82142857 0.92857143 0.92857143 0.75 ]
mean value: 0.8141534391534392
key: train_recall
value: [0.98790323 0.98387097 0.99193548 0.98387097 0.98387097 0.98785425
0.98380567 0.98380567 0.98380567 0.98380567]
mean value: 0.9854528535980149
key: test_roc_auc
value: [0.7797619 0.83730159 0.72619048 0.83531746 0.9457672 0.78240741
0.87367725 0.92724868 0.79761905 0.81944444]
mean value: 0.832473544973545
key: train_roc_auc
value: [0.99395161 0.99193548 0.99596774 0.99193548 0.99193548 0.99392713
0.99190283 0.99190283 0.99190283 0.99190283]
mean value: 0.9927264267990075
key: test_jcc
value: [0.6 0.72727273 0.54545455 0.7 0.89655172 0.63636364
0.76666667 0.86666667 0.7027027 0.67741935]
mean value: 0.7119098024103586
key: train_jcc
value: [0.98790323 0.98387097 0.99193548 0.98387097 0.98387097 0.98785425
0.98380567 0.98380567 0.98380567 0.98380567]
mean value: 0.9854528535980149
MCC on Blind test: 0.25
Accuracy on Blind test: 0.7
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.7654438 0.7468729 0.75707912 0.74711108 0.76610827 0.76259875
0.75790429 0.75682592 0.75021052 0.74763083]
mean value: 0.7557785511016846
key: score_time
value: [0.00954056 0.01017046 0.00927854 0.00945091 0.0103693 0.00993395
0.00967669 0.00912714 0.00927448 0.00911355]
mean value: 0.009593558311462403
key: test_mcc
value: [0.92724868 0.96428571 0.85449735 0.89139151 1. 0.89602867
0.96423926 1. 0.96423926 0.89139151]
mean value: 0.9353321952904994
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96363636 0.98181818 0.92727273 0.94545455 1. 0.94545455
0.98181818 1. 0.98181818 0.94545455]
mean value: 0.9672727272727273
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96296296 0.98181818 0.92592593 0.94339623 1. 0.94915254
0.98245614 1. 0.98245614 0.94736842]
mean value: 0.9675536541249432
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96296296 0.96428571 0.92592593 0.96153846 1. 0.90322581
0.96551724 1. 0.96551724 0.93103448]
mean value: 0.9580007836681919
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96296296 1. 0.92592593 0.92592593 1. 1.
1. 1. 1. 0.96428571]
mean value: 0.977910052910053
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96362434 0.98214286 0.92724868 0.94510582 1. 0.94444444
0.98148148 1. 0.98148148 0.94510582]
mean value: 0.9670634920634921
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.92857143 0.96428571 0.86206897 0.89285714 1. 0.90322581
0.96551724 1. 0.96551724 0.9 ]
mean value: 0.9382043540441761
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.11
Accuracy on Blind test: 0.29
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.03061318 0.03356743 0.03385997 0.03194618 0.03091121 0.030725
0.03083324 0.03064632 0.03090024 0.0306983 ]
mean value: 0.031470108032226565
key: score_time
value: [0.01244497 0.01246119 0.0126853 0.01307559 0.01310396 0.01295328
0.0130353 0.01303506 0.01296496 0.01314688]
mean value: 0.01289064884185791
key: test_mcc
value: [ 0.04111183 0.12729377 0.19302201 0.34263208 0.33604449 -0.13495839
0.00509277 0.06900656 0.31698002 0.1028689 ]
mean value: 0.13990940292625395
key: train_mcc
value: [0.4529978 0.39302546 0.53873258 0.55123208 0.36860407 0.29590134
0.29590134 0.40164502 0.63609111 0.37064115]
mean value: 0.4304771947958042
key: test_accuracy
value: [0.50909091 0.54545455 0.58181818 0.63636364 0.61818182 0.47272727
0.50909091 0.52727273 0.61818182 0.54545455]
mean value: 0.5563636363636363
key: train_accuracy
value: [0.67070707 0.63434343 0.72525253 0.73333333 0.62020202 0.57979798
0.57979798 0.63838384 0.78787879 0.62020202]
mean value: 0.658989898989899
key: test_fscore
value: [0.63013699 0.64788732 0.65671642 0.71428571 0.71232877 0.63291139
0.65822785 0.66666667 0.72 0.65753425]
mean value: 0.669669536331282
key: train_fscore
value: [0.75265554 0.73264402 0.78481013 0.78980892 0.7251462 0.7037037
0.7037037 0.73402675 0.82470785 0.72434018]
mean value: 0.747554697471538
key: test_precision
value: [0.5 0.52272727 0.55 0.58139535 0.56521739 0.49019608
0.50980392 0.52 0.57446809 0.53333333]
mean value: 0.5347141431308546
key: train_precision
value: [0.60340633 0.57808858 0.64583333 0.65263158 0.56880734 0.54285714
0.54285714 0.57981221 0.70170455 0.56781609]
mean value: 0.5983814285548509
key: test_recall
value: [0.85185185 0.85185185 0.81481481 0.92592593 0.96296296 0.89285714
0.92857143 0.92857143 0.96428571 0.85714286]
mean value: 0.8978835978835978
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.51521164 0.55092593 0.58597884 0.64153439 0.62433862 0.46494709
0.50132275 0.51984127 0.61177249 0.53968254]
mean value: 0.5555555555555556
key: train_roc_auc
value: [0.67004049 0.63360324 0.72469636 0.73279352 0.6194332 0.58064516
0.58064516 0.6391129 0.78830645 0.62096774]
mean value: 0.6590244220974272
key: test_jcc
value: [0.46 0.47916667 0.48888889 0.55555556 0.55319149 0.46296296
0.49056604 0.5 0.5625 0.48979592]
mean value: 0.5042627519538972
key: train_jcc
value: [0.60340633 0.57808858 0.64583333 0.65263158 0.56880734 0.54285714
0.54285714 0.57981221 0.70170455 0.56781609]
mean value: 0.5983814285548509
MCC on Blind test: -0.06
Accuracy on Blind test: 0.17
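Note on the metric keys reported above: a minimal sketch of how accuracy, precision, recall, F-score, Jaccard (jcc) and MCC relate to one confusion matrix. The counts used below (TP=23, FP=23, FN=4, TN=5) are not printed in this log; they are an assumption, inferred from the first QDA test fold's reported accuracy, precision and recall on an assumed 55-row test fold. The function name is hypothetical.

```python
from math import sqrt


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute common binary-classification metrics from raw counts."""
    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "fscore": ratio(2 * precision * recall, precision + recall),
        "jcc": ratio(tp, tp + fp + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Assumed counts, inferred from the first QDA test fold above.
fold = binary_metrics(tp=23, fp=23, fn=4, tn=5)
for name, value in fold.items():
    print(f"{name}: {value:.8f}")
```

With these assumed counts the recall is high but the MCC is close to zero, because most negatives are misclassified as positives. The same pattern explains why train_jcc equals train_precision for QDA: when recall is 1.0 there are no false negatives, so TP / (TP + FP + FN) reduces to TP / (TP + FP).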
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02615452 0.03805971 0.03778291 0.03775167 0.03771639 0.03777146
0.02376747 0.03762388 0.03809237 0.03769779]
mean value: 0.03524181842803955
key: score_time
value: [0.01908135 0.01849127 0.01843929 0.0183835 0.01840234 0.01997757
0.018466 0.01841736 0.01839828 0.01835322]
mean value: 0.018641018867492677
key: test_mcc
value: [0.71588202 0.78410665 0.52935027 0.82337971 0.96423926 0.85449735
0.81878307 0.92724868 0.8565805 0.81878307]
mean value: 0.8092850574012925
key: train_mcc
value: [0.86672653 0.87081606 0.89125899 0.86667211 0.8586449 0.86274286
0.86274286 0.85457514 0.86265611 0.86673306]
mean value: 0.8663568632850653
key: test_accuracy
value: [0.85454545 0.89090909 0.76363636 0.90909091 0.98181818 0.92727273
0.90909091 0.96363636 0.92727273 0.90909091]
mean value: 0.9036363636363636
key: train_accuracy
value: [0.93333333 0.93535354 0.94545455 0.93333333 0.92929293 0.93131313
0.93131313 0.92727273 0.93131313 0.93333333]
mean value: 0.9331313131313131
key: test_fscore
value: [0.84 0.89285714 0.74509804 0.9122807 0.98113208 0.92857143
0.90909091 0.96428571 0.93103448 0.90909091]
mean value: 0.9013441403096495
key: train_fscore
value: [0.93386774 0.936 0.94632207 0.93360161 0.92985972 0.93172691
0.93172691 0.92741935 0.93145161 0.93360161]
mean value: 0.9335577524823128
key: test_precision
value: [0.91304348 0.86206897 0.79166667 0.86666667 1. 0.92857143
0.92592593 0.96428571 0.9 0.92592593]
mean value: 0.9078154771820439
key: train_precision
value: [0.92828685 0.92857143 0.93333333 0.93172691 0.92430279 0.92430279
0.92430279 0.92369478 0.92771084 0.928 ]
mean value: 0.927423251114875
key: test_recall
value: [0.77777778 0.92592593 0.7037037 0.96296296 0.96296296 0.92857143
0.89285714 0.96428571 0.96428571 0.89285714]
mean value: 0.8976190476190476
key: train_recall
value: [0.93951613 0.94354839 0.95967742 0.93548387 0.93548387 0.93927126
0.93927126 0.93117409 0.93522267 0.93927126]
mean value: 0.9397920203735144
key: test_roc_auc
value: [0.8531746 0.89153439 0.76256614 0.91005291 0.98148148 0.92724868
0.90939153 0.96362434 0.9265873 0.90939153]
mean value: 0.903505291005291
key: train_roc_auc
value: [0.93332082 0.93533695 0.94542575 0.93332898 0.9292804 0.93132918
0.93132918 0.92728059 0.93132101 0.9333453 ]
mean value: 0.9331298158547734
key: test_jcc
value: [0.72413793 0.80645161 0.59375 0.83870968 0.96296296 0.86666667
0.83333333 0.93103448 0.87096774 0.83333333]
mean value: 0.8261347742347465
key: train_jcc
value: [0.87593985 0.87969925 0.89811321 0.8754717 0.86891386 0.87218045
0.87218045 0.86466165 0.87169811 0.8754717 ]
mean value: 0.8754330228794374
MCC on Blind test: 0.25
Accuracy on Blind test: 0.7
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.27136374 0.27113485 0.33172965 0.30581665 0.3005619 0.26949382
0.2750001 0.29058337 0.27251935 0.3191328 ]
mean value: 0.29073362350463866
key: score_time
value: [0.01856899 0.01850533 0.0185225 0.01847601 0.01860619 0.01852322
0.02130389 0.01841664 0.01855254 0.01855564]
mean value: 0.018803095817565917
key: test_mcc
value: [0.71588202 0.78410665 0.52935027 0.82337971 0.96423926 0.85449735
0.81878307 0.92724868 0.8565805 0.81878307]
mean value: 0.8092850574012925
key: train_mcc
value: [0.89916888 0.89497657 0.89125899 0.86667211 0.8586449 0.86274286
0.86274286 0.85457514 0.86265611 0.86673306]
mean value: 0.8720171491265002
key: test_accuracy
value: [0.85454545 0.89090909 0.76363636 0.90909091 0.98181818 0.92727273
0.90909091 0.96363636 0.92727273 0.90909091]
mean value: 0.9036363636363636
key: train_accuracy
value: [0.94949495 0.94747475 0.94545455 0.93333333 0.92929293 0.93131313
0.93131313 0.92727273 0.93131313 0.93333333]
mean value: 0.935959595959596
key: test_fscore
value: [0.84 0.89285714 0.74509804 0.9122807 0.98113208 0.92857143
0.90909091 0.96428571 0.93103448 0.90909091]
mean value: 0.9013441403096495
key: train_fscore
value:
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_orig.py:175: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_orig.py:178: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[0.9500998 0.94779116 0.94632207 0.93360161 0.92985972 0.93172691
0.93172691 0.92741935 0.93145161 0.93360161]
mean value: 0.9363600754410022
key: test_precision
value: [0.91304348 0.86206897 0.79166667 0.86666667 1. 0.92857143
0.92592593 0.96428571 0.9 0.92592593]
mean value: 0.9078154771820439
key: train_precision
value: [0.94071146 0.944 0.93333333 0.93172691 0.92430279 0.92430279
0.92430279 0.92369478 0.92771084 0.928 ]
mean value: 0.9302085692438272
key: test_recall
value: [0.77777778 0.92592593 0.7037037 0.96296296 0.96296296 0.92857143
0.89285714 0.96428571 0.96428571 0.89285714]
mean value: 0.8976190476190476
key: train_recall
value: [0.95967742 0.9516129 0.95967742 0.93548387 0.93548387 0.93927126
0.93927126 0.93117409 0.93522267 0.93927126]
mean value: 0.9426146010186758
key: test_roc_auc
value: [0.8531746 0.89153439 0.76256614 0.91005291 0.98148148 0.92724868
0.90939153 0.96362434 0.9265873 0.90939153]
mean value: 0.903505291005291
key: train_roc_auc
value: [0.94947434 0.94746637 0.94542575 0.93332898 0.9292804 0.93132918
0.93132918 0.92728059 0.93132101 0.9333453 ]
mean value: 0.935958110225937
key: test_jcc
value: [0.72413793 0.80645161 0.59375 0.83870968 0.96296296 0.86666667
0.83333333 0.93103448 0.87096774 0.83333333]
mean value: 0.8261347742347465
key: train_jcc
value: [0.90494297 0.90076336 0.89811321 0.8754717 0.86891386 0.87218045
0.87218045 0.86466165 0.87169811 0.8754717 ]
mean value: 0.8804397455608106
MCC on Blind test: 0.25
Accuracy on Blind test: 0.7
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.03542757 0.03597736 0.03544164 0.03828049 0.03607202 0.03562999
0.03738952 0.02794266 0.03553414 0.03548431]
mean value: 0.03531796932220459
key: score_time
value: [0.01272678 0.01266503 0.01269794 0.01263142 0.01262355 0.01270127
0.01296377 0.01188087 0.01294971 0.01273322]
mean value: 0.012657356262207032
key: test_mcc
value: [0.82942474 0.86189955 0.68434084 0.75808552 0.82195294 0.85933785
0.75047877 0.93094934 0.71611487 0.78571429]
mean value: 0.7998298708962035
key: train_mcc
value: [0.85809003 0.86193803 0.86590623 0.87777588 0.87024737 0.84662074
0.84651574 0.85837416 0.88199914 0.86237183]
mean value: 0.8629839142601098
key: test_accuracy
value: [0.9122807 0.92982456 0.84210526 0.87719298 0.91071429 0.92857143
0.875 0.96428571 0.85714286 0.89285714]
mean value: 0.8989974937343358
key: train_accuracy
value: [0.92899408 0.93096647 0.93293886 0.93885602 0.93503937 0.92322835
0.92322835 0.92913386 0.94094488 0.93110236]
mean value: 0.9314432589417447
key: test_fscore
value: [0.91525424 0.93103448 0.84745763 0.8852459 0.90909091 0.92592593
0.87272727 0.96296296 0.86206897 0.89285714]
mean value: 0.9004625427886199
key: train_fscore
value: [0.9296875 0.93123772 0.93307087 0.93909627 0.93567251 0.92397661
0.92367906 0.9296875 0.94140625 0.93177388]
mean value: 0.9319288166968593
key: test_precision
value: [0.87096774 0.9 0.83333333 0.84375 0.92592593 0.96153846
0.88888889 1. 0.83333333 0.89285714]
mean value: 0.895059482781257
key: train_precision
value: [0.92248062 0.92941176 0.92941176 0.93359375 0.92664093 0.91505792
0.91828794 0.92248062 0.93410853 0.92277992]
mean value: 0.925425374907558
key: test_recall
value: [0.96428571 0.96428571 0.86206897 0.93103448 0.89285714 0.89285714
0.85714286 0.92857143 0.89285714 0.89285714]
mean value: 0.9078817733990148
key: train_recall
value: [0.93700787 0.93307087 0.93675889 0.94466403 0.94488189 0.93307087
0.92913386 0.93700787 0.9488189 0.94094488]
mean value: 0.9385359932775201
key: test_roc_auc
value: [0.91317734 0.93041872 0.84174877 0.87623153 0.91071429 0.92857143
0.875 0.96428571 0.85714286 0.89285714]
mean value: 0.8990147783251231
key: train_roc_auc
value: [0.92897825 0.93096231 0.93294638 0.93886745 0.93503937 0.92322835
0.92322835 0.92913386 0.94094488 0.93110236]
mean value: 0.931443154585914
key: test_jcc
value: [0.84375 0.87096774 0.73529412 0.79411765 0.83333333 0.86206897
0.77419355 0.92857143 0.75757576 0.80645161]
mean value: 0.820632415292945
key: train_jcc
value: [0.86861314 0.87132353 0.87453875 0.88518519 0.87912088 0.85869565
0.85818182 0.86861314 0.88929889 0.87226277]
mean value: 0.8725833753544835
MCC on Blind test: 0.31
Accuracy on Blind test: 0.7
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
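The pipeline repr above combines a `MinMaxScaler` for numerical columns with a `OneHotEncoder` for categorical ones inside a `ColumnTransformer`. A minimal sketch of that preprocessing step follows, assuming a toy DataFrame: the column names are taken from the log, but the values and the category labels ("helix", "sheet", "loop") are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Toy data (assumption): two numerical columns and one categorical column.
toy = pd.DataFrame({
    "ligand_distance": [1.0, 5.0, 9.0, 3.0, 7.0, 2.0],
    "ddg_foldx": [-1.0, 0.5, 2.0, 0.0, 1.0, -0.5],
    "ss_class": ["helix", "sheet", "loop", "helix", "loop", "sheet"],
})

numerical_cols = ["ligand_distance", "ddg_foldx"]
categorical_cols = ["ss_class"]

# Same layout as the logged pipeline: scale numerical columns to [0, 1],
# one-hot encode categorical columns, pass anything else through unchanged.
prep = ColumnTransformer(
    remainder="passthrough",
    transformers=[
        ("num", MinMaxScaler(), numerical_cols),
        ("cat", OneHotEncoder(), categorical_cols),
    ],
)

transformed = prep.fit_transform(toy)
print(transformed.shape)  # 2 scaled columns + 3 one-hot columns
```

Placing this transformer as the first step of a `Pipeline` means the scaler and encoder are refitted on the training part of each cross-validation fold.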
key: fit_time
value: [0.7798295 0.99009943 0.76765776 0.93879628 0.81983972 0.7645843
0.95550585 0.77687287 0.81234503 0.94892621]
mean value: 0.8554456949234008
key: score_time
value: [0.01273251 0.01294208 0.01296949 0.01316237 0.01302671 0.01300812
0.0130589 0.01300597 0.01304603 0.01303864]
mean value: 0.0129990816116333
key: test_mcc
value: [0.86189955 0.82512315 0.64901478 0.79110556 0.82195294 0.89802651
0.78571429 0.85933785 0.71611487 0.85714286]
mean value: 0.8065432358778236
key: train_mcc
value: [0.88168194 0.89349683 0.88954592 0.90927764 0.89774912 0.88976378
0.88979136 0.93703692 0.90945587 0.89766562]
mean value: 0.8995465023358031
key: test_accuracy
value: [0.92982456 0.9122807 0.8245614 0.89473684 0.91071429 0.94642857
0.89285714 0.92857143 0.85714286 0.92857143]
mean value: 0.9025689223057645
key: train_accuracy
value: [0.9408284 0.94674556 0.94477318 0.95463511 0.9488189 0.94488189
0.94488189 0.96850394 0.95472441 0.9488189 ]
mean value: 0.9497612169780553
key: test_fscore
value: [0.93103448 0.9122807 0.82758621 0.9 0.9122807 0.94339623
0.89285714 0.92592593 0.86206897 0.92857143]
mean value: 0.9036001782450778
key: train_fscore
value: [0.94117647 0.94695481 0.94466403 0.95463511 0.94921875 0.94488189
0.94509804 0.96837945 0.95463511 0.94901961]
mean value: 0.9498663265993761
key: test_precision
value: [0.9 0.89655172 0.82758621 0.87096774 0.89655172 1.
0.89285714 0.96153846 0.83333333 0.92857143]
mean value: 0.9007957763408264
key: train_precision
value: [0.9375 0.94509804 0.94466403 0.95275591 0.94186047 0.94488189
0.94140625 0.97222222 0.95652174 0.9453125 ]
mean value: 0.9482223042580766
key: test_recall
value: [0.96428571 0.92857143 0.82758621 0.93103448 0.92857143 0.89285714
0.89285714 0.89285714 0.89285714 0.92857143]
mean value: 0.9080049261083745
key: train_recall
value: [0.94488189 0.9488189 0.94466403 0.95652174 0.95669291 0.94488189
0.9488189 0.96456693 0.95275591 0.95275591]
mean value: 0.9515358999097445
key: test_roc_auc
value: [0.93041872 0.91256158 0.82450739 0.89408867 0.91071429 0.94642857
0.89285714 0.92857143 0.85714286 0.92857143]
mean value: 0.9025862068965518
key: train_roc_auc
value: [0.94082039 0.94674146 0.94477296 0.95463882 0.9488189 0.94488189
0.94488189 0.96850394 0.95472441 0.9488189 ]
mean value: 0.9497603560424512
key: test_jcc
value: [0.87096774 0.83870968 0.70588235 0.81818182 0.83870968 0.89285714
0.80645161 0.86206897 0.75757576 0.86666667]
mean value: 0.8258071413417223
key: train_jcc
value: [0.88888889 0.89925373 0.89513109 0.91320755 0.90334572 0.89552239
0.89591078 0.93869732 0.91320755 0.90298507]
mean value: 0.9046150086984556
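The key names above (`fit_time`, `score_time`, `test_*`, `train_*`) match the dictionary returned by scikit-learn's `cross_validate` when it is given a named scoring dictionary and `return_train_score=True`. A minimal sketch of how such output could be produced is shown below; the synthetic data, the model choice, and the exact scorer behind each key are assumptions, not the original script's definitions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Scorer names chosen to mirror the logged keys; the mapping is an assumption.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": "jaccard",
}

pipe = Pipeline([
    ("scale", MinMaxScaler()),
    ("model", LogisticRegression(max_iter=1000, random_state=42)),
])

# 10-fold CV; for a classifier an integer cv uses stratified folds.
scores = cross_validate(pipe, X, y, cv=10, scoring=scoring,
                        return_train_score=True)

for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", value.mean())
```

Each array holds one entry per fold, which is why every `value:` line in the log lists ten numbers.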
MCC on Blind test: 0.3
Accuracy on Blind test: 0.69
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01444316 0.01245856 0.01029587 0.00997543 0.0098331 0.00987601
0.00983953 0.00994825 0.00983977 0.00989795]
mean value: 0.010640764236450195
key: score_time
value: [0.0122149 0.0091784 0.00903082 0.00886178 0.00877905 0.00872421
0.00874662 0.00869894 0.00871277 0.00874066]
mean value: 0.00916881561279297
key: test_mcc
value: [0.78940887 0.57973205 0.65634573 0.58562417 0.65814518 0.58255173
0.67900461 0.65814518 0.50128041 0.72168784]
mean value: 0.6411925754198944
key: train_mcc
value: [0.67104275 0.66923233 0.64864227 0.6717949 0.66098223 0.6638126
0.64404637 0.6813177 0.66816241 0.69204983]
mean value: 0.6671083389867292
key: test_accuracy
value: [0.89473684 0.78947368 0.8245614 0.78947368 0.82142857 0.76785714
0.83928571 0.82142857 0.75 0.85714286]
mean value: 0.8155388471177945
key: train_accuracy
value: [0.83037475 0.83037475 0.82051282 0.83234714 0.82677165 0.82874016
0.81692913 0.83858268 0.83070866 0.84055118]
mean value: 0.829589293202255
key: test_fscore
value: [0.89285714 0.77777778 0.81481481 0.77777778 0.8 0.71111111
0.83636364 0.8 0.74074074 0.84615385]
mean value: 0.7997596847596848
key: train_fscore
value: [0.81465517 0.81623932 0.80513919 0.81876333 0.81276596 0.81606765
0.79913607 0.82916667 0.81779661 0.825054 ]
mean value: 0.8154783953529364
key: test_precision
value: [0.89285714 0.80769231 0.88 0.84 0.90909091 0.94117647
0.85185185 0.90909091 0.76923077 0.91666667]
mean value: 0.8717657027068791
key: train_precision
value: [0.9 0.89252336 0.87850467 0.88888889 0.88425926 0.88127854
0.88516746 0.88053097 0.8853211 0.9138756 ]
mean value: 0.8890349860913827
key: test_recall
value: [0.89285714 0.75 0.75862069 0.72413793 0.71428571 0.57142857
0.82142857 0.71428571 0.71428571 0.78571429]
mean value: 0.7447044334975369
key: train_recall
value: [0.74409449 0.7519685 0.743083 0.75889328 0.7519685 0.75984252
0.72834646 0.78346457 0.75984252 0.7519685 ]
mean value: 0.7533472347577106
key: test_roc_auc
value: [0.89470443 0.7887931 0.82573892 0.79064039 0.82142857 0.76785714
0.83928571 0.82142857 0.75 0.85714286]
mean value: 0.8157019704433498
key: train_roc_auc
value: [0.83054527 0.83052971 0.8203604 0.83220255 0.82677165 0.82874016
0.81692913 0.83858268 0.83070866 0.84055118]
mean value: 0.8295921384332887
key: test_jcc
value: [0.80645161 0.63636364 0.6875 0.63636364 0.66666667 0.55172414
0.71875 0.66666667 0.58823529 0.73333333]
mean value: 0.6692054984345847
key: train_jcc
value: [0.68727273 0.68953069 0.67383513 0.69314079 0.68458781 0.68928571
0.66546763 0.70818505 0.69175627 0.70220588]
mean value: 0.6885267694805385
MCC on Blind test: 0.33
Accuracy on Blind test: 0.75
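The two blind-test lines report MCC and accuracy on data held out from cross-validation. A minimal sketch of that evaluation is given below, assuming synthetic data and a simple stratified split; how the original script built its blind test set is not shown in this log.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=400, n_features=10, random_state=42)

# Hold out 25% as a "blind" set that the model never sees during fitting.
X_train, X_blind, y_train, y_blind = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

model = GaussianNB().fit(X_train, y_train)
y_pred = model.predict(X_blind)

blind_mcc = matthews_corrcoef(y_blind, y_pred)
blind_acc = accuracy_score(y_blind, y_pred)

print("MCC on Blind test:", round(blind_mcc, 2))
print("Accuracy on Blind test:", round(blind_acc, 2))
```

MCC ranges from -1 to 1 and accounts for all four confusion-matrix cells, so it can sit well below accuracy when errors are unevenly spread across the two classes.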
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01014781 0.01013613 0.01016474 0.01117969 0.01014805 0.0102253
0.01012969 0.010077 0.01030874 0.01011229]
mean value: 0.01026294231414795
key: score_time
value: [0.00869703 0.00873399 0.00877452 0.01073337 0.00870085 0.00876904
0.00877881 0.00878739 0.00882244 0.00872064]
mean value: 0.008951807022094726
key: test_mcc
value: [0.8953202 0.61805122 0.64901478 0.51048128 0.67900461 0.78772636
0.67900461 0.79385662 0.53881591 0.75047877]
mean value: 0.6901754345176191
key: train_mcc
value: [0.73570695 0.71249972 0.77122983 0.73596545 0.74805469 0.73622618
0.75197433 0.74052487 0.73718664 0.77225227]
mean value: 0.7441620925818874
key: test_accuracy
value: [0.94736842 0.80701754 0.8245614 0.75438596 0.83928571 0.89285714
0.83928571 0.89285714 0.76785714 0.875 ]
mean value: 0.844047619047619
key: train_accuracy
value: [0.8678501 0.85601578 0.88560158 0.8678501 0.87401575 0.86811024
0.87598425 0.87007874 0.86811024 0.88582677]
mean value: 0.87194435384926
key: test_fscore
value: [0.94736842 0.81355932 0.82758621 0.75 0.84210526 0.88888889
0.83636364 0.88461538 0.77966102 0.87719298]
mean value: 0.8447341122414179
key: train_fscore
value: [0.8678501 0.85370741 0.88582677 0.86573146 0.8745098 0.8678501
0.87573964 0.87209302 0.86464646 0.88803089]
mean value: 0.8715985671472862
key: test_precision
value: [0.93103448 0.77419355 0.82758621 0.77777778 0.82758621 0.92307692
0.85185185 0.95833333 0.74193548 0.86206897]
mean value: 0.8475444780366916
key: train_precision
value: [0.86956522 0.86938776 0.88235294 0.87804878 0.87109375 0.86956522
0.87747036 0.85877863 0.8879668 0.87121212]
mean value: 0.8735441569425723
key: test_recall
value: [0.96428571 0.85714286 0.82758621 0.72413793 0.85714286 0.85714286
0.82142857 0.82142857 0.82142857 0.89285714]
mean value: 0.8444581280788177
key: train_recall
value: [0.86614173 0.83858268 0.88932806 0.85375494 0.87795276 0.86614173
0.87401575 0.88582677 0.84251969 0.90551181]
mean value: 0.8699775917338396
key: test_roc_auc
value: [0.9476601 0.80788177 0.82450739 0.75492611 0.83928571 0.89285714
0.83928571 0.89285714 0.76785714 0.875 ]
mean value: 0.8442118226600985
key: train_roc_auc
value: [0.86785347 0.85605023 0.88560891 0.86782235 0.87401575 0.86811024
0.87598425 0.87007874 0.86811024 0.88582677]
mean value: 0.8719460956708475
key: test_jcc
value: [0.9 0.68571429 0.70588235 0.6 0.72727273 0.8
0.71875 0.79310345 0.63888889 0.78125 ]
mean value: 0.735086170309294
key: train_jcc
value: [0.76655052 0.74475524 0.795053 0.76325088 0.77700348 0.76655052
0.77894737 0.77319588 0.76156584 0.79861111]
mean value: 0.7725483853417521
MCC on Blind test: 0.28
Accuracy on Blind test: 0.73
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00978541 0.01127982 0.0107739 0.01075292 0.01079726 0.01070213
0.01073289 0.01096082 0.01066756 0.00969982]
mean value: 0.010615253448486328
key: score_time
value: [0.01713634 0.01846361 0.01252007 0.01257229 0.01251316 0.01299095
0.01284051 0.01288319 0.01281261 0.01232457]
mean value: 0.013705730438232422
key: test_mcc
value: [0.69397486 0.36835853 0.33621986 0.54592083 0.39310793 0.4645821
0.50128041 0.49030429 0.53605627 0.5728919 ]
mean value: 0.49026969817834043
key: train_mcc
value: [0.68061695 0.70442556 0.66993258 0.70071825 0.70157079 0.69368798
0.68114987 0.66560714 0.70499218 0.68640255]
mean value: 0.688910385901818
key: test_accuracy
value: [0.84210526 0.68421053 0.66666667 0.77192982 0.69642857 0.73214286
0.75 0.73214286 0.76785714 0.78571429]
mean value: 0.7429197994987469
key: train_accuracy
value: [0.84023669 0.85207101 0.83431953 0.85009862 0.8503937 0.84645669
0.84055118 0.83267717 0.8523622 0.84251969]
mean value: 0.844168646818556
key: test_fscore
value: [0.82352941 0.66666667 0.65454545 0.78688525 0.69090909 0.72727273
0.74074074 0.68085106 0.76363636 0.77777778]
mean value: 0.7312814543044954
key: train_fscore
value: [0.8389662 0.8502994 0.82857143 0.84677419 0.84677419 0.84274194
0.83960396 0.83033932 0.8502994 0.83739837]
mean value: 0.8411768412067648
key: test_precision
value: [0.91304348 0.69230769 0.69230769 0.75 0.7037037 0.74074074
0.76923077 0.84210526 0.77777778 0.80769231]
mean value: 0.7688909425179448
key: train_precision
value: [0.84738956 0.86234818 0.85654008 0.86419753 0.8677686 0.86363636
0.84462151 0.84210526 0.86234818 0.86554622]
mean value: 0.8576501484027818
key: test_recall
value: [0.75 0.64285714 0.62068966 0.82758621 0.67857143 0.71428571
0.71428571 0.57142857 0.75 0.75 ]
mean value: 0.7019704433497537
key: train_recall
value: [0.83070866 0.83858268 0.80237154 0.83003953 0.82677165 0.82283465
0.83464567 0.81889764 0.83858268 0.81102362]
mean value: 0.8254458311288164
key: test_roc_auc
value: [0.84051724 0.68349754 0.66748768 0.77093596 0.69642857 0.73214286
0.75 0.73214286 0.76785714 0.78571429]
mean value: 0.7426724137931034
key: train_roc_auc
value: [0.84025552 0.85209766 0.83425664 0.85005913 0.8503937 0.84645669
0.84055118 0.83267717 0.8523622 0.84251969]
mean value: 0.8441629578911332
key: test_jcc
value: [0.7 0.5 0.48648649 0.64864865 0.52777778 0.57142857
0.58823529 0.51612903 0.61764706 0.63636364]
mean value: 0.5792716505904362
key: train_jcc
value: [0.72260274 0.73958333 0.70731707 0.73426573 0.73426573 0.728223
0.72354949 0.70989761 0.73958333 0.72027972]
mean value: 0.7259567763866404
MCC on Blind test: 0.24
Accuracy on Blind test: 0.67
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.02472115 0.02486706 0.02309799 0.0221045 0.02213407 0.02268434
0.02683425 0.02616549 0.02615142 0.0264852 ]
mean value: 0.024524545669555663
key: score_time
value: [0.01321721 0.01203775 0.01212263 0.01310039 0.0120523 0.01226783
0.01339436 0.01332831 0.01342893 0.01346922]
mean value: 0.012841892242431641
key: test_mcc
value: [0.86189955 0.8953202 0.79110556 0.72064772 0.78571429 0.8660254
0.71611487 0.8660254 0.61065803 0.78571429]
mean value: 0.7899225309271085
key: train_mcc
value: [0.78700923 0.79097672 0.79093074 0.81067833 0.80324922 0.78361641
0.80709287 0.78361641 0.81889764 0.79134472]
mean value: 0.7967412266329543
key: test_accuracy
value: [0.92982456 0.94736842 0.89473684 0.85964912 0.89285714 0.92857143
0.85714286 0.92857143 0.80357143 0.89285714]
mean value: 0.893515037593985
key: train_accuracy
value: [0.89349112 0.89546351 0.89546351 0.90532544 0.9015748 0.89173228
0.90354331 0.89173228 0.90944882 0.89566929]
mean value: 0.8983444377145164
key: test_fscore
value: [0.93103448 0.94736842 0.9 0.86666667 0.89285714 0.92307692
0.85185185 0.92307692 0.81355932 0.89285714]
mean value: 0.89423488762318
key: train_fscore
value: [0.89328063 0.8962818 0.8950495 0.90551181 0.90234375 0.89278752
0.90373281 0.89278752 0.90944882 0.89587426]
mean value: 0.8987098439098706
key: test_precision
value: [0.9 0.93103448 0.87096774 0.83870968 0.89285714 1.
0.88461538 1. 0.77419355 0.89285714]
mean value: 0.8985235120830226
key: train_precision
value: [0.8968254 0.89105058 0.8968254 0.90196078 0.89534884 0.88416988
0.90196078 0.88416988 0.90944882 0.89411765]
mean value: 0.8955878017441364
key: test_recall
value: [0.96428571 0.96428571 0.93103448 0.89655172 0.89285714 0.85714286
0.82142857 0.85714286 0.85714286 0.89285714]
mean value: 0.8934729064039408
key: train_recall
value: [0.88976378 0.9015748 0.89328063 0.90909091 0.90944882 0.9015748
0.90551181 0.9015748 0.90944882 0.8976378 ]
mean value: 0.9018906974572842
key: test_roc_auc
value: [0.93041872 0.9476601 0.89408867 0.85899015 0.89285714 0.92857143
0.85714286 0.92857143 0.80357143 0.89285714]
mean value: 0.893472906403941
key: train_roc_auc
value: [0.89349849 0.89545143 0.89545921 0.90533286 0.9015748 0.89173228
0.90354331 0.89173228 0.90944882 0.89566929]
mean value: 0.8983442781114811
key: test_jcc
value: [0.87096774 0.9 0.81818182 0.76470588 0.80645161 0.85714286
0.74193548 0.85714286 0.68571429 0.80645161]
mean value: 0.8108694152147662
key: train_jcc
value: [0.80714286 0.81205674 0.81003584 0.82733813 0.82206406 0.80633803
0.82437276 0.80633803 0.83393502 0.8113879 ]
mean value: 0.8161009358062393
MCC on Blind test: 0.23
Accuracy on Blind test: 0.71
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [2.05455732 1.96423841 2.07713985 2.01222348 1.87226844 1.90104604
2.04310274 2.02595901 1.96324086 1.9570272 ]
mean value: 1.9870803356170654
key: score_time
value: [0.01421905 0.01303458 0.01450396 0.01240849 0.01312137 0.01311612
0.01234913 0.01241875 0.01241255 0.01332283]
mean value: 0.013090682029724122
key: test_mcc
value: [0.79161589 0.75462449 0.79110556 0.75462449 0.78772636 0.85714286
0.71611487 0.8660254 0.64450339 0.78772636]
mean value: 0.7751209688559202
key: train_mcc
value: [0.98425123 0.98823457 0.99211042 0.98034517 0.99215674 1.
0.98819663 0.98825791 0.99212598 0.99607071]
mean value: 0.9901749372582905
key: test_accuracy
value: [0.89473684 0.87719298 0.89473684 0.87719298 0.89285714 0.92857143
0.85714286 0.92857143 0.82142857 0.89285714]
mean value: 0.8865288220551378
key: train_accuracy
value: [0.99211045 0.99408284 0.99605523 0.99013807 0.99606299 1.
0.99409449 0.99409449 0.99606299 0.9980315 ]
mean value: 0.9950733044464116
key: test_fscore
value: [0.89655172 0.87272727 0.9 0.88135593 0.89655172 0.92857143
0.86206897 0.92307692 0.82758621 0.88888889]
mean value: 0.8877379066157558
key: train_fscore
value: [0.99215686 0.99412916 0.99604743 0.99017682 0.99604743 1.
0.99410609 0.99405941 0.99606299 0.99802761]
mean value: 0.9950813802058787
key: test_precision
value: [0.86666667 0.88888889 0.87096774 0.86666667 0.86666667 0.92857143
0.83333333 1. 0.8 0.92307692]
mean value: 0.8844838315806058
key: train_precision
value: [0.98828125 0.98832685 0.99604743 0.984375 1. 1.
0.99215686 1. 0.99606299 1. ]
mean value: 0.9945250383950149
key: test_recall
value: [0.92857143 0.85714286 0.93103448 0.89655172 0.92857143 0.92857143
0.89285714 0.85714286 0.85714286 0.85714286]
mean value: 0.8934729064039408
key: train_recall
value: [0.99606299 1. 0.99604743 0.99604743 0.99212598 1.
0.99606299 0.98818898 0.99606299 0.99606299]
mean value: 0.9956661790793937
key: test_roc_auc
value: [0.8953202 0.87684729 0.89408867 0.87684729 0.89285714 0.92857143
0.85714286 0.92857143 0.82142857 0.89285714]
mean value: 0.8864532019704434
key: train_roc_auc
value: [0.99210264 0.99407115 0.99605521 0.9901497 0.99606299 1.
0.99409449 0.99409449 0.99606299 0.9980315 ]
mean value: 0.9950725156391024
key: test_jcc
value: [0.8125 0.77419355 0.81818182 0.78787879 0.8125 0.86666667
0.75757576 0.85714286 0.70588235 0.8 ]
mean value: 0.7992521788774161
key: train_jcc
value: [0.9844358 0.98832685 0.99212598 0.98054475 0.99212598 1.
0.98828125 0.98818898 0.99215686 0.99606299]
mean value: 0.9902249442749081
MCC on Blind test: 0.23
Accuracy on Blind test: 0.63
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.02935839 0.02018619 0.0232408 0.02205086 0.02052236 0.023561
0.02732038 0.02198553 0.02232289 0.02229857]
mean value: 0.02328469753265381
key: score_time
value: [0.01187396 0.00923061 0.00890207 0.00886345 0.00889254 0.00902057
0.0091393 0.00891209 0.00899506 0.00882149]
mean value: 0.00926511287689209
key: test_mcc
value: [0.93202124 0.8951918 0.92980296 0.8951918 0.82195294 0.96490128
0.71611487 0.78772636 0.89342711 0.85933785]
mean value: 0.8695668223710578
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96491228 0.94736842 0.96491228 0.94736842 0.91071429 0.98214286
0.85714286 0.89285714 0.94642857 0.92857143]
mean value: 0.9342418546365915
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96296296 0.94545455 0.96551724 0.94915254 0.90909091 0.98181818
0.86206897 0.89655172 0.94736842 0.93103448]
mean value: 0.9351019976545216
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.96296296 0.96551724 0.93333333 0.92592593 1.
0.83333333 0.86666667 0.93103448 0.9 ]
mean value: 0.9318773946360154
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.92857143 0.92857143 0.96551724 0.96551724 0.89285714 0.96428571
0.89285714 0.92857143 0.96428571 0.96428571]
mean value: 0.9395320197044336
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96428571 0.94704433 0.96490148 0.94704433 0.91071429 0.98214286
0.85714286 0.89285714 0.94642857 0.92857143]
mean value: 0.9341133004926109
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.92857143 0.89655172 0.93333333 0.90322581 0.83333333 0.96428571
0.75757576 0.8125 0.9 0.87096774]
mean value: 0.8800344839624595
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.13
Accuracy on Blind test: 0.47
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.12648606 0.1256876 0.12501502 0.1250596 0.12481833 0.12559652
0.1252358 0.12575173 0.1248858 0.12514901]
mean value: 0.1253685474395752
key: score_time
value: [0.01796365 0.01816535 0.01793432 0.01795077 0.01797152 0.0180378
0.01830125 0.01778531 0.01789355 0.01786304]
mean value: 0.017986655235290527
key: test_mcc
value: [0.8953202 0.71921182 0.82490815 0.72064772 0.71428571 0.96490128
0.82195294 0.8660254 0.68250015 0.78571429]
mean value: 0.79954676684031
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94736842 0.85964912 0.9122807 0.85964912 0.85714286 0.98214286
0.91071429 0.92857143 0.83928571 0.89285714]
mean value: 0.8989661654135338
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94736842 0.85714286 0.91525424 0.86666667 0.85714286 0.98181818
0.9122807 0.92307692 0.84745763 0.89285714]
mean value: 0.9001065615918425
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.93103448 0.85714286 0.9 0.83870968 0.85714286 1.
0.89655172 1. 0.80645161 0.89285714]
mean value: 0.8979890354361989
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96428571 0.85714286 0.93103448 0.89655172 0.85714286 0.96428571
0.92857143 0.85714286 0.89285714 0.89285714]
mean value: 0.9041871921182266
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.9476601 0.85960591 0.91194581 0.85899015 0.85714286 0.98214286
0.91071429 0.92857143 0.83928571 0.89285714]
mean value: 0.8988916256157635
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.9 0.75 0.84375 0.76470588 0.75 0.96428571
0.83870968 0.85714286 0.73529412 0.80645161]
mean value: 0.8210339861751152
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.31
Accuracy on Blind test: 0.72
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01171422 0.01144671 0.01151061 0.01173711 0.01158261 0.01144481
0.01164675 0.01174784 0.01157475 0.01150846]
mean value: 0.011591386795043946
key: score_time
value: [0.00953698 0.00944829 0.00955725 0.00954795 0.0095222 0.00948381
0.00958061 0.00948858 0.00951004 0.00953937]
mean value: 0.00952150821685791
key: test_mcc
value: [0.69397486 0.54592083 0.47413793 0.54759338 0.39310793 0.75434227
0.5 0.5 0.60753044 0.39513166]
mean value: 0.5411739308965824
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.84210526 0.77192982 0.73684211 0.77192982 0.69642857 0.875
0.75 0.75 0.80357143 0.69642857]
mean value: 0.7694235588972431
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.82352941 0.75471698 0.73684211 0.76363636 0.70175439 0.86792453
0.75 0.75 0.80701754 0.67924528]
mean value: 0.7634666602941619
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.91304348 0.8 0.75 0.80769231 0.68965517 0.92
0.75 0.75 0.79310345 0.72 ]
mean value: 0.7893494406642833
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 0.71428571 0.72413793 0.72413793 0.71428571 0.82142857
0.75 0.75 0.82142857 0.64285714]
mean value: 0.7412561576354679
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.84051724 0.77093596 0.73706897 0.77278325 0.69642857 0.875
0.75 0.75 0.80357143 0.69642857]
mean value: 0.7692733990147783
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.7 0.60606061 0.58333333 0.61764706 0.54054054 0.76666667
0.6 0.6 0.67647059 0.51428571]
mean value: 0.6205004507945684
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.2
Accuracy on Blind test: 0.68
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [2.0131824 1.91670203 1.87928843 1.87886953 1.83877921 1.85278034
1.86355448 1.92332649 1.94182944 1.91238904]
mean value: 1.9020701408386231
key: score_time
value: [0.10155416 0.09879255 0.09464073 0.09240246 0.09679675 0.14493012
0.09252048 0.10167122 0.09474611 0.10077143]
mean value: 0.10188260078430175
key: test_mcc
value: [0.96547546 0.8953202 0.92980296 0.82512315 0.89342711 1.
0.85933785 0.92857143 0.78772636 0.96490128]
mean value: 0.9049685794519872
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98245614 0.94736842 0.96491228 0.9122807 0.94642857 1.
0.92857143 0.96428571 0.89285714 0.98214286]
mean value: 0.9521303258145364
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98181818 0.94736842 0.96551724 0.9122807 0.94545455 1.
0.93103448 0.96428571 0.89655172 0.98181818]
mean value: 0.9526129194459503
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.93103448 0.96551724 0.92857143 0.96296296 1.
0.9 0.96428571 0.86666667 1. ]
mean value: 0.9519038496624703
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96428571 0.96428571 0.96551724 0.89655172 0.92857143 1.
0.96428571 0.96428571 0.92857143 0.96428571]
mean value: 0.954064039408867
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98214286 0.9476601 0.96490148 0.91256158 0.94642857 1.
0.92857143 0.96428571 0.89285714 0.98214286]
mean value: 0.9521551724137932
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96428571 0.9 0.93333333 0.83870968 0.89655172 1.
0.87096774 0.93103448 0.8125 0.96428571]
mean value: 0.9111668388156152
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.26
Accuracy on Blind test: 0.6
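Note on reading the per-fold scores: the test_mcc, test_accuracy and test_jcc values are all functions of one confusion matrix per fold. A minimal sketch follows, assuming the first Random Forest fold had TP=27, FN=1, FP=0, TN=29. These counts are not printed anywhere in the log; they are inferred from the logged precision (1.0), recall (0.96428571) and accuracy (0.98245614) of that fold, and the helper names are hypothetical.

```python
from math import sqrt

# Assumed confusion-matrix counts for fold 1 of the Random Forest run.
# Inferred from the logged scores, not read from the log itself.
TP, FN, FP, TN = 27, 1, 0, 29


def mcc(tp, fn, fp, tn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0.0 when a row or column of the matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def jaccard(tp, fn, fp):
    """Jaccard index of the positive class: TP / (TP + FP + FN)."""
    return tp / (tp + fp + fn)


def accuracy(tp, fn, fp, tn):
    """Fraction of correctly classified samples."""
    return (tp + tn) / (tp + fn + fp + tn)


print("mcc     ", round(mcc(TP, FN, FP, TN), 8))
print("jaccard ", round(jaccard(TP, FN, FP), 8))
print("accuracy", round(accuracy(TP, FN, FP, TN), 8))
```

Under these assumed counts the three helpers agree with the first entries of test_mcc, test_jcc and test_accuracy above, which is a quick way to sanity-check a fold by hand.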
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...05', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.97926283 0.95430398 0.98551869 0.96115994 1.1477139 0.9665091
0.97480011 0.94035792 0.9657414 1.04465652]
mean value: 0.9920024394989013
key: score_time
value: [0.26140761 0.27186894 0.21786308 0.26194715 0.22197795 0.17087865
0.23330307 0.19104862 0.26959944 0.21822429]
mean value: 0.23181188106536865
key: test_mcc
value: [0.96547546 0.8953202 0.8953202 0.85960591 0.89342711 1.
0.82195294 0.89342711 0.72168784 0.96490128]
mean value: 0.8911118047885251
key: train_mcc
value: [0.95269145 0.9605814 0.94872473 0.95661511 0.95278544 0.95278544
0.96062992 0.95278544 0.95670033 0.95278544]
mean value: 0.9547084708624481
key: test_accuracy
value: [0.98245614 0.94736842 0.94736842 0.92982456 0.94642857 1.
0.91071429 0.94642857 0.85714286 0.98214286]
mean value: 0.9449874686716792
key: train_accuracy
value: [0.97633136 0.98027613 0.97435897 0.97830375 0.97637795 0.97637795
0.98031496 0.97637795 0.97834646 0.97637795]
mean value: 0.9773443445308981
key: test_fscore
value: [0.98181818 0.94736842 0.94736842 0.93103448 0.94545455 1.
0.9122807 0.94545455 0.86666667 0.98181818]
mean value: 0.9459264147830391
key: train_fscore
value: [0.97647059 0.98039216 0.97425743 0.97830375 0.97647059 0.97647059
0.98031496 0.97647059 0.978389 0.97647059]
mean value: 0.9774010229981591
key: test_precision
value: [1. 0.93103448 0.96428571 0.93103448 0.96296296 1.
0.89655172 0.96296296 0.8125 1. ]
mean value: 0.9461332329866813
key: train_precision
value: [0.97265625 0.9765625 0.97619048 0.97637795 0.97265625 0.97265625
0.98031496 0.97265625 0.97647059 0.97265625]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
mean value: 0.9749197727811597
key: test_recall
value: [0.96428571 0.96428571 0.93103448 0.93103448 0.92857143 1.
0.92857143 0.92857143 0.92857143 0.96428571]
mean value: 0.9469211822660099
key: train_recall
value: [0.98031496 0.98425197 0.97233202 0.98023715 0.98031496 0.98031496
0.98031496 0.98031496 0.98031496 0.98031496]
mean value: 0.979902586287386
key: test_roc_auc
value: [0.98214286 0.9476601 0.9476601 0.92980296 0.94642857 1.
0.91071429 0.94642857 0.85714286 0.98214286]
mean value: 0.945012315270936
key: train_roc_auc
value: [0.97632349 0.98026828 0.97435498 0.97830755 0.97637795 0.97637795
0.98031496 0.97637795 0.97834646 0.97637795]
mean value: 0.9773427531044786
key: test_jcc
value: [0.96428571 0.9 0.9 0.87096774 0.89655172 1.
0.83870968 0.89655172 0.76470588 0.96428571]
mean value: 0.8996058178555071
key: train_jcc
value: [0.95402299 0.96153846 0.94980695 0.95752896 0.95402299 0.95402299
0.96138996 0.95402299 0.95769231 0.95402299]
mean value: 0.9558071580485373
MCC on Blind test: 0.26
Accuracy on Blind test: 0.6
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02486706 0.01062918 0.01203179 0.01114845 0.01061964 0.01092672
0.01033497 0.01076055 0.01072574 0.01054311]
mean value: 0.01225872039794922
key: score_time
value: [0.01115727 0.00907111 0.00894833 0.00886679 0.00894332 0.00974488
0.00890231 0.00917554 0.0092206 0.00940967]
mean value: 0.009343981742858887
key: test_mcc
value: [0.8953202 0.61805122 0.64901478 0.51048128 0.67900461 0.78772636
0.67900461 0.79385662 0.53881591 0.75047877]
mean value: 0.6901754345176191
key: train_mcc
value: [0.73570695 0.71249972 0.77122983 0.73596545 0.74805469 0.73622618
0.75197433 0.74052487 0.73718664 0.77225227]
mean value: 0.7441620925818874
key: test_accuracy
value: [0.94736842 0.80701754 0.8245614 0.75438596 0.83928571 0.89285714
0.83928571 0.89285714 0.76785714 0.875 ]
mean value: 0.844047619047619
key: train_accuracy
value: [0.8678501 0.85601578 0.88560158 0.8678501 0.87401575 0.86811024
0.87598425 0.87007874 0.86811024 0.88582677]
mean value: 0.87194435384926
key: test_fscore
value: [0.94736842 0.81355932 0.82758621 0.75 0.84210526 0.88888889
0.83636364 0.88461538 0.77966102 0.87719298]
mean value: 0.8447341122414179
key: train_fscore
value: [0.8678501 0.85370741 0.88582677 0.86573146 0.8745098 0.8678501
0.87573964 0.87209302 0.86464646 0.88803089]
mean value: 0.8715985671472862
key: test_precision
value: [0.93103448 0.77419355 0.82758621 0.77777778 0.82758621 0.92307692
0.85185185 0.95833333 0.74193548 0.86206897]
mean value: 0.8475444780366916
key: train_precision
value: [0.86956522 0.86938776 0.88235294 0.87804878 0.87109375 0.86956522
0.87747036 0.85877863 0.8879668 0.87121212]
mean value: 0.8735441569425723
key: test_recall
value: [0.96428571 0.85714286 0.82758621 0.72413793 0.85714286 0.85714286
0.82142857 0.82142857 0.82142857 0.89285714]
mean value: 0.8444581280788177
key: train_recall
value: [0.86614173 0.83858268 0.88932806 0.85375494 0.87795276 0.86614173
0.87401575 0.88582677 0.84251969 0.90551181]
mean value: 0.8699775917338396
key: test_roc_auc
value: [0.9476601 0.80788177 0.82450739 0.75492611 0.83928571 0.89285714
0.83928571 0.89285714 0.76785714 0.875 ]
mean value: 0.8442118226600985
key: train_roc_auc
value: [0.86785347 0.85605023 0.88560891 0.86782235 0.87401575 0.86811024
0.87598425 0.87007874 0.86811024 0.88582677]
mean value: 0.8719460956708475
key: test_jcc
value: [0.9 0.68571429 0.70588235 0.6 0.72727273 0.8
0.71875 0.79310345 0.63888889 0.78125 ]
mean value: 0.735086170309294
key: train_jcc
value: [0.76655052 0.74475524 0.795053 0.76325088 0.77700348 0.76655052
0.77894737 0.77319588 0.76156584 0.79861111]
mean value: 0.7725483853417521
MCC on Blind test: 0.28
Accuracy on Blind test: 0.73
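Note on post-processing this log: each metric is printed as a `key:` line, a `value:` array that may wrap onto a second line, and a `mean value:` line. The sketch below parses that layout with the standard library only and recomputes each mean as a consistency check. It is one possible approach, not part of the original scripts; `LOG_SNIPPET` copies two Naive Bayes entries from above, and `BLOCK` and `parse_scores` are hypothetical names.

```python
import re
from statistics import fmean

# Two metric blocks copied from the Naive Bayes section of this log.
LOG_SNIPPET = """\
key: test_accuracy
value: [0.94736842 0.80701754 0.8245614 0.75438596 0.83928571 0.89285714
0.83928571 0.89285714 0.76785714 0.875 ]
mean value: 0.844047619047619
key: train_accuracy
value: [0.8678501 0.85601578 0.88560158 0.8678501 0.87401575 0.86811024
0.87598425 0.87007874 0.86811024 0.88582677]
mean value: 0.87194435384926
"""

# One block = key line, bracketed array (may span lines), mean line.
BLOCK = re.compile(
    r"key: (?P<key>\S+)\s*\n"
    r"value: \[(?P<values>[^\]]*)\]\s*\n"
    r"mean value: (?P<mean>\S+)"
)


def parse_scores(text):
    """Return {metric name: (per-fold values, logged mean)}."""
    scores = {}
    for match in BLOCK.finditer(text):
        values = [float(v) for v in match.group("values").split()]
        scores[match.group("key")] = (values, float(match.group("mean")))
    return scores


scores = parse_scores(LOG_SNIPPET)
for key, (values, logged_mean) in scores.items():
    # The printed folds are rounded to 8 decimals, so compare loosely.
    ok = abs(fmean(values) - logged_mean) < 1e-6
    print(f"{key}: folds={len(values)} logged_mean={logged_mean:.6f} consistent={ok}")
```

Blocks where a warning is printed between `value:` and the array do not match the pattern and are skipped, so interleaved stderr output shows up as a missing key rather than a wrong number.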
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.09275723 0.24454331 0.06994462 0.07444978 0.06929159 0.0760982
0.07451367 0.07683897 0.07969928 0.07621717]
mean value: 0.09343538284301758
key: score_time
value: [0.01097846 0.01088572 0.01100826 0.01090145 0.01062918 0.01087093
0.0107832 0.01081181 0.01131725 0.01134419]
mean value: 0.010953044891357422
key: test_mcc
value: [0.96547546 0.92980296 0.92980296 0.92980296 0.85714286 1.
0.89802651 0.89342711 0.85933785 0.96490128]
mean value: 0.9227719933358153
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98245614 0.96491228 0.96491228 0.96491228 0.92857143 1.
0.94642857 0.94642857 0.92857143 0.98214286]
mean value: 0.9609335839598997
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98181818 0.96428571 0.96551724 0.96551724 0.92857143 1.
0.94915254 0.94736842 0.93103448 0.98181818]
mean value: 0.9615083435436261
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.96428571 0.96551724 0.96551724 0.92857143 1.
0.90322581 0.93103448 0.9 1. ]
mean value: 0.9558151914825997
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96428571 0.96428571 0.96551724 0.96551724 0.92857143 1.
1. 0.96428571 0.96428571 0.96428571]
mean value: 0.9681034482758621
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98214286 0.96490148 0.96490148 0.96490148 0.92857143 1.
0.94642857 0.94642857 0.92857143 0.98214286]
mean value: 0.9608990147783252
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96428571 0.93103448 0.93333333 0.93333333 0.86666667 1.
0.90322581 0.9 0.87096774 0.96428571]
mean value: 0.926713279305048
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.12
Accuracy on Blind test: 0.34
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.04199457 0.06921983 0.0511663 0.08315659 0.06977725 0.07992172
0.07494569 0.07559061 0.07782745 0.07561278]
mean value: 0.06992127895355224
key: score_time
value: [0.02025461 0.01197267 0.01904845 0.01850629 0.01926947 0.01906586
0.01851892 0.01858687 0.02082109 0.01868105]
mean value: 0.0184725284576416
key: test_mcc
value: [0.82942474 0.86189955 0.82490815 0.68736396 0.82195294 0.85933785
0.75047877 0.85933785 0.64450339 0.75047877]
mean value: 0.7889685968126741
key: train_mcc
value: [0.90543486 0.89754406 0.88566582 0.91321465 0.90576456 0.89774912
0.89766562 0.90163769 0.89788834 0.89774912]
mean value: 0.9000313853409169
key: test_accuracy
value: [0.9122807 0.92982456 0.9122807 0.84210526 0.91071429 0.92857143
0.875 0.92857143 0.82142857 0.875 ]
mean value: 0.893577694235589
key: train_accuracy
value: [0.95266272 0.94871795 0.94280079 0.9566075 0.95275591 0.9488189
0.9488189 0.9507874 0.9488189 0.9488189 ]
mean value: 0.9499607852272903
key: test_fscore
value: [0.91525424 0.93103448 0.91525424 0.85245902 0.9122807 0.92592593
0.87272727 0.92592593 0.82758621 0.87272727]
mean value: 0.8951175279685669
key: train_fscore
value: [0.953125 0.94921875 0.94302554 0.95652174 0.95330739 0.94921875
0.94901961 0.95107632 0.94941634 0.94921875]
mean value: 0.9503148193596516
key: test_precision
value: [0.87096774 0.9 0.9 0.8125 0.89655172 0.96153846
0.88888889 0.96153846 0.8 0.88888889]
mean value: 0.8880874166928115
key: train_precision
value: [0.94573643 0.94186047 0.9375 0.95652174 0.94230769 0.94186047
0.9453125 0.94552529 0.93846154 0.94186047]
mean value: 0.9436946591185824
key: test_recall
value: [0.96428571 0.96428571 0.93103448 0.89655172 0.92857143 0.89285714
0.85714286 0.89285714 0.85714286 0.85714286]
mean value: 0.9041871921182266
key: train_recall
value: [0.96062992 0.95669291 0.9486166 0.95652174 0.96456693 0.95669291
0.95275591 0.95669291 0.96062992 0.95669291]
mean value: 0.957049267062961
key: test_roc_auc
value: [0.91317734 0.93041872 0.91194581 0.841133 0.91071429 0.92857143
0.875 0.92857143 0.82142857 0.875 ]
mean value: 0.8935960591133005
key: train_roc_auc
value: [0.95264698 0.94870219 0.94281224 0.95660733 0.95275591 0.9488189
0.9488189 0.9507874 0.9488189 0.9488189 ]
mean value: 0.9499587625657465
key: test_jcc
value: [0.84375 0.87096774 0.84375 0.74285714 0.83870968 0.86206897
0.77419355 0.86206897 0.70588235 0.77419355]
mean value: 0.8118441942961834
key: train_jcc
value: [0.91044776 0.90334572 0.89219331 0.91666667 0.91078067 0.90334572
0.90298507 0.90671642 0.9037037 0.90334572]
mean value: 0.9053530776518071
MCC on Blind test: 0.21
Accuracy on Blind test: 0.67
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01435184 0.01032829 0.01071692 0.01100564 0.00968957 0.00969958
0.00994968 0.0099864 0.00970316 0.00968027]
mean value: 0.010511136054992676
key: score_time
value: [0.01026058 0.00964212 0.00945568 0.00853848 0.00872827 0.00901937
0.00866747 0.00857186 0.0086143 0.00862527]
mean value: 0.009012341499328613
key: test_mcc
value: [0.82942474 0.79161589 0.78940887 0.61453202 0.68250015 0.8660254
0.64285714 0.79385662 0.5728919 0.75047877]
mean value: 0.7333591505072501
key: train_mcc
value: [0.75542311 0.7321357 0.73964396 0.72420838 0.75989552 0.74024928
0.75202096 0.76377953 0.73264695 0.77559656]
mean value: 0.7475599944964467
key: test_accuracy
value: [0.9122807 0.89473684 0.89473684 0.80701754 0.83928571 0.92857143
0.82142857 0.89285714 0.78571429 0.875 ]
mean value: 0.8651629072681705
key: train_accuracy
value: [0.87771203 0.86587771 0.86982249 0.86193294 0.87992126 0.87007874
0.87598425 0.88188976 0.86614173 0.88779528]
mean value: 0.8737156191274907
key: test_fscore
value: [0.91525424 0.89655172 0.89655172 0.80701754 0.83018868 0.92307692
0.82142857 0.88461538 0.79310345 0.87719298]
mean value: 0.8644981218521811
key: train_fscore
value: [0.87795276 0.864 0.86956522 0.85943775 0.88062622 0.86904762
0.87524752 0.88188976 0.864 0.88757396]
mean value: 0.8729340819469472
key: test_precision
value: [0.87096774 0.86666667 0.89655172 0.82142857 0.88 1.
0.82142857 0.95833333 0.76666667 0.86206897]
mean value: 0.8744112241114466
key: train_precision
value: [0.87795276 0.87804878 0.86956522 0.87346939 0.87548638 0.876
0.88047809 0.88188976 0.87804878 0.88932806]
mean value: 0.8780267218020522
key: test_recall
value: [0.96428571 0.92857143 0.89655172 0.79310345 0.78571429 0.85714286
0.82142857 0.82142857 0.82142857 0.89285714]
mean value: 0.8582512315270936
key: train_recall
value: [0.87795276 0.8503937 0.86956522 0.8458498 0.88582677 0.86220472
0.87007874 0.88188976 0.8503937 0.88582677]
mean value: 0.8679981948896704
key: test_roc_auc
value: [0.91317734 0.8953202 0.89470443 0.80726601 0.83928571 0.92857143
0.82142857 0.89285714 0.78571429 0.875 ]
mean value: 0.865332512315271
key: train_roc_auc
value: [0.87771156 0.86590831 0.86982198 0.86190128 0.87992126 0.87007874
0.87598425 0.88188976 0.86614173 0.88779528]
mean value: 0.8737154150197628
key: test_jcc
value: [0.84375 0.8125 0.8125 0.67647059 0.70967742 0.85714286
0.6969697 0.79310345 0.65714286 0.78125 ]
mean value: 0.7640506867121406
key: train_jcc
value: [0.78245614 0.76056338 0.76923077 0.75352113 0.78671329 0.76842105
0.77816901 0.78873239 0.76056338 0.79787234]
mean value: 0.7746242885126692
MCC on Blind test: 0.3
Accuracy on Blind test: 0.73
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.0148561 0.02070785 0.02067232 0.0191505 0.02243805 0.02403593
0.02616668 0.02317834 0.01781106 0.01990294]
mean value: 0.02089197635650635
key: score_time
value: [0.00977182 0.01106358 0.01167631 0.0116396 0.01164293 0.01165843
0.01170874 0.01168728 0.01164556 0.01165771]
mean value: 0.011415195465087891
key: test_mcc
value: [0.79161589 0.79161589 0.77728159 0.82490815 0.73127242 0.92857143
0.78772636 0.85714286 0.68250015 0.68250015]
mean value: 0.7855134886225912
key: train_mcc
value: [0.83859222 0.88998239 0.77868981 0.8606491 0.82347315 0.89001213
0.87673787 0.85625065 0.89454004 0.83839667]
mean value: 0.854732402199318
key: test_accuracy
value: [0.89473684 0.89473684 0.87719298 0.9122807 0.85714286 0.96428571
0.89285714 0.92857143 0.83928571 0.83928571]
mean value: 0.8900375939849624
key: train_accuracy
value: [0.91913215 0.94477318 0.8816568 0.92899408 0.90551181 0.94488189
0.93700787 0.92716535 0.94685039 0.91732283]
mean value: 0.9253296370498066
key: test_fscore
value: [0.89655172 0.89655172 0.89230769 0.91525424 0.84 0.96428571
0.89655172 0.92857143 0.84745763 0.84745763]
mean value: 0.8924989499104052
key: train_fscore
value: [0.91816367 0.94573643 0.89208633 0.92592593 0.89655172 0.94552529
0.93939394 0.92952381 0.94567404 0.92105263]
mean value: 0.9259633804353411
key: test_precision
value: [0.86666667 0.86666667 0.80555556 0.9 0.95454545 0.96428571
0.86666667 0.92857143 0.80645161 0.80645161]
mean value: 0.8765861378764604
key: train_precision
value: [0.93117409 0.93129771 0.81848185 0.96566524 0.99047619 0.93461538
0.90510949 0.900369 0.96707819 0.88129496]
mean value: 0.9225562104390707
key: test_recall
value: [0.92857143 0.92857143 1. 0.93103448 0.75 0.96428571
0.92857143 0.92857143 0.89285714 0.89285714]
mean value: 0.9145320197044335
key: train_recall
value: [0.90551181 0.96062992 0.98023715 0.88932806 0.81889764 0.95669291
0.97637795 0.96062992 0.92519685 0.96456693]
mean value: 0.9338069154399178
key: test_roc_auc
value: [0.8953202 0.8953202 0.875 0.91194581 0.85714286 0.96428571
0.89285714 0.92857143 0.83928571 0.83928571]
mean value: 0.8899014778325123
key: train_roc_auc
value: [0.91915907 0.94474184 0.88185086 0.928916 0.90551181 0.94488189
0.93700787 0.92716535 0.94685039 0.91732283]
mean value: 0.9253407923811895
key: test_jcc
value: [0.8125 0.8125 0.80555556 0.84375 0.72413793 0.93103448
0.8125 0.86666667 0.73529412 0.73529412]
mean value: 0.8079232871309443
key: train_jcc
value: [0.84870849 0.89705882 0.80519481 0.86206897 0.8125 0.89667897
0.88571429 0.8683274 0.89694656 0.85365854]
mean value: 0.8626856837436376
MCC on Blind test: 0.29
Accuracy on Blind test: 0.71
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01960063 0.0234971 0.02287674 0.0235045 0.01932335 0.01994562
0.02235985 0.02164769 0.02076983 0.02375412]
mean value: 0.021727943420410158
key: score_time
value: [0.01173115 0.0117259 0.0115416 0.01154709 0.01157951 0.01162434
0.01164246 0.0116179 0.01165915 0.01167464]
mean value: 0.011634373664855957
key: test_mcc
value: [0.8951918 0.50752605 0.73477227 0.51004294 0.78772636 0.79385662
0.78571429 0.82618439 0.67900461 0.78772636]
mean value: 0.7307745676699011
key: train_mcc
value: [0.81207793 0.62496305 0.88020472 0.69213593 0.82633424 0.83976368
0.87159992 0.83520301 0.85017081 0.84267432]
mean value: 0.8075127612144095
key: test_accuracy
value: [0.94736842 0.71929825 0.85964912 0.73684211 0.89285714 0.89285714
0.89285714 0.91071429 0.83928571 0.89285714]
mean value: 0.8584586466165414
key: train_accuracy
value: [0.90335306 0.78106509 0.93885602 0.82642998 0.90944882 0.91732283
0.93503937 0.91338583 0.92125984 0.91929134]
mean value: 0.8965452173507897
key: test_fscore
value: [0.94545455 0.77142857 0.875 0.7826087 0.89655172 0.9
0.89285714 0.91525424 0.83636364 0.89655172]
mean value: 0.8712070277320068
key: train_fscore
value: [0.89770355 0.82067851 0.94095238 0.85084746 0.91512915 0.92164179
0.93690249 0.91911765 0.91561181 0.92307692]
mean value: 0.9041661713849551
key: test_precision
value: [0.96296296 0.64285714 0.8 0.675 0.86666667 0.84375
0.89285714 0.87096774 0.85185185 0.86666667]
mean value: 0.8273580175797918
key: train_precision
value: [0.95555556 0.69589041 0.90808824 0.74480712 0.86111111 0.87588652
0.91078067 0.86206897 0.98636364 0.88172043]
mean value: 0.8682272660537491
key: test_recall
value: [0.92857143 0.96428571 0.96551724 0.93103448 0.92857143 0.96428571
0.89285714 0.96428571 0.82142857 0.92857143]
mean value: 0.9289408866995074
key: train_recall
value: [0.84645669 1. 0.97628458 0.99209486 0.97637795 0.97244094
0.96456693 0.98425197 0.85433071 0.96850394]
mean value: 0.9535308580498584
key: test_roc_auc
value: [0.94704433 0.72352217 0.85775862 0.73337438 0.89285714 0.89285714
0.89285714 0.91071429 0.83928571 0.89285714]
mean value: 0.8583128078817734
key: train_roc_auc
value: [0.9034655 0.78063241 0.93892969 0.82675609 0.90944882 0.91732283
0.93503937 0.91338583 0.92125984 0.91929134]
mean value: 0.8965531729482431
key: test_jcc
value: [0.89655172 0.62790698 0.77777778 0.64285714 0.8125 0.81818182
0.80645161 0.84375 0.71875 0.8125 ]
mean value: 0.7757227052602081
key: train_jcc
value: [0.81439394 0.69589041 0.88848921 0.74041298 0.84353741 0.85467128
0.88129496 0.85034014 0.84435798 0.85714286]
mean value: 0.8270531167459525
MCC on Blind test: 0.24
Accuracy on Blind test: 0.56
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.20300555 0.18559599 0.18765807 0.18791032 0.18824935 0.1863656
0.18636107 0.18714428 0.18626761 0.18867087]
mean value: 0.18872287273406982
key: score_time
value: [0.01514959 0.01522899 0.01563787 0.01534891 0.01529288 0.01523805
0.01523829 0.01545191 0.01530719 0.01528692]
mean value: 0.015318059921264648
key: test_mcc
value: [0.96547546 0.92980296 0.92980296 0.96547546 0.89342711 0.93094934
0.89802651 0.85714286 0.89342711 0.96490128]
mean value: 0.9228431033982084
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98245614 0.96491228 0.96491228 0.98245614 0.94642857 0.96428571
0.94642857 0.92857143 0.94642857 0.98214286]
mean value: 0.9609022556390977
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98181818 0.96428571 0.96551724 0.98305085 0.94736842 0.96296296
0.94915254 0.92857143 0.94736842 0.98181818]
mean value: 0.9611913942771552
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.96428571 0.96551724 0.96666667 0.93103448 1.
0.90322581 0.92857143 0.93103448 1. ]
mean value: 0.9590335822871974
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96428571 0.96428571 0.96551724 1. 0.96428571 0.92857143
1. 0.92857143 0.96428571 0.96428571]
mean value: 0.9644088669950739
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98214286 0.96490148 0.96490148 0.98214286 0.94642857 0.96428571
0.94642857 0.92857143 0.94642857 0.98214286]
mean value: 0.9608374384236454
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96428571 0.93103448 0.93333333 0.96666667 0.9 0.92857143
0.90322581 0.86666667 0.9 0.96428571]
mean value: 0.9258069813019758
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.15
Accuracy on Blind test: 0.39
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.06400371 0.07849145 0.07418036 0.06228375 0.07443714 0.07705092
0.08038139 0.06570339 0.08440733 0.08470249]
mean value: 0.07456419467926026
key: score_time
value: [0.02218103 0.02349353 0.03681755 0.0226469 0.02388501 0.02647114
0.02291846 0.03203106 0.03182936 0.02605414]
mean value: 0.02683281898498535
key: test_mcc
value: [0.96547546 0.92980296 0.96551724 0.93202124 0.89342711 1.
0.93094934 0.89342711 0.89802651 0.96490128]
mean value: 0.937354824745975
key: train_mcc
value: [1. 0.99606299 0.97239383 0.99606299 0.98428248 0.99215674
0.98032256 0.98819663 0.99212598 0.99215674]
mean value: 0.9893760958242528
key: test_accuracy
value: [0.98245614 0.96491228 0.98245614 0.96491228 0.94642857 1.
0.96428571 0.94642857 0.94642857 0.98214286]
mean value: 0.9680451127819548
key: train_accuracy
value: [1. 0.99802761 0.98619329 0.99802761 0.99212598 0.99606299
0.99015748 0.99409449 0.99606299 0.99606299]
mean value: 0.9946815449843918
key: test_fscore
value: [0.98181818 0.96428571 0.98245614 0.96666667 0.94736842 1.
0.96551724 0.94736842 0.94915254 0.98181818]
mean value: 0.9686451510797076
key: train_fscore
value: [1. 0.99802761 0.98613861 0.99802761 0.99215686 0.99607843
0.99017682 0.99408284 0.99606299 0.99607843]
mean value: 0.9946830215827512
key: test_precision
value: [1. 0.96428571 1. 0.93548387 0.93103448 1.
0.93333333 0.93103448 0.90322581 1. ]
mean value: 0.9598397690555643
key: train_precision
value: [1. 1. 0.98809524 0.99606299 0.98828125 0.9921875
0.98823529 0.99604743 0.99606299 0.9921875 ]
mean value: 0.9937160197294893
key: test_recall
value: [0.96428571 0.96428571 0.96551724 1. 0.96428571 1.
1. 0.96428571 1. 0.96428571]
mean value: 0.9786945812807882
key: train_recall
value: [1. 0.99606299 0.98418972 1. 0.99606299 1.
0.99212598 0.99212598 0.99606299 1. ]
mean value: 0.9956630668202048
key: test_roc_auc
value: [0.98214286 0.96490148 0.98275862 0.96428571 0.94642857 1.
0.96428571 0.94642857 0.94642857 0.98214286]
mean value: 0.9679802955665026
key: train_roc_auc
value: [1. 0.9980315 0.98618935 0.9980315 0.99212598 0.99606299
0.99015748 0.99409449 0.99606299 0.99606299]
mean value: 0.9946819271108898
key: test_jcc
value: [0.96428571 0.93103448 0.96551724 0.93548387 0.9 1.
0.93333333 0.9 0.90322581 0.96428571]
mean value: 0.9397166163462047
key: train_jcc
value: [1. 0.99606299 0.97265625 0.99606299 0.9844358 0.9921875
0.98054475 0.98823529 0.99215686 0.9921875 ]
mean value: 0.9894529935861796
MCC on Blind test: 0.11
Accuracy on Blind test: 0.32
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.16183567 0.13382697 0.17709804 0.17817044 0.19914985 0.18385673
0.18355298 0.17946386 0.18800902 0.17873001]
mean value: 0.1763693571090698
key: score_time
value: [0.02537632 0.01524782 0.02521944 0.02523208 0.02549791 0.0310111
0.0255177 0.02803326 0.02813077 0.02520919]
mean value: 0.025447559356689454
key: test_mcc
value: [0.8951918 0.50927421 0.65018988 0.57973205 0.46697379 0.71428571
0.61065803 0.61706091 0.60753044 0.71611487]
mean value: 0.6367011683025293
key: train_mcc
value: [0.98823511 0.98434388 0.98434291 0.98823457 0.98825791 0.98437404
0.98437404 0.98038334 0.98428248 0.98437404]
mean value: 0.9851202321407464
key: test_accuracy
value: [0.94736842 0.75438596 0.8245614 0.78947368 0.73214286 0.85714286
0.80357143 0.80357143 0.80357143 0.85714286]
mean value: 0.8172932330827067
key: train_accuracy
value: [0.99408284 0.99211045 0.99211045 0.99408284 0.99409449 0.99212598
0.99212598 0.99015748 0.99212598 0.99212598]
mean value: 0.9925142493283015
key: test_fscore
value: [0.94545455 0.74074074 0.83333333 0.8 0.71698113 0.85714286
0.79245283 0.78431373 0.80701754 0.86206897]
mean value: 0.8139505673802714
key: train_fscore
value: [0.99405941 0.99206349 0.99203187 0.99403579 0.99405941 0.99206349
0.99206349 0.99009901 0.99209486 0.99206349]
mean value: 0.9924634309494456
key: test_precision
value: [0.96296296 0.76923077 0.80645161 0.77419355 0.76 0.85714286
0.84 0.86956522 0.79310345 0.83333333]
mean value: 0.8265983749627411
key: train_precision
value: [1. 1. 1. 1. 1. 1.
1. 0.99601594 0.99603175 1. ]
mean value: 0.9992047682286727
key: test_recall
value: [0.92857143 0.71428571 0.86206897 0.82758621 0.67857143 0.85714286
0.75 0.71428571 0.82142857 0.89285714]
mean value: 0.804679802955665
key: train_recall
value: [0.98818898 0.98425197 0.98418972 0.98814229 0.98818898 0.98425197
0.98425197 0.98425197 0.98818898 0.98425197]
mean value: 0.985815878746382
key: test_roc_auc
value: [0.94704433 0.75369458 0.82389163 0.7887931 0.73214286 0.85714286
0.80357143 0.80357143 0.80357143 0.85714286]
mean value: 0.8170566502463055
key: train_roc_auc
value: [0.99409449 0.99212598 0.99209486 0.99407115 0.99409449 0.99212598
0.99212598 0.99015748 0.99212598 0.99212598]
mean value: 0.9925142385857895
key: test_jcc
value: [0.89655172 0.58823529 0.71428571 0.66666667 0.55882353 0.75
0.65625 0.64516129 0.67647059 0.75757576]
mean value: 0.6910020564753356
key: train_jcc
value: [0.98818898 0.98425197 0.98418972 0.98814229 0.98818898 0.98425197
0.98425197 0.98039216 0.98431373 0.98425197]
mean value: 0.9850423724934871
MCC on Blind test: 0.25
Accuracy on Blind test: 0.7
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.7932086 0.77255511 0.77291465 0.76873159 0.76692247 0.77482271
0.77445388 0.78742743 0.77633667 0.7769289 ]
mean value: 0.7764302015304565
key: score_time
value: [0.01081681 0.00929284 0.00945687 0.00935292 0.0094583 0.00916243
0.01019382 0.00930643 0.0095222 0.00936031]
mean value: 0.009592294692993164
key: test_mcc
value: [0.93202124 0.92980296 0.92980296 0.93202124 0.89342711 1.
0.93094934 0.89342711 0.89802651 0.92857143]
mean value: 0.9268049893869115
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96491228 0.96491228 0.96491228 0.96491228 0.94642857 1.
0.96428571 0.94642857 0.94642857 0.96428571]
mean value: 0.9627506265664161
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96296296 0.96428571 0.96551724 0.96666667 0.94736842 1.
0.96551724 0.94736842 0.94915254 0.96428571]
mean value: 0.9633124925437824
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.96428571 0.96551724 0.93548387 0.93103448 1.
0.93333333 0.93103448 0.90322581 0.96428571]
mean value: 0.9528200646220668
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.92857143 0.96428571 0.96551724 1. 0.96428571 1.
1. 0.96428571 1. 0.96428571]
mean value: 0.9751231527093596
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96428571 0.96490148 0.96490148 0.96428571 0.94642857 1.
0.96428571 0.94642857 0.94642857 0.96428571]
mean value: 0.9626231527093597
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.92857143 0.93103448 0.93333333 0.93548387 0.9 1.
0.93333333 0.9 0.90322581 0.93103448]
mean value: 0.9296016738174692
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.11
Accuracy on Blind test: 0.29
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.03165007 0.03190446 0.03232837 0.0449326 0.05335832 0.05189848
0.0477066 0.03186512 0.04317474 0.0371747 ]
mean value: 0.04059934616088867
key: score_time
value: [0.01240468 0.01262307 0.0133605 0.0146513 0.02085543 0.01273656
0.01313639 0.01332498 0.01820731 0.01342344]
mean value: 0.014472365379333496
key: test_mcc
value: [ 0.10833157 0.24917763 0.06746787 0.23089176 0.09325048 0.10206207
0. -0.06262243 0.26111648 0. ]
mean value: 0.10496754415294487
key: train_mcc
value: [0.35660528 0.46863058 0.29603453 0.48234536 0.53181602 0.36249568
0.37284508 0.40307741 0.69981145 0.36941213]
mean value: 0.43430735340982196
key: test_accuracy
value: [0.52631579 0.59649123 0.52631579 0.59649123 0.53571429 0.53571429
0.5 0.48214286 0.60714286 0.5 ]
mean value: 0.5406328320802005
key: train_accuracy
value: [0.61341223 0.68047337 0.57988166 0.68836292 0.72047244 0.61614173
0.62204724 0.63976378 0.82874016 0.62007874]
mean value: 0.6609374272002981
key: test_fscore
value: [0.65822785 0.68493151 0.66666667 0.69333333 0.64864865 0.65789474
0.65 0.63291139 0.69444444 0.64102564]
mean value: 0.6628084218316483
key: train_fscore
value: [0.72159091 0.75820896 0.70375522 0.76204819 0.78153846 0.72261735
0.72571429 0.73516643 0.85378151 0.72467903]
mean value: 0.7489100342144692
key: test_precision
value: [0.50980392 0.55555556 0.51923077 0.56521739 0.52173913 0.52083333
0.5 0.49019608 0.56818182 0.5 ]
mean value: 0.5250757998040607
key: train_precision
value: [0.56444444 0.61057692 0.54291845 0.61557178 0.64141414 0.56570156
0.56950673 0.5812357 0.74486804 0.56823266]
mean value: 0.6004470420827805
key: test_recall
value: [0.92857143 0.89285714 0.93103448 0.89655172 0.85714286 0.89285714
0.92857143 0.89285714 0.89285714 0.89285714]
mean value: 0.9006157635467981
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.53325123 0.60160099 0.51908867 0.591133 0.53571429 0.53571429
0.5 0.48214286 0.60714286 0.5 ]
mean value: 0.5405788177339901
key: train_roc_auc
value: [0.61264822 0.6798419 0.58070866 0.68897638 0.72047244 0.61614173
0.62204724 0.63976378 0.82874016 0.62007874]
mean value: 0.6609419252435343
key: test_jcc
value: [0.49056604 0.52083333 0.5 0.53061224 0.48 0.49019608
0.48148148 0.46296296 0.53191489 0.47169811]
mean value: 0.49602651456675273
key: train_jcc
value: [0.56444444 0.61057692 0.54291845 0.61557178 0.64141414 0.56570156
0.56950673 0.5812357 0.74486804 0.56823266]
mean value: 0.6004470420827805
MCC on Blind test: -0.06
Accuracy on Blind test: 0.18
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02423191 0.03349376 0.03827691 0.03822851 0.04923773 0.03836203
0.03804398 0.03832388 0.03832483 0.03844643]
mean value: 0.03749699592590332
key: score_time
value: [0.0185101 0.01860046 0.01860714 0.01879597 0.01897502 0.01841974
0.01845193 0.01842713 0.01836824 0.01837802]
mean value: 0.01855337619781494
key: test_mcc
value: [0.82942474 0.8953202 0.78940887 0.75808552 0.85714286 0.89802651
0.71611487 0.93094934 0.68250015 0.75047877]
mean value: 0.8107451823738316
key: train_mcc
value: [0.86590205 0.87777017 0.86194018 0.87387949 0.87062545 0.86638349
0.86638349 0.86638349 0.88616336 0.86624915]
mean value: 0.8701680316316706
key: test_accuracy
value: [0.9122807 0.94736842 0.89473684 0.87719298 0.92857143 0.94642857
0.85714286 0.96428571 0.83928571 0.875 ]
mean value: 0.9042293233082707
key: train_accuracy
value: [0.93293886 0.93885602 0.93096647 0.93688363 0.93503937 0.93307087
0.93307087 0.93307087 0.94291339 0.93307087]
mean value: 0.9349881190886642
key: test_fscore
value: [0.91525424 0.94736842 0.89655172 0.8852459 0.92857143 0.94339623
0.85185185 0.96296296 0.84745763 0.87272727]
mean value: 0.9051387653765297
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_orig.py:195: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_orig.py:198: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: train_fscore
value: [0.93333333 0.93933464 0.93096647 0.9372549 0.93617021 0.93385214
0.93385214 0.93385214 0.94368932 0.93359375]
mean value: 0.935589904607467
key: test_precision
value: [0.87096774 0.93103448 0.89655172 0.84375 0.92857143 1.
0.88461538 1. 0.80645161 0.88888889]
mean value: 0.9050831263810963
key: train_precision
value: [0.9296875 0.93385214 0.92913386 0.92996109 0.92015209 0.92307692
0.92307692 0.92307692 0.93103448 0.92635659]
mean value: 0.926940852023113
key: test_recall
value: [0.96428571 0.96428571 0.89655172 0.93103448 0.92857143 0.89285714
0.82142857 0.92857143 0.89285714 0.85714286]
mean value: 0.9077586206896552
key: train_recall
value: [0.93700787 0.94488189 0.93280632 0.94466403 0.95275591 0.94488189
0.94488189 0.94488189 0.95669291 0.94094488]
mean value: 0.9444399489589493
key: test_roc_auc
value: [0.91317734 0.9476601 0.89470443 0.87623153 0.92857143 0.94642857
0.85714286 0.96428571 0.83928571 0.875 ]
mean value: 0.9042487684729065
key: train_roc_auc
value: [0.93293081 0.93884411 0.93097009 0.93689894 0.93503937 0.93307087
0.93307087 0.93307087 0.94291339 0.93307087]
mean value: 0.9349880178021226
key: test_jcc
value: [0.84375 0.9 0.8125 0.79411765 0.86666667 0.89285714
0.74193548 0.92857143 0.73529412 0.77419355]
mean value: 0.8289886035059185
key: train_jcc
value: [0.875 0.88560886 0.87084871 0.88191882 0.88 0.87591241
0.87591241 0.87591241 0.89338235 0.87545788]
mean value: 0.8789953838440262
MCC on Blind test: 0.25
Accuracy on Blind test: 0.7
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.27105737 0.27678013 0.37362313 0.41940355 0.39782715 0.30619216
0.28300643 0.27874351 0.36507368 0.30322099]
mean value: 0.3274928092956543
key: score_time
value: [0.0186367 0.01867247 0.01899481 0.0188005 0.01891422 0.01871634
0.01878166 0.01856637 0.01868105 0.01864028]
mean value: 0.01874043941497803
key: test_mcc
value: [0.82942474 0.8953202 0.78940887 0.79110556 0.85714286 0.89802651
0.78571429 0.93094934 0.68250015 0.75047877]
mean value: 0.821007127710734
key: train_mcc
value: [0.86590205 0.87777017 0.86194018 0.90927623 0.90576456 0.88987413
0.88588856 0.86638349 0.88616336 0.86624915]
mean value: 0.8815211891146428
key: test_accuracy
value: [0.9122807 0.94736842 0.89473684 0.89473684 0.92857143 0.94642857
0.89285714 0.96428571 0.83928571 0.875 ]
mean value: 0.9095551378446115
key: train_accuracy
value: [0.93293886 0.93885602 0.93096647 0.95463511 0.95275591 0.94488189
0.94291339 0.93307087 0.94291339 0.93307087]
mean value: 0.9407002748916741
key: test_fscore
value: [0.91525424 0.94736842 0.89655172 0.9 0.92857143 0.94339623
0.89285714 0.96296296 0.84745763 0.87272727]
mean value: 0.9107147043131244
key: train_fscore
value: [0.93333333 0.93933464 0.93096647 0.95445545 0.95330739 0.9453125
0.94324853 0.93385214 0.94368932 0.93359375]
mean value: 0.9411093522022579
key: test_precision
value: [0.87096774 0.93103448 0.89655172 0.87096774 0.92857143 1.
0.89285714 1. 0.80645161 0.88888889]
mean value: 0.9086290763988205
key: train_precision
value: [0.9296875 0.93385214 0.92913386 0.95634921 0.94230769 0.9379845
0.93774319 0.92307692 0.93103448 0.92635659]
mean value: 0.9347526078770776
key: test_recall
value: [0.96428571 0.96428571 0.89655172 0.93103448 0.92857143 0.89285714
0.89285714 0.92857143 0.89285714 0.85714286]
mean value: 0.9149014778325123
key: train_recall
value: [0.93700787 0.94488189 0.93280632 0.95256917 0.96456693 0.95275591
0.9488189 0.94488189 0.95669291 0.94094488]
mean value: 0.9475926675173508
key: test_roc_auc
value: [0.91317734 0.9476601 0.89470443 0.89408867 0.92857143 0.94642857
0.89285714 0.96428571 0.83928571 0.875 ]
mean value: 0.9096059113300493
key: train_roc_auc
value: [0.93293081 0.93884411 0.93097009 0.95463104 0.95275591 0.94488189
0.94291339 0.93307087 0.94291339 0.93307087]
mean value: 0.9406982353490398
key: test_jcc
value: [0.84375 0.9 0.8125 0.81818182 0.86666667 0.89285714
0.80645161 0.92857143 0.73529412 0.77419355]
mean value: 0.8378466335214438
key: train_jcc
value: [0.875 0.88560886 0.87084871 0.91287879 0.91078067 0.8962963
0.89259259 0.87591241 0.89338235 0.87545788]
mean value: 0.888875854764648
MCC on Blind test: 0.25
Accuracy on Blind test: 0.7