LSHTM_analysis/scripts/ml/log_pnca_cd_7030.txt
2022-06-20 21:55:47 +01:00

/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data_cd_7030.py:548: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
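Note on the SettingWithCopyWarning above: pandas emits it when an in-place operation targets a frame that may be a view of another frame. The following is a minimal sketch of one common remedy, assuming `mask_check` was produced by filtering a larger DataFrame. The mutation labels, the distance values, and the threshold of 15 are hypothetical and do not come from ml_data_cd_7030.py.

```python
import pandas as pd

# Hypothetical stand-in for the full data frame used by the script.
df = pd.DataFrame({
    "mutationinformation": ["A1B", "C2D", "E3F", "G4H"],
    "ligand_distance": [12.5, 3.1, 8.4, 20.0],
})

# Take an explicit copy of the filtered rows so later changes cannot be
# mistaken for writes to a view of `df` (the threshold is an assumption).
mask_check = df[df["ligand_distance"] < 15].copy()

# Reassign the sorted result instead of sorting with inplace=True.
mask_check = mask_check.sort_values(by=["ligand_distance"], ascending=True)

print(mask_check)
```

The original frame `df` is left untouched, and the sort no longer operates on a possible slice.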
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
from pandas import MultiIndex, Int64Index
1.22.4
1.4.1
aaindex_df contains non-numerical data
Total no. of non-numerical columns: 2
Selecting numerical data only
PASS: successfully selected numerical columns only for aaindex_df
Now checking for NA in the remaining aaindex_cols
Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127
Revised df ncols: 123
Checking NA in revised df...
PASS: cols with NA successfully dropped from aaindex_df
Proceeding with combining aa_df with other features_df
PASS: ncols match
Expected ncols: 123
Got: 123
Total no. of columns in clean aa_df: 123
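The cleaning steps logged above (keep numerical columns only, then drop every column that contains NA) can be sketched as follows. This is an illustration under assumptions, not the script's actual code: the toy frame, its two text columns, and the variable names other than `aaindex_df` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy frame: two non-numerical columns, one numerical column with NA.
aaindex_df = pd.DataFrame({
    "mutation": ["A1B", "C2D", "E3F"],   # non-numerical (hypothetical)
    "wild_type": ["A", "C", "E"],        # non-numerical (hypothetical)
    "ALTS910101": [0.1, 0.2, 0.3],
    "AZAE970101": [1.0, np.nan, 3.0],    # contains NA, will be dropped
    "BASU010101": [5.0, 6.0, 7.0],
})

# Step 1: select numerical data only.
numeric_df = aaindex_df.select_dtypes(include="number")

# Step 2: find and drop columns that contain at least one NA.
na_cols = numeric_df.columns[numeric_df.isna().any()].tolist()
clean_df = numeric_df.drop(columns=na_cols)

print("ncols with NA:", len(na_cols))
print("Original ncols:", numeric_df.shape[1])
print("Revised df ncols:", clean_df.shape[1])
```

Checking `clean_df.isna().any().any()` afterwards gives the kind of PASS/FAIL condition the log reports.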
Proceeding to merge, expected nrows in merged_df: 424
PASS: my_features_df and aa_df successfully combined
nrows: 424
ncols: 265
count of NULL values before imputation
or_mychisq 102
log10_or_mychisq 102
dtype: int64
count of NULL values AFTER imputation
mutationinformation 0
or_rawI 0
logorI 0
dtype: int64
PASS: OR values imputed, data ready for ML
Total no. of features for aaindex: 123
No. of numerical features: 166
No. of categorical features: 7
PASS: x_features has no target variable
No. of columns for x_features: 173
-------------------------------------------------------------
Successfully split data with stratification [COMPLETE data]: 70/30
Original data size: (424, 173)
Train data size: (284, 173)
Test data size: (140, 173)
y_train numbers: Counter({1: 156, 0: 128})
y_train ratio: 0.8205128205128205
y_test_numbers: Counter({1: 77, 0: 63})
y_test ratio: 0.8181818181818182
-------------------------------------------------------------
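The ratios printed above are the minority count divided by the majority count. The sketch below recomputes them from the logged class counts and also derives the held-out share of the 424 rows. The helper name `class_ratio` is hypothetical.

```python
from collections import Counter

# Class counts copied from the log.
y_train = Counter({1: 156, 0: 128})
y_test = Counter({1: 77, 0: 63})


def class_ratio(counts):
    """Minority-to-majority ratio of a class count table."""
    return min(counts.values()) / max(counts.values())


train_ratio = class_ratio(y_train)   # 128 / 156
test_ratio = class_ratio(y_test)     # 63 / 77

n_train = sum(y_train.values())
n_test = sum(y_test.values())
test_share = n_test / (n_train + n_test)   # 140 / 424

print(train_ratio, test_ratio, round(test_share, 4))
```

Similar train and test ratios are what a stratified split is expected to produce. The held-out share works out to about 0.33 of the data.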
index: 0
ind: 1
Mask count check: True
Original Data
Counter({1: 156, 0: 128}) Data dim: (284, 173)
Simple Random OverSampling
Counter({1: 156, 0: 156})
(312, 173)
Simple Random UnderSampling
Counter({0: 128, 1: 128})
(256, 173)
Simple Combined Over and UnderSampling
Counter({0: 156, 1: 156})
(312, 173)
SMOTE_NC OverSampling
Counter({1: 156, 0: 156})
(312, 173)
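The class counts reported for simple random over- and undersampling can be reproduced with a standard-library sketch. The labels in the log suggest a resampling library such as imbalanced-learn was used, but the log does not confirm this, so the two functions below are hypothetical stand-ins that only illustrate the idea. SMOTE-NC and the combined strategy are not covered.

```python
import random
from collections import Counter


def group_indices(labels):
    """Map each class label to the list of row indices carrying it."""
    by_class = {}
    for i, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(i)
    return by_class


def random_oversample(labels, seed=42):
    """Resample smaller classes with replacement up to the largest class."""
    rng = random.Random(seed)
    by_class = group_indices(labels)
    target = max(len(idx) for idx in by_class.values())
    picked = []
    for idx in by_class.values():
        picked.extend(idx)
        picked.extend(rng.choices(idx, k=target - len(idx)))
    return picked


def random_undersample(labels, seed=42):
    """Sample larger classes without replacement down to the smallest class."""
    rng = random.Random(seed)
    by_class = group_indices(labels)
    target = min(len(idx) for idx in by_class.values())
    picked = []
    for idx in by_class.values():
        picked.extend(rng.sample(idx, target))
    return sorted(picked)


# Training labels with the class counts from the log.
y_train = [1] * 156 + [0] * 128
over = Counter(y_train[i] for i in random_oversample(y_train))
under = Counter(y_train[i] for i in random_undersample(y_train))
print(over, sum(over.values()))
print(under, sum(under.values()))
```

Both functions return row indices, so the same selection can be applied to the feature matrix and the target.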
#####################################################################
Running ML analysis [COMPLETE DATA]: 70/30 split
Gene name: pncA
Drug name: pyrazinamide
Output directory: /home/tanu/git/Data/pyrazinamide/output/ml/tts_cd_7030/
Sanity checks:
Total input features: 173
Training data size: (284, 173)
Test data size: (140, 173)
Target feature numbers (training data): Counter({1: 156, 0: 128})
Target features ratio (training data): 0.8205128205128205
Target feature numbers (test data): Counter({1: 77, 0: 63})
Target features ratio (test data): 0.8181818181818182
#####################################################################
================================================================
Structural features (n): 34
These are:
Common stability features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================
AAindex features (n): 123
These are:
['ALTS910101', 'AZAE970101', 'AZAE970102', 'BASU010101', 'BENS940101', 'BENS940102', 'BENS940103', 'BENS940104', 'BETM990101', 'BLAJ010101', 'BONM030101', 'BONM030102', 'BONM030103', 'BONM030104', 'BONM030105', 'BONM030106', 'BRYS930101', 'CROG050101', 'CSEM940101', 'DAYM780301', 'DAYM780302', 'DOSZ010101', 'DOSZ010102', 'DOSZ010103', 'DOSZ010104', 'FEND850101', 'FITW660101', 'GEOD900101', 'GIAG010101', 'GONG920101', 'GRAR740104', 'HENS920101', 'HENS920102', 'HENS920103', 'HENS920104', 'JOHM930101', 'JOND920103', 'JOND940101', 'KANM000101', 'KAPO950101', 'KESO980101', 'KESO980102', 'KOLA920101', 'KOLA930101', 'KOSJ950100_RSA_SST', 'KOSJ950100_SST', 'KOSJ950110_RSA', 'KOSJ950115', 'LEVJ860101', 'LINK010101', 'LIWA970101', 'LUTR910101', 'LUTR910102', 'LUTR910103', 'LUTR910104', 'LUTR910105', 'LUTR910106', 'LUTR910107', 'LUTR910108', 'LUTR910109', 'MCLA710101', 'MCLA720101', 'MEHP950102', 'MICC010101', 'MIRL960101', 'MIYS850102', 'MIYS850103', 'MIYS930101', 'MIYS960101', 'MIYS960102', 'MIYS960103', 'MIYS990106', 'MIYS990107', 'MIYT790101', 'MOHR870101', 'MOOG990101', 'MUET010101', 'MUET020101', 'MUET020102', 'NAOD960101', 'NGPC000101', 'NIEK910101', 'NIEK910102', 'OGAK980101', 'OVEJ920100_RSA', 'OVEJ920101', 'OVEJ920102', 'OVEJ920103', 'PRLA000101', 'PRLA000102', 'QUIB020101', 'QU_C930101', 'QU_C930102', 'QU_C930103', 'RIER950101', 'RISJ880101', 'RUSR970101', 'RUSR970102', 'RUSR970103', 'SIMK990101', 'SIMK990102', 'SIMK990103', 'SIMK990104', 'SIMK990105', 'SKOJ000101', 'SKOJ000102', 'SKOJ970101', 'TANS760101', 'TANS760102', 'THOP960101', 'TOBD000101', 'TOBD000102', 'TUDE900101', 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106']
================================================================
Evolutionary features (n): 3
These are:
['consurf_score', 'snap2_score', 'provean_score']
================================================================
Genomic features (n): 6
These are:
['maf', 'logorI']
['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================
Categorical features (n): 7
These are:
['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================
Pass: No. of features match
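The feature-count check above amounts to simple arithmetic on the group sizes listed in the log. A small sketch, with hypothetical dictionary and variable names:

```python
# Group sizes copied from the log.
feature_groups = {
    "structural": 34,     # 8 common stability + 23 FoldX + 3 other
    "aaindex": 123,
    "evolutionary": 3,
    "genomic": 6,         # 2 + 4
    "categorical": 7,
}

structural_parts = 8 + 23 + 3
total_features = sum(feature_groups.values())
numerical_features = total_features - feature_groups["categorical"]

print("Total:", total_features)
print("Numerical:", numerical_features)
```

The totals agree with the 173 input features, 166 numerical and 7 categorical, reported earlier in the log.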
#####################################################################
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
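The pipeline printed above scales numerical columns with MinMaxScaler and one-hot encodes categorical ones before fitting the model. Since scaling is already in place, raising `max_iter` is one remaining option the ConvergenceWarning itself suggests. The sketch below rebuilds a small version of that pipeline on synthetic data. It assumes scikit-learn's `cross_validate` with 10 folds was used, because the logged key names (`fit_time`, `score_time`, `test_*`, `train_*`) match its output. The data, the `max_iter=1000` value, and the reduced scoring dictionary are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic stand-in data: two numerical features and one categorical feature.
n = 60
X = pd.DataFrame({
    "ligand_distance": np.linspace(0.0, 30.0, n),
    "rsa": np.random.default_rng(42).uniform(0.0, 1.0, n),
    "ss_class": np.tile(["helix", "sheet", "loop"], n // 3),
})
y = (X["ligand_distance"] < 15.0).astype(int)  # 30 positives, 30 negatives

prep = ColumnTransformer(
    remainder="passthrough",
    transformers=[
        ("num", MinMaxScaler(), ["ligand_distance", "rsa"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["ss_class"]),
    ],
)
pipe = Pipeline(steps=[
    ("prep", prep),
    # max_iter raised from the default of 100 (assumed remedy for the warning).
    ("model", LogisticRegression(max_iter=1000, random_state=42)),
])

scoring = {"mcc": make_scorer(matthews_corrcoef), "accuracy": "accuracy"}
cv_results = cross_validate(
    pipe, X, y, cv=10, scoring=scoring, return_train_score=True
)

for key, value in cv_results.items():
    print("key:", key)
    print("mean value:", np.mean(value))
```

Keeping the preprocessing inside the pipeline means the scaler and encoder are refitted on each training fold, so no information from the validation fold leaks into them.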
key: fit_time
value: [0.03416944 0.03253078 0.03403378 0.03418827 0.03402686 0.03386068
0.03349853 0.03472042 0.03349781 0.03371382]
mean value: 0.033824038505554196
key: score_time
value: [0.01223898 0.01185846 0.0140183 0.01403618 0.01377988 0.01398516
0.01398301 0.01379943 0.01409292 0.01396394]
mean value: 0.0135756254196167
key: test_mcc
value: [0.43855669 0.51675233 0.43855669 0.6505161 0.57054433 0.17660431
0.20672456 0.85641026 0.51681139 0.12595415]
mean value: 0.4497430811646085
key: train_mcc
value: [0.75414242 0.72243454 0.73031782 0.69053483 0.7079253 0.75586888
0.71526337 0.71525557 0.71543078 0.71526337]
mean value: 0.722243686789569
key: test_accuracy
value: [0.72413793 0.75862069 0.72413793 0.82758621 0.78571429 0.60714286
0.60714286 0.92857143 0.75 0.57142857]
mean value: 0.728448275862069
key: train_accuracy
value: [0.87843137 0.8627451 0.86666667 0.84705882 0.85546875 0.87890625
0.859375 0.859375 0.859375 0.859375 ]
mean value: 0.8626776960784314
key: test_fscore
value: [0.76470588 0.77419355 0.76470588 0.84848485 0.83333333 0.68571429
0.64516129 0.93333333 0.74074074 0.64705882]
mean value: 0.7637431968551514
key: train_fscore
value: [0.89122807 0.87804878 0.88111888 0.8641115 0.86925795 0.88888889
0.87412587 0.875 0.87586207 0.87412587]
mean value: 0.8771767886676154
key: test_precision
value: [0.72222222 0.8 0.72222222 0.82352941 0.75 0.63157895
0.625 0.93333333 0.83333333 0.57894737]
mean value: 0.7420166838665291
key: train_precision
value: [0.87586207 0.85714286 0.8630137 0.84353741 0.86013986 0.89208633
0.86206897 0.85714286 0.85234899 0.86206897]
mean value: 0.862541201224554
key: test_recall
value: [0.8125 0.75 0.8125 0.875 0.9375 0.75
0.66666667 0.93333333 0.66666667 0.73333333]
mean value: 0.79375
key: train_recall
value: [0.90714286 0.9 0.9 0.88571429 0.87857143 0.88571429
0.88652482 0.89361702 0.90070922 0.88652482]
mean value: 0.892451874366768
key: test_roc_auc
value: [0.71394231 0.75961538 0.71394231 0.82211538 0.76041667 0.58333333
0.6025641 0.92820513 0.75641026 0.55897436]
mean value: 0.719951923076923
key: train_roc_auc
value: [0.87531056 0.85869565 0.86304348 0.84285714 0.85307882 0.87820197
0.85630589 0.85550416 0.85470244 0.85630589]
mean value: 0.8594005998520496
key: test_jcc
value: [0.61904762 0.63157895 0.61904762 0.73684211 0.71428571 0.52173913
0.47619048 0.875 0.58823529 0.47826087]
mean value: 0.6260227775320655
key: train_jcc
value: [0.80379747 0.7826087 0.7875 0.7607362 0.76875 0.8
0.77639752 0.77777778 0.7791411 0.77639752]
mean value: 0.781310627345378
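The metric keys above (mcc, accuracy, fscore, precision, recall, roc_auc, jcc) can all be derived from a 2x2 confusion matrix. The sketch below uses the counts TP=13, FN=3, FP=5, TN=8, which are inferred from the first fold's reported precision, recall and accuracy; the log itself does not print a confusion matrix. Treating roc_auc as the mean of recall and specificity assumes it was scored on hard 0/1 predictions.

```python
from math import sqrt

# Inferred confusion matrix for the first test fold (29 samples).
tp, fn, fp, tn = 13, 3, 5, 8

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
specificity = tn / (tn + fp)
fscore = 2 * tp / (2 * tp + fp + fn)
jcc = tp / (tp + fp + fn)  # Jaccard index of the positive class
mcc = (tp * tn - fp * fn) / sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)
roc_auc = (recall + specificity) / 2  # AUC of a single hard-label operating point

for name, value in [
    ("mcc", mcc), ("accuracy", accuracy), ("fscore", fscore),
    ("precision", precision), ("recall", recall),
    ("roc_auc", roc_auc), ("jcc", jcc),
]:
    print(f"{name}: {value:.8f}")
```

These values agree with the first entry of each `test_*` array in the log to the printed precision.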
MCC on Blind test: 0.37
Accuracy on Blind test: 0.69
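To summarise the Logistic Regression results, the sketch below recomputes the mean test and train MCC from the logged fold values and sets them beside the blind-test MCC. Reading the train/test gap as a sign of overfitting is an interpretation, not something the log states.

```python
from statistics import mean, stdev

# Fold scores copied from the log.
test_mcc = [0.43855669, 0.51675233, 0.43855669, 0.6505161, 0.57054433,
            0.17660431, 0.20672456, 0.85641026, 0.51681139, 0.12595415]
train_mcc = [0.75414242, 0.72243454, 0.73031782, 0.69053483, 0.7079253,
             0.75586888, 0.71526337, 0.71525557, 0.71543078, 0.71526337]
blind_mcc = 0.37  # "MCC on Blind test" from the log

cv_test = mean(test_mcc)
cv_train = mean(train_mcc)
gap = cv_train - cv_test          # train minus validation MCC
spread = stdev(test_mcc)          # fold-to-fold variability

print(f"CV test MCC:  {cv_test:.4f} (sd {spread:.4f})")
print(f"CV train MCC: {cv_train:.4f}")
print(f"Train/test gap: {gap:.4f}")
print(f"Blind test MCC: {blind_mcc:.2f}")
```

The test folds range from about 0.13 to 0.86, so with only 28 or 29 samples per fold the mean carries a wide spread.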
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.8561759 0.92165303 0.76126075 0.90251565 0.75767636 0.78732276
0.94697428 0.77597547 0.77366757 0.85835242]
mean value: 0.8341574192047119
key: score_time
value: [0.0150044 0.01200652 0.0120039 0.01198578 0.01203895 0.01211858
0.01436782 0.01202631 0.01211882 0.01197553]
mean value: 0.012564659118652344
key: test_mcc
value: [0.37799476 0.51675233 0.51308782 0.65896573 0.57054433 0.25819889
0.13091876 0.85641026 0.51681139 0.13091876]
mean value: 0.45306030307351824
key: train_mcc
value: [0.95253043 0.67476612 0.68252542 0.6267925 0.6525002 0.69995476
0.79449635 0.65192757 0.77878421 0.66766491]
mean value: 0.7181942464530867
key: test_accuracy
value: [0.68965517 0.75862069 0.75862069 0.82758621 0.78571429 0.64285714
0.57142857 0.92857143 0.75 0.57142857]
mean value: 0.728448275862069
key: train_accuracy
value: [0.97647059 0.83921569 0.84313725 0.81568627 0.828125 0.8515625
0.8984375 0.828125 0.890625 0.8359375 ]
mean value: 0.8607322303921568
key: test_fscore
value: [0.70967742 0.77419355 0.8 0.85714286 0.83333333 0.70588235
0.625 0.93333333 0.74074074 0.625 ]
mean value: 0.7604303585233376
key: train_fscore
value: [0.9787234 0.85813149 0.86013986 0.83737024 0.84507042 0.86619718
0.90909091 0.84931507 0.90277778 0.85517241]
mean value: 0.876198876928773
key: test_precision
value: [0.73333333 0.8 0.73684211 0.78947368 0.75 0.66666667
0.58823529 0.93333333 0.83333333 0.58823529]
mean value: 0.7419453044375645
key: train_precision
value: [0.97183099 0.83221477 0.84246575 0.81208054 0.83333333 0.85416667
0.89655172 0.82119205 0.88435374 0.83221477]
mean value: 0.8580404325068907
key: test_recall
value: [0.6875 0.75 0.875 0.9375 0.9375 0.75
0.66666667 0.93333333 0.66666667 0.66666667]
mean value: 0.7870833333333334
key: train_recall
value: [0.98571429 0.88571429 0.87857143 0.86428571 0.85714286 0.87857143
0.92198582 0.87943262 0.92198582 0.87943262]
mean value: 0.8952836879432624
key: test_roc_auc
value: [0.68990385 0.75961538 0.74519231 0.81490385 0.76041667 0.625
0.56410256 0.92820513 0.75641026 0.56410256]
mean value: 0.7207852564102564
key: train_roc_auc
value: [0.97546584 0.83416149 0.83928571 0.81040373 0.82512315 0.84876847
0.89577552 0.82232501 0.88707986 0.83102066]
mean value: 0.8569409444214063
key: test_jcc
value: [0.55 0.63157895 0.66666667 0.75 0.71428571 0.54545455
0.45454545 0.875 0.58823529 0.45454545]
mean value: 0.6230312076983904
key: train_jcc
value: [0.95833333 0.75151515 0.75460123 0.7202381 0.73170732 0.76397516
0.83333333 0.73809524 0.82278481 0.74698795]
mean value: 0.7821571612795502
MCC on Blind test: 0.35
Accuracy on Blind test: 0.68
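The MCC and accuracy figures reported above can be reproduced from confusion-matrix counts. The following is a minimal sketch, not the pipeline's own code: the function names and the example counts are hypothetical and are not taken from this run.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts, for illustration only (not from this log).
example = dict(tp=6, tn=3, fp=1, fn=2)
print("MCC:", round(mcc(**example), 2))
print("Accuracy:", round(accuracy(**example), 2))
```

MCC uses all four cells of the confusion matrix, so it can sit well below accuracy when the classes are imbalanced, which is one plausible reason the log reports both.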
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01298046 0.0125916 0.00941372 0.00909805 0.01014328 0.00974369
0.01018119 0.00992966 0.01014757 0.00922632]
mean value: 0.010345554351806641
key: score_time
value: [0.01201987 0.00916719 0.00919127 0.00949001 0.00955749 0.00926924
0.00949407 0.00916719 0.00916576 0.00878835]
mean value: 0.009531044960021972
key: test_mcc
value: [0.58145719 0.1527557 0.2956562 0.36894943 0.66666667 0.55943093
0.03739788 0.78555332 0.35228194 0.35228194]
mean value: 0.415243118729667
key: train_mcc
value: [0.49498371 0.48735671 0.47578012 0.45997473 0.47938449 0.51209302
0.54826324 0.51261282 0.47777839 0.5254541 ]
mean value: 0.49736813279152425
key: test_accuracy
value: [0.79310345 0.5862069 0.65517241 0.68965517 0.82142857 0.78571429
0.53571429 0.89285714 0.67857143 0.67857143]
mean value: 0.7116995073891625
key: train_accuracy
value: [0.74901961 0.74117647 0.74117647 0.73333333 0.7421875 0.7578125
0.77734375 0.7578125 0.7421875 0.765625 ]
mean value: 0.7507674632352941
key: test_fscore
value: [0.82352941 0.64705882 0.70588235 0.72727273 0.86486486 0.82352941
0.64864865 0.90322581 0.72727273 0.72727273]
mean value: 0.7598557501783308
key: train_fscore
value: [0.76811594 0.79375 0.78145695 0.77631579 0.78289474 0.79605263
0.80677966 0.8 0.78431373 0.8013245 ]
mean value: 0.789100394338451
key: test_precision
value: [0.77777778 0.61111111 0.66666667 0.70588235 0.76190476 0.77777778
0.54545455 0.875 0.66666667 0.66666667]
mean value: 0.7054908326967151
key: train_precision
value: [0.77941176 0.70555556 0.72839506 0.7195122 0.72560976 0.73780488
0.77272727 0.73372781 0.72727273 0.7515528 ]
mean value: 0.7381569816940069
key: test_recall
value: [0.875 0.6875 0.75 0.75 1. 0.875
0.8 0.93333333 0.8 0.8 ]
mean value: 0.8270833333333334
key: train_recall
value: [0.75714286 0.90714286 0.84285714 0.84285714 0.85 0.86428571
0.84397163 0.87943262 0.85106383 0.85815603]
mean value: 0.8496909827760891
key: test_roc_auc
value: [0.78365385 0.57451923 0.64423077 0.68269231 0.79166667 0.77083333
0.51538462 0.88974359 0.66923077 0.66923077]
mean value: 0.6991185897435898
key: train_roc_auc
value: [0.74813665 0.72313665 0.73012422 0.72142857 0.73103448 0.74679803
0.7698119 0.74406414 0.72987974 0.75516497]
mean value: 0.7399579351661555
key: test_jcc
value: [0.7 0.47826087 0.54545455 0.57142857 0.76190476 0.7
0.48 0.82352941 0.57142857 0.57142857]
mean value: 0.6203435302974944
key: train_jcc
value: [0.62352941 0.65803109 0.64130435 0.6344086 0.64324324 0.66120219
0.67613636 0.66666667 0.64516129 0.66850829]
mean value: 0.6518191486778253
MCC on Blind test: 0.38
Accuracy on Blind test: 0.69
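Each "mean value" line above is the arithmetic mean of the ten per-fold scores printed before it. As a rough check, the sketch below recomputes the Gaussian NB test_mcc mean from the fold values shown in this log; the variable names are illustrative, and the spread calculation is an addition that the log itself does not report.

```python
import statistics

# Per-fold test_mcc values copied from the Gaussian NB block above.
test_mcc = [
    0.58145719, 0.1527557, 0.2956562, 0.36894943, 0.66666667,
    0.55943093, 0.03739788, 0.78555332, 0.35228194, 0.35228194,
]

mean_mcc = statistics.mean(test_mcc)
# Sample standard deviation across folds (not reported in the log).
sd_mcc = statistics.stdev(test_mcc)

print(f"mean test_mcc: {mean_mcc:.4f}")
print(f"fold-to-fold sd: {sd_mcc:.4f}")
```

With only about 28 to 29 samples per fold, the per-fold scores vary widely, so the spread is worth reading alongside the mean.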
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.0094049 0.0092361 0.00938892 0.0094986 0.01036501 0.01064968
0.00943542 0.00975943 0.01038313 0.00947523]
mean value: 0.009759640693664551
key: score_time
value: [0.00875306 0.00907779 0.00919843 0.00939655 0.00899863 0.00959301
0.00873184 0.00939775 0.00951385 0.00956416]
mean value: 0.00922250747680664
key: test_mcc
value: [0.36720991 0.51675233 0.29458249 0.51308782 0.55943093 0.17660431
0.20672456 0.64084613 0.42564103 0.27928963]
mean value: 0.3980169135730911
key: train_mcc
value: [0.52291851 0.49871807 0.53884849 0.47460238 0.52523163 0.53332692
0.56453014 0.49226967 0.540165 0.55613108]
mean value: 0.524674189335105
key: test_accuracy
value: [0.68965517 0.75862069 0.65517241 0.75862069 0.78571429 0.60714286
0.60714286 0.82142857 0.71428571 0.64285714]
mean value: 0.704064039408867
key: train_accuracy
value: [0.76470588 0.75294118 0.77254902 0.74117647 0.765625 0.76953125
0.78515625 0.75 0.7734375 0.78125 ]
mean value: 0.7656372549019608
key: test_fscore
value: [0.74285714 0.77419355 0.72222222 0.8 0.82352941 0.68571429
0.64516129 0.83870968 0.73333333 0.70588235]
mean value: 0.7471603264961899
key: train_fscore
value: [0.79591837 0.78350515 0.80136986 0.7739726 0.79310345 0.79863481
0.80836237 0.7852349 0.80136986 0.80821918]
mean value: 0.7949690558064818
key: test_precision
value: [0.68421053 0.8 0.65 0.73684211 0.77777778 0.63157895
0.625 0.8125 0.73333333 0.63157895]
mean value: 0.70828216374269
key: train_precision
value: [0.75974026 0.75496689 0.76973684 0.74342105 0.76666667 0.76470588
0.79452055 0.74522293 0.77483444 0.78145695]
mean value: 0.7655272459523916
key: test_recall
value: [0.8125 0.75 0.8125 0.875 0.875 0.75
0.66666667 0.86666667 0.73333333 0.8 ]
mean value: 0.7941666666666667
key: train_recall
value: [0.83571429 0.81428571 0.83571429 0.80714286 0.82142857 0.83571429
0.82269504 0.82978723 0.82978723 0.83687943]
mean value: 0.8269148936170213
key: test_roc_auc
value: [0.67548077 0.75961538 0.63701923 0.74519231 0.77083333 0.58333333
0.6025641 0.81794872 0.71282051 0.63076923]
mean value: 0.6935576923076923
key: train_roc_auc
value: [0.75698758 0.74627329 0.76568323 0.73400621 0.75985222 0.76268473
0.78091274 0.74098057 0.76706753 0.77496146]
mean value: 0.7589409550543877
key: test_jcc
value: [0.59090909 0.63157895 0.56521739 0.66666667 0.7 0.52173913
0.47619048 0.72222222 0.57894737 0.54545455]
mean value: 0.5998925838971605
key: train_jcc
value: [0.66101695 0.6440678 0.66857143 0.63128492 0.65714286 0.66477273
0.67836257 0.64640884 0.66857143 0.67816092]
mean value: 0.6598360435940921
MCC on Blind test: 0.36
Accuracy on Blind test: 0.69
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00950718 0.01232815 0.01000333 0.00981069 0.00915599 0.00976539
0.01029205 0.00983381 0.00999928 0.00985742]
mean value: 0.010055327415466308
key: score_time
value: [0.04921603 0.02551436 0.01085448 0.01097107 0.01046705 0.01105928
0.01091576 0.01097822 0.01087236 0.01105428]
mean value: 0.016190290451049805
key: test_mcc
value: [ 0.29458249 -0.11538462 0.02403846 0.29458249 0.27083333 0.25819889
0.05337605 0.65118783 -0.01571025 0.13091876]
mean value: 0.18466234484517727
key: train_mcc
value: [0.49947602 0.52489912 0.54682896 0.51496625 0.54964568 0.57461786
0.57748217 0.532427 0.57422132 0.55021867]
mean value: 0.5444783048862579
key: test_accuracy
value: [0.65517241 0.44827586 0.51724138 0.65517241 0.64285714 0.64285714
0.53571429 0.82142857 0.5 0.57142857]
mean value: 0.5990147783251232
key: train_accuracy
value: [0.75294118 0.76470588 0.77647059 0.76078431 0.77734375 0.7890625
0.7890625 0.76953125 0.7890625 0.77734375]
mean value: 0.7746308210784314
key: test_fscore
value: [0.72222222 0.5 0.5625 0.72222222 0.6875 0.70588235
0.60606061 0.84848485 0.5625 0.625 ]
mean value: 0.6542372251931076
key: train_fscore
value: [0.78929766 0.8013245 0.80412371 0.79322034 0.80677966 0.81879195
0.82467532 0.8013468 0.82119205 0.81188119]
mean value: 0.8072633186944136
key: test_precision
value: [0.65 0.5 0.5625 0.65 0.6875 0.66666667
0.55555556 0.77777778 0.52941176 0.58823529]
mean value: 0.616764705882353
key: train_precision
value: [0.74213836 0.74691358 0.77483444 0.75483871 0.76774194 0.7721519
0.76047904 0.76282051 0.77018634 0.75925926]
mean value: 0.7611364075408015
key: test_recall
value: [0.8125 0.5 0.5625 0.8125 0.6875 0.75
0.66666667 0.93333333 0.6 0.66666667]
mean value: 0.6991666666666667
key: train_recall
value: [0.84285714 0.86428571 0.83571429 0.83571429 0.85 0.87142857
0.90070922 0.84397163 0.87943262 0.87234043]
mean value: 0.859645390070922
key: test_roc_auc
value: [0.63701923 0.44230769 0.51201923 0.63701923 0.63541667 0.625
0.52564103 0.81282051 0.49230769 0.56410256]
mean value: 0.5883653846153846
key: train_roc_auc
value: [0.7431677 0.75388199 0.77003106 0.75263975 0.76982759 0.78054187
0.77644157 0.76111625 0.77884675 0.766605 ]
mean value: 0.7653099514072751
key: test_jcc
value: [0.56521739 0.33333333 0.39130435 0.56521739 0.52380952 0.54545455
0.43478261 0.73684211 0.39130435 0.45454545]
mean value: 0.49418110493625367
key: train_jcc
value: [0.6519337 0.66850829 0.67241379 0.65730337 0.67613636 0.69318182
0.70165746 0.66853933 0.69662921 0.68333333]
mean value: 0.6769636665881136
MCC on Blind test: 0.16
Accuracy on Blind test: 0.59
Model_name: SVM
Model func: SVC(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01607466 0.01531768 0.01556444 0.01464987 0.01540494 0.01492763
0.01399732 0.01559043 0.01450849 0.01498008]
mean value: 0.01510155200958252
key: score_time
value: [0.0113318 0.01088405 0.01098084 0.01086926 0.01097679 0.01084137
0.01012707 0.01082325 0.01090598 0.01094651]
mean value: 0.010868692398071289
key: test_mcc
value: [0.51308782 0.44230769 0.4444578 0.46375229 0.41079192 0.17660431
0.35143175 0.78555332 0.52084744 0.20380987]
mean value: 0.4312644198027845
key: train_mcc
value: [0.66873453 0.685946 0.71495683 0.67730051 0.68762435 0.77251235
0.66054755 0.68495043 0.6910643 0.68385974]
mean value: 0.6927496604406052
key: test_accuracy
value: [0.75862069 0.72413793 0.72413793 0.72413793 0.71428571 0.60714286
0.67857143 0.89285714 0.75 0.60714286]
mean value: 0.718103448275862
key: train_accuracy
value: [0.83529412 0.84313725 0.85882353 0.83921569 0.84375 0.88671875
0.83203125 0.84375 0.84375 0.84375 ]
mean value: 0.8470220588235294
key: test_fscore
value: [0.8 0.75 0.77777778 0.78947368 0.77777778 0.68571429
0.70967742 0.90322581 0.8 0.68571429]
mean value: 0.7679361037001105
key: train_fscore
value: [0.85810811 0.86577181 0.87586207 0.86195286 0.86577181 0.90034364
0.85423729 0.86486486 0.86928105 0.8630137 ]
mean value: 0.8679207203181474
key: test_precision
value: [0.73684211 0.75 0.7 0.68181818 0.7 0.63157895
0.6875 0.875 0.7 0.6 ]
mean value: 0.7062739234449761
key: train_precision
value: [0.81410256 0.8164557 0.84666667 0.81528662 0.8164557 0.86754967
0.81818182 0.82580645 0.80606061 0.83443709]
mean value: 0.8261002878200331
key: test_recall
value: [0.875 0.75 0.875 0.9375 0.875 0.75
0.73333333 0.93333333 0.93333333 0.8 ]
mean value: 0.84625
key: train_recall
value: [0.90714286 0.92142857 0.90714286 0.91428571 0.92142857 0.93571429
0.89361702 0.90780142 0.94326241 0.89361702]
mean value: 0.9145440729483283
key: test_roc_auc
value: [0.74519231 0.72115385 0.70673077 0.69951923 0.6875 0.58333333
0.67435897 0.88974359 0.73589744 0.59230769]
mean value: 0.7035737179487179
key: train_roc_auc
value: [0.82748447 0.83462733 0.85357143 0.8310559 0.83571429 0.88165025
0.82506938 0.8365094 0.83250077 0.83811286]
mean value: 0.839629607688557
key: test_jcc
value: [0.66666667 0.6 0.63636364 0.65217391 0.63636364 0.52173913
0.55 0.82352941 0.66666667 0.52173913]
mean value: 0.6275242191738355
key: train_jcc
value: [0.75147929 0.76331361 0.7791411 0.75739645 0.76331361 0.81875
0.74556213 0.76190476 0.76878613 0.75903614]
mean value: 0.7668683226702581
MCC on Blind test: 0.29
Accuracy on Blind test: 0.65
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.21098614 1.15482354 1.30495906 1.1236217 1.30449224 1.08739567
1.25559568 1.10417318 1.2485714 1.14837241]
mean value: 1.1942991018295288
key: score_time
value: [0.01472187 0.01449966 0.01498246 0.01457214 0.01606774 0.01229334
0.01477671 0.01566529 0.01566625 0.01275349]
mean value: 0.014599895477294922
key: test_mcc
value: [0.44230769 0.44230769 0.37799476 0.58145719 0.41666667 0.33776026
0.29230769 0.71743483 0.51681139 0.27928963]
mean value: 0.44043378142183065
key: train_mcc
value: [0.97625444 0.96839557 0.98416149 0.98426071 0.97636757 0.97636757
0.97632969 0.98430987 0.96880891 0.97653632]
mean value: 0.977179212771556
key: test_accuracy
value: [0.72413793 0.72413793 0.68965517 0.79310345 0.71428571 0.67857143
0.64285714 0.85714286 0.75 0.64285714]
mean value: 0.7216748768472907
key: train_accuracy
value: [0.98823529 0.98431373 0.99215686 0.99215686 0.98828125 0.98828125
0.98828125 0.9921875 0.984375 0.98828125]
mean value: 0.9886550245098039
key: test_fscore
value: [0.75 0.75 0.70967742 0.82352941 0.75 0.72727273
0.64285714 0.875 0.74074074 0.70588235]
mean value: 0.7474959794931332
key: train_fscore
value: [0.98932384 0.9858156 0.99285714 0.9929078 0.98932384 0.98932384
0.98939929 0.99295775 0.98601399 0.98947368]
mean value: 0.9897396787351177
key: test_precision
value: [0.75 0.75 0.73333333 0.77777778 0.75 0.70588235
0.69230769 0.82352941 0.83333333 0.63157895]
mean value: 0.744774284882644
key: train_precision
value: [0.9858156 0.97887324 0.99285714 0.98591549 0.9858156 0.9858156
0.98591549 0.98601399 0.97241379 0.97916667]
mean value: 0.9838602622503995
key: test_recall
value: [0.75 0.75 0.6875 0.875 0.75 0.75
0.6 0.93333333 0.66666667 0.8 ]
mean value: 0.75625
key: train_recall
value: [0.99285714 0.99285714 0.99285714 1. 0.99285714 0.99285714
0.9929078 1. 1. 1. ]
mean value: 0.9957193515704155
key: test_roc_auc
value: [0.72115385 0.72115385 0.68990385 0.78365385 0.70833333 0.66666667
0.64615385 0.85128205 0.75641026 0.63076923]
mean value: 0.7175480769230769
key: train_roc_auc
value: [0.98773292 0.98338509 0.99208075 0.99130435 0.98780788 0.98780788
0.98775825 0.99130435 0.9826087 0.98695652]
mean value: 0.9878746682889559
key: test_jcc
value: [0.6 0.6 0.55 0.7 0.6 0.57142857
0.47368421 0.77777778 0.58823529 0.54545455]
mean value: 0.6006580399304857
key: train_jcc
value: [0.97887324 0.97202797 0.9858156 0.98591549 0.97887324 0.97887324
0.97902098 0.98601399 0.97241379 0.97916667]
mean value: 0.9796994210937537
MCC on Blind test: 0.35
Accuracy on Blind test: 0.68
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.02705932 0.01950145 0.02408624 0.02397561 0.01918578 0.02874064
0.01892877 0.01916695 0.0216887 0.03065681]
mean value: 0.023299026489257812
key: score_time
value: [0.0120914 0.00884175 0.01273632 0.00924754 0.00980282 0.00902367
0.00927806 0.00869584 0.01418424 0.01258039]
mean value: 0.010648202896118165
key: test_mcc
value: [0.30288462 0.51675233 0.10047962 0.68473679 0.57735027 0.6333005
0.3721042 0.72307692 0.64450339 0.64450339]
mean value: 0.5199692014733809
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.65517241 0.75862069 0.55172414 0.82758621 0.78571429 0.82142857
0.67857143 0.85714286 0.82142857 0.82142857]
mean value: 0.7578817733990147
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.6875 0.77419355 0.58064516 0.86486486 0.8 0.84848485
0.66666667 0.85714286 0.82758621 0.82758621]
mean value: 0.773467036062976
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.6875 0.8 0.6 0.76190476 0.85714286 0.82352941
0.75 0.92307692 0.85714286 0.85714286]
mean value: 0.7917439668174963
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.6875 0.75 0.5625 1. 0.75 0.875 0.6 0.8 0.8 0.8 ]
mean value: 0.7625
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.65144231 0.75961538 0.55048077 0.80769231 0.79166667 0.8125
0.68461538 0.86153846 0.82307692 0.82307692]
mean value: 0.7565705128205128
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.52380952 0.63157895 0.40909091 0.76190476 0.66666667 0.73684211
0.5 0.75 0.70588235 0.70588235]
mean value: 0.6391657619985793
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.4
Accuracy on Blind test: 0.7
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.11053491 0.10580492 0.10976958 0.11027527 0.1128788 0.14129496
0.10483551 0.10519671 0.1059289 0.1059742 ]
mean value: 0.11124937534332276
key: score_time
value: [0.01880836 0.01879287 0.01911926 0.01787305 0.02713537 0.01979661
0.01765728 0.01798964 0.01790142 0.01904774]
mean value: 0.01941215991973877
key: test_mcc
value: [0.30288462 0.45455066 0.29458249 0.43855669 0.41666667 0.10758287
0.21483446 0.71743483 0.27754778 0.35143175]
mean value: 0.3576072822776395
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.65517241 0.72413793 0.65517241 0.72413793 0.71428571 0.57142857
0.60714286 0.85714286 0.64285714 0.67857143]
mean value: 0.6830049261083744
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.6875 0.73333333 0.72222222 0.76470588 0.75 0.64705882
0.62068966 0.875 0.6875 0.70967742]
mean value: 0.719768733596516
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.6875 0.78571429 0.65 0.72222222 0.75 0.61111111
0.64285714 0.82352941 0.64705882 0.6875 ]
mean value: 0.700749299719888
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.6875 0.6875 0.8125 0.8125 0.75 0.6875
0.6 0.93333333 0.73333333 0.73333333]
mean value: 0.74375
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.65144231 0.72836538 0.63701923 0.71394231 0.70833333 0.55208333
0.60769231 0.85128205 0.63589744 0.67435897]
mean value: 0.6760416666666667
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.52380952 0.57894737 0.56521739 0.61904762 0.6 0.47826087
0.45 0.77777778 0.52380952 0.55 ]
mean value: 0.5666870073735062
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.25
Accuracy on Blind test: 0.64
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01011157 0.01035905 0.00959778 0.0101676 0.01034927 0.01034713
0.01018667 0.00985789 0.00959349 0.01052213]
mean value: 0.010109257698059083
key: score_time
value: [0.00964332 0.00898647 0.00881195 0.00940514 0.0092485 0.0096724
0.00936341 0.00948358 0.00939012 0.00940371]
mean value: 0.009340858459472657
key: test_mcc
value: [ 0.31579309 0.03827795 0.21932975 0.45455066 -0.10555008 0.27083333
-0.01571025 0.43589744 0.51681139 0.4241768 ]
mean value: 0.2554410075983323
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.65517241 0.51724138 0.62068966 0.72413793 0.46428571 0.64285714
0.5 0.71428571 0.75 0.71428571]
mean value: 0.6302955665024631
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.66666667 0.53333333 0.7027027 0.73333333 0.54545455 0.6875
0.5625 0.71428571 0.74074074 0.75 ]
mean value: 0.6636517036517037
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.71428571 0.57142857 0.61904762 0.78571429 0.52941176 0.6875
0.52941176 0.76923077 0.83333333 0.70588235]
mean value: 0.6745246175393235
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.625 0.5 0.8125 0.6875 0.5625 0.6875
0.6 0.66666667 0.66666667 0.8 ]
mean value: 0.6608333333333334
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.65865385 0.51923077 0.59855769 0.72836538 0.44791667 0.63541667
0.49230769 0.71794872 0.75641026 0.70769231]
mean value: 0.62625
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.5 0.36363636 0.54166667 0.57894737 0.375 0.52380952
0.39130435 0.55555556 0.58823529 0.6 ]
mean value: 0.5018155120032897
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.19
Accuracy on Blind test: 0.59
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.45265174 1.38820577 1.40463042 1.39919519 1.42965603 1.43548608
1.42969394 1.42850542 1.45230269 1.45057559]
mean value: 1.4270902872085571
key: score_time
value: [0.09057522 0.09434032 0.09046865 0.15521455 0.09838867 0.09792662
0.09741545 0.09816027 0.09900284 0.09744287]
mean value: 0.10189354419708252
key: test_mcc
value: [0.58145719 0.37799476 0.4444578 0.43855669 0.48553038 0.25819889
0.42564103 0.93094934 0.57948718 0.50128041]
mean value: 0.5023553662949708
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.79310345 0.68965517 0.72413793 0.72413793 0.75 0.64285714
0.71428571 0.96428571 0.78571429 0.75 ]
mean value: 0.7538177339901478
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.82352941 0.70967742 0.77777778 0.76470588 0.78787879 0.70588235
0.73333333 0.96551724 0.78571429 0.75862069]
mean value: 0.781263718215233
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.77777778 0.73333333 0.7 0.72222222 0.76470588 0.66666667
0.73333333 1. 0.84615385 0.78571429]
mean value: 0.7729907347554406
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.875 0.6875 0.875 0.8125 0.8125 0.75
0.73333333 0.93333333 0.73333333 0.73333333]
mean value: 0.7945833333333333
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.78365385 0.68990385 0.70673077 0.71394231 0.73958333 0.625
0.71282051 0.96666667 0.78974359 0.75128205]
mean value: 0.7479326923076923
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
key: test_jcc
value: [0.7 0.55 0.63636364 0.61904762 0.65 0.54545455
0.57894737 0.93333333 0.64705882 0.61111111]
mean value: 0.647131643726071
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.46
Accuracy on Blind test: 0.74
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
key: fit_time
value: [1.7486918 0.90479875 0.92124152 0.9011457 0.96960497 0.88509083
0.91266322 0.88183689 0.89421272 0.90750265]
mean value: 0.9926789045333863
key: score_time
value: [0.19227123 0.21386743 0.26786327 0.23550534 0.25344205 0.12532926
0.20549893 0.20171428 0.23566985 0.26774335]
mean value: 0.21989049911499023
key: test_mcc
value: [0.50973276 0.51675233 0.72435769 0.4444578 0.55943093 0.25819889
0.42564103 0.93094934 0.64450339 0.42564103]
mean value: 0.5439665164350693
key: train_mcc
value: [0.88962581 0.88962581 0.88144491 0.88193307 0.90602026 0.889773
0.90584149 0.88995933 0.88180723 0.88155407]
mean value: 0.889758497328895
key: test_accuracy
value: [0.75862069 0.75862069 0.86206897 0.72413793 0.78571429 0.64285714
0.71428571 0.96428571 0.82142857 0.71428571]
mean value: 0.7746305418719212
key: train_accuracy
value: [0.94509804 0.94509804 0.94117647 0.94117647 0.953125 0.9453125
0.953125 0.9453125 0.94140625 0.94140625]
mean value: 0.9452236519607843
key: test_fscore
value: [0.78787879 0.77419355 0.88235294 0.77777778 0.82352941 0.70588235
0.73333333 0.96551724 0.82758621 0.73333333]
mean value: 0.8011384934868544
key: train_fscore
value: [0.95104895 0.95104895 0.94736842 0.94773519 0.95804196 0.95070423
0.95833333 0.95138889 0.94773519 0.94736842]
mean value: 0.951077353309472
key: test_precision
value: [0.76470588 0.8 0.83333333 0.7 0.77777778 0.66666667
0.73333333 1. 0.85714286 0.73333333]
mean value: 0.7866293183940243
key: train_precision
value: [0.93150685 0.93150685 0.93103448 0.92517007 0.93835616 0.9375
0.93877551 0.93197279 0.93150685 0.9375 ]
mean value: 0.9334829562434327
key: test_recall
value: [0.8125 0.75 0.9375 0.875 0.875 0.75
0.73333333 0.93333333 0.8 0.73333333]
mean value: 0.82
key: train_recall
value: [0.97142857 0.97142857 0.96428571 0.97142857 0.97857143 0.96428571
0.9787234 0.97163121 0.96453901 0.95744681]
mean value: 0.9693768996960486
key: test_roc_auc
value: [0.75240385 0.75961538 0.85336538 0.70673077 0.77083333 0.625
0.71282051 0.96666667 0.82307692 0.71282051]
mean value: 0.7683333333333333
key: train_roc_auc
value: [0.94223602 0.94223602 0.9386646 0.9378882 0.95049261 0.94334975
0.95023127 0.94233734 0.93879124 0.93959297]
mean value: 0.9425820030714127
key: test_jcc
value: [0.65 0.63157895 0.78947368 0.63636364 0.7 0.54545455
0.57894737 0.93333333 0.70588235 0.57894737]
mean value: 0.6749981236513745
key: train_jcc
value: [0.90666667 0.90666667 0.9 0.90066225 0.91946309 0.90604027
0.92 0.90728477 0.90066225 0.9 ]
mean value: 0.906744596056121
MCC on Blind test: 0.51
Accuracy on Blind test: 0.76
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01031065 0.01042509 0.00919986 0.01029253 0.00933981 0.00958133
0.01031375 0.0097096 0.00932622 0.00971889]
mean value: 0.009821772575378418
key: score_time
value: [0.00951123 0.01605248 0.00885272 0.00941873 0.00853109 0.00870562
0.00875068 0.00936747 0.00857401 0.00866628]
mean value: 0.009643030166625977
key: test_mcc
value: [0.36720991 0.51675233 0.29458249 0.51308782 0.55943093 0.17660431
0.20672456 0.64084613 0.42564103 0.27928963]
mean value: 0.3980169135730911
key: train_mcc
value: [0.52291851 0.49871807 0.53884849 0.47460238 0.52523163 0.53332692
0.56453014 0.49226967 0.540165 0.55613108]
mean value: 0.524674189335105
key: test_accuracy
value: [0.68965517 0.75862069 0.65517241 0.75862069 0.78571429 0.60714286
0.60714286 0.82142857 0.71428571 0.64285714]
mean value: 0.704064039408867
key: train_accuracy
value: [0.76470588 0.75294118 0.77254902 0.74117647 0.765625 0.76953125
0.78515625 0.75 0.7734375 0.78125 ]
mean value: 0.7656372549019608
key: test_fscore
value: [0.74285714 0.77419355 0.72222222 0.8 0.82352941 0.68571429
0.64516129 0.83870968 0.73333333 0.70588235]
mean value: 0.7471603264961899
key: train_fscore
value: [0.79591837 0.78350515 0.80136986 0.7739726 0.79310345 0.79863481
0.80836237 0.7852349 0.80136986 0.80821918]
mean value: 0.7949690558064818
key: test_precision
value: [0.68421053 0.8 0.65 0.73684211 0.77777778 0.63157895
0.625 0.8125 0.73333333 0.63157895]
mean value: 0.70828216374269
key: train_precision
value: [0.75974026 0.75496689 0.76973684 0.74342105 0.76666667 0.76470588
0.79452055 0.74522293 0.77483444 0.78145695]
mean value: 0.7655272459523916
key: test_recall
value: [0.8125 0.75 0.8125 0.875 0.875 0.75
0.66666667 0.86666667 0.73333333 0.8 ]
mean value: 0.7941666666666667
key: train_recall
value: [0.83571429 0.81428571 0.83571429 0.80714286 0.82142857 0.83571429
0.82269504 0.82978723 0.82978723 0.83687943]
mean value: 0.8269148936170213
key: test_roc_auc
value: [0.67548077 0.75961538 0.63701923 0.74519231 0.77083333 0.58333333
0.6025641 0.81794872 0.71282051 0.63076923]
mean value: 0.6935576923076923
key: train_roc_auc
value: [0.75698758 0.74627329 0.76568323 0.73400621 0.75985222 0.76268473
0.78091274 0.74098057 0.76706753 0.77496146]
mean value: 0.7589409550543877
key: test_jcc
value: [0.59090909 0.63157895 0.56521739 0.66666667 0.7 0.52173913
0.47619048 0.72222222 0.57894737 0.54545455]
mean value: 0.5998925838971605
key: train_jcc
value: [0.66101695 0.6440678 0.66857143 0.63128492 0.65714286 0.66477273
0.67836257 0.64640884 0.66857143 0.67816092]
mean value: 0.6598360435940921
MCC on Blind test: 0.36
Accuracy on Blind test: 0.69
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.11305761 0.07152939 0.07408118 0.0680635 0.06910729 0.07664919
0.07117009 0.09131765 0.06809759 0.07645154]
mean value: 0.07795250415802002
key: score_time
value: [0.01137042 0.0111258 0.01030493 0.01230121 0.01032948 0.01086855
0.01022887 0.01089406 0.01038265 0.0107193 ]
mean value: 0.010852527618408204
key: test_mcc
value: [0.44230769 0.50973276 0.51675233 0.6505161 0.48553038 0.64019064
0.43589744 0.74885534 0.64450339 0.66151858]
mean value: 0.5735804639693557
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.72413793 0.75862069 0.75862069 0.82758621 0.75 0.82142857
0.71428571 0.85714286 0.82142857 0.82142857]
mean value: 0.7854679802955665
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.75 0.78787879 0.77419355 0.84848485 0.78787879 0.85714286
0.71428571 0.84615385 0.82758621 0.81481481]
mean value: 0.8008419411923305
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.75 0.76470588 0.8 0.82352941 0.76470588 0.78947368
0.76923077 1. 0.85714286 0.91666667]
mean value: 0.8235455153721407
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 0.8125 0.75 0.875 0.8125 0.9375
0.66666667 0.73333333 0.8 0.73333333]
mean value: 0.7870833333333334
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.72115385 0.75240385 0.75961538 0.82211538 0.73958333 0.80208333
0.71794872 0.86666667 0.82307692 0.82820513]
mean value: 0.7832852564102564
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.6 0.65 0.63157895 0.73684211 0.65 0.75
0.55555556 0.73333333 0.70588235 0.6875 ]
mean value: 0.6700692294461644
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.55
Accuracy on Blind test: 0.78
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.04733849 0.04470849 0.05614376 0.0619967 0.06084156 0.06722283
0.06452966 0.06317711 0.07480168 0.04873967]
mean value: 0.05894999504089356
key: score_time
value: [0.01197529 0.01792359 0.02341628 0.01972723 0.01569057 0.01621079
0.02333379 0.02047038 0.02111411 0.01213336]
mean value: 0.01819953918457031
key: test_mcc
value: [0.2956562 0.23923719 0.36720991 0.44230769 0.33113309 0.55943093
0.28205128 0.57080582 0.43589744 0.05337605]
mean value: 0.3577105589280532
key: train_mcc
value: [0.87333821 0.84147447 0.80993789 0.8573396 0.84231844 0.87490348
0.88155407 0.84231285 0.8577884 0.86572261]
mean value: 0.8546690004567509
key: test_accuracy
value: [0.65517241 0.62068966 0.68965517 0.72413793 0.67857143 0.78571429
0.64285714 0.78571429 0.71428571 0.53571429]
mean value: 0.6832512315270935
key: train_accuracy
value: [0.9372549 0.92156863 0.90588235 0.92941176 0.921875 0.9375
0.94140625 0.921875 0.9296875 0.93359375]
mean value: 0.9280055147058823
key: test_fscore
value: [0.70588235 0.64516129 0.74285714 0.75 0.74285714 0.82352941
0.66666667 0.8125 0.71428571 0.60606061]
mean value: 0.7209800327755735
key: train_fscore
value: [0.94366197 0.92907801 0.91428571 0.93617021 0.92957746 0.94444444
0.94736842 0.93055556 0.93661972 0.93992933]
mean value: 0.9351690845840186
key: test_precision
value: [0.66666667 0.66666667 0.68421053 0.75 0.68421053 0.77777778
0.66666667 0.76470588 0.76923077 0.55555556]
mean value: 0.6985691037548623
key: train_precision
value: [0.93055556 0.92253521 0.91428571 0.92957746 0.91666667 0.91891892
0.9375 0.91156463 0.93006993 0.93661972]
mean value: 0.9248293805713322
key: test_recall
value: [0.75 0.625 0.8125 0.75 0.8125 0.875
0.66666667 0.86666667 0.66666667 0.66666667]
mean value: 0.7491666666666666
key: train_recall
value: [0.95714286 0.93571429 0.91428571 0.94285714 0.94285714 0.97142857
0.95744681 0.95035461 0.94326241 0.94326241]
mean value: 0.9458611955420466
key: test_roc_auc
value: [0.64423077 0.62019231 0.67548077 0.72115385 0.65625 0.77083333
0.64102564 0.77948718 0.71794872 0.52564103]
mean value: 0.675224358974359
key: train_roc_auc
value: [0.93509317 0.92003106 0.90496894 0.92795031 0.91970443 0.93399015
0.93959297 0.91865557 0.92815294 0.93250077]
mean value: 0.9260640310543817
key: test_jcc
value: [0.54545455 0.47619048 0.59090909 0.6 0.59090909 0.7
0.5 0.68421053 0.55555556 0.43478261]
mean value: 0.56780118940302
key: train_jcc
value: [0.89333333 0.86754967 0.84210526 0.88 0.86842105 0.89473684
0.9 0.87012987 0.8807947 0.88666667]
mean value: 0.8783737398885534
MCC on Blind test: 0.4
Accuracy on Blind test: 0.7
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.04268551 0.00933933 0.00895882 0.00897408 0.00992012 0.00953245
0.00947046 0.00959587 0.00973368 0.01020265]
mean value: 0.012841296195983887
key: score_time
value: [0.00958109 0.00871778 0.00853181 0.00853419 0.00926518 0.00934649
0.00859356 0.00852966 0.00941658 0.00943685]
mean value: 0.008995318412780761
key: test_mcc
value: [ 0.43855669 0.03827795 0.36894943 0.50973276 0.71004695 0.41666667
-0.02738134 0.78555332 0.43262512 0.13091876]
mean value: 0.38039463122020395
key: train_mcc
value: [0.49076747 0.48287166 0.45854085 0.42616368 0.46985293 0.46939037
0.50035102 0.46804282 0.49226675 0.48419112]
mean value: 0.47424386837888466
key: test_accuracy
value: [0.72413793 0.51724138 0.68965517 0.75862069 0.85714286 0.71428571
0.5 0.89285714 0.71428571 0.57142857]
mean value: 0.6939655172413793
key: train_accuracy
value: [0.74901961 0.74509804 0.73333333 0.71764706 0.73828125 0.73828125
0.75390625 0.73828125 0.75 0.74609375]
mean value: 0.7409941789215686
key: test_fscore
value: [0.76470588 0.53333333 0.72727273 0.78787879 0.88235294 0.75
0.58823529 0.90322581 0.76470588 0.625 ]
mean value: 0.7326710654936461
key: train_fscore
value: [0.77931034 0.77508651 0.76712329 0.75675676 0.77591973 0.76975945
0.78350515 0.77441077 0.78082192 0.778157 ]
mean value: 0.774085092050438
key: test_precision
value: [0.72222222 0.57142857 0.70588235 0.76470588 0.83333333 0.75
0.52631579 0.875 0.68421053 0.58823529]
mean value: 0.7021333972185365
key: train_precision
value: [0.75333333 0.75167785 0.73684211 0.71794872 0.72955975 0.74172185
0.76 0.73717949 0.75496689 0.75 ]
mean value: 0.7433229986223217
key: test_recall
value: [0.8125 0.5 0.75 0.8125 0.9375 0.75
0.66666667 0.93333333 0.86666667 0.66666667]
mean value: 0.7695833333333333
key: train_recall
value: [0.80714286 0.8 0.8 0.8 0.82857143 0.8
0.80851064 0.81560284 0.80851064 0.80851064]
mean value: 0.8076849037487336
key: test_roc_auc
value: [0.71394231 0.51923077 0.68269231 0.75240385 0.84375 0.70833333
0.48717949 0.88974359 0.7025641 0.56410256]
mean value: 0.6863942307692308
key: train_roc_auc
value: [0.74270186 0.73913043 0.72608696 0.70869565 0.72894089 0.73189655
0.74773358 0.72954055 0.74338575 0.73903793]
mean value: 0.7337150155925077
key: test_jcc
value: [0.61904762 0.36363636 0.57142857 0.65 0.78947368 0.6
0.41666667 0.82352941 0.61904762 0.45454545]
mean value: 0.5907375390347527
key: train_jcc
value: [0.63841808 0.63276836 0.62222222 0.60869565 0.63387978 0.62569832
0.6440678 0.63186813 0.64044944 0.63687151]
mean value: 0.631493929557765
MCC on Blind test: 0.31
Accuracy on Blind test: 0.66
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01471472 0.01537204 0.0144701 0.01906157 0.01802874 0.01486659
0.01644754 0.0162828 0.0146687 0.01574802]
mean value: 0.015966081619262697
key: score_time
value: [0.00864053 0.01096821 0.01096869 0.01156044 0.01153564 0.01147151
0.01165247 0.01153016 0.01147318 0.01150846]
mean value: 0.011130928993225098
key: test_mcc
value: [0.2956562 0.4444578 0.44230769 0.72115385 0.4330127 0.27083333
0.27174649 0.85641026 0.45479403 0.35805744]
mean value: 0.4548429779078069
key: train_mcc
value: [0.78686712 0.58530546 0.73027531 0.74166179 0.71577977 0.7553501
0.7069685 0.77942077 0.531924 0.69196476]
mean value: 0.702551757572427
key: test_accuracy
value: [0.65517241 0.72413793 0.72413793 0.86206897 0.71428571 0.64285714
0.60714286 0.92857143 0.71428571 0.67857143]
mean value: 0.7251231527093596
key: train_accuracy
value: [0.89019608 0.76862745 0.86666667 0.85882353 0.8515625 0.87890625
0.83203125 0.890625 0.73828125 0.83203125]
mean value: 0.8407751225490196
key: test_fscore
value: [0.70588235 0.77777778 0.75 0.875 0.73333333 0.6875
0.52173913 0.93333333 0.77777778 0.68965517]
mean value: 0.7451998878011974
key: train_fscore
value: [0.90728477 0.8259587 0.88028169 0.856 0.85271318 0.89122807
0.82157676 0.9 0.80802292 0.82730924]
mean value: 0.8570375331957046
key: test_precision
value: [0.66666667 0.7 0.75 0.875 0.78571429 0.6875
0.75 0.93333333 0.66666667 0.71428571]
mean value: 0.7529166666666667
key: train_precision
value: [0.84567901 0.70351759 0.86805556 0.97272727 0.93220339 0.87586207
0.99 0.90647482 0.67788462 0.9537037 ]
mean value: 0.8726108026596435
key: test_recall
value: [0.75 0.875 0.75 0.875 0.6875 0.6875
0.4 0.93333333 0.93333333 0.66666667]
mean value: 0.7558333333333334
key: train_recall
value: [0.97857143 1. 0.89285714 0.76428571 0.78571429 0.90714286
0.70212766 0.89361702 1. 0.73049645]
mean value: 0.8654812563323202
key: test_roc_auc
value: [0.64423077 0.70673077 0.72115385 0.86057692 0.71875 0.63541667
0.62307692 0.92820513 0.6974359 0.67948718]
mean value: 0.7215064102564103
key: train_roc_auc
value: [0.88059006 0.74347826 0.86381988 0.86909938 0.85837438 0.87598522
0.846716 0.89028677 0.70869565 0.8435091 ]
mean value: 0.8380554707448707
key: test_jcc
value: [0.54545455 0.63636364 0.6 0.77777778 0.57894737 0.52380952
0.35294118 0.875 0.63636364 0.52631579]
mean value: 0.6052973454134445
key: train_jcc
value: [0.83030303 0.70351759 0.78616352 0.74825175 0.74324324 0.80379747
0.6971831 0.81818182 0.67788462 0.70547945]
mean value: 0.7514005584317507
MCC on Blind test: 0.36
Accuracy on Blind test: 0.69
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01621699 0.01477242 0.01502943 0.01693106 0.01710677 0.01634216
0.01618671 0.01679707 0.01618743 0.01548648]
mean value: 0.01610565185546875
key: score_time
value: [0.01161408 0.01149511 0.01152277 0.01150894 0.01155853 0.01150775
0.01153326 0.01155472 0.01158786 0.01207876]
mean value: 0.011596179008483887
key: test_mcc
value: [0.46375229 0.2956562 0.5943331 0.6505161 0.41666667 0.33113309
0.22739701 0.69388867 0.4241768 0.27928963]
mean value: 0.43768095519341854
key: train_mcc
value: [0.57308036 0.71593148 0.66101414 0.77134643 0.81121707 0.81285468
0.75895878 0.69508372 0.68195933 0.68578508]
mean value: 0.7167231067913732
key: test_accuracy
value: [0.72413793 0.65517241 0.79310345 0.82758621 0.71428571 0.67857143
0.60714286 0.82142857 0.71428571 0.64285714]
mean value: 0.7178571428571429
key: train_accuracy
value: [0.76470588 0.85490196 0.82745098 0.88627451 0.90625 0.90625
0.87109375 0.8359375 0.828125 0.8359375 ]
mean value: 0.8516927083333333
key: test_fscore
value: [0.78947368 0.70588235 0.83333333 0.84848485 0.75 0.74285714
0.59259259 0.8 0.75 0.70588235]
mean value: 0.7518506307360797
key: train_fscore
value: [0.82248521 0.87868852 0.85714286 0.90034364 0.91366906 0.91780822
0.87159533 0.83333333 0.86419753 0.86708861]
mean value: 0.8726352317903348
key: test_precision
value: [0.68181818 0.66666667 0.75 0.82352941 0.75 0.68421053
0.66666667 1. 0.70588235 0.63157895]
mean value: 0.7360352753541608
key: train_precision
value: [0.7020202 0.81212121 0.78571429 0.86754967 0.92028986 0.88157895
0.96551724 0.94594595 0.76502732 0.78285714]
mean value: 0.8428621823757527
key: test_recall
value: [0.9375 0.75 0.9375 0.875 0.75 0.8125
0.53333333 0.66666667 0.8 0.8 ]
mean value: 0.78625
key: train_recall
value: [0.99285714 0.95714286 0.94285714 0.93571429 0.90714286 0.95714286
0.79432624 0.74468085 0.9929078 0.97163121]
mean value: 0.9196403242147924
key: test_roc_auc
value: [0.69951923 0.64423077 0.77644231 0.82211538 0.70833333 0.65625
0.61282051 0.83333333 0.70769231 0.63076923]
mean value: 0.709150641025641
key: train_roc_auc
value: [0.73990683 0.84378882 0.81490683 0.88090062 0.90615764 0.90098522
0.87977182 0.84625347 0.80949738 0.82059821]
mean value: 0.8442766838465265
key: test_jcc
value: [0.65217391 0.54545455 0.71428571 0.73684211 0.6 0.59090909
0.42105263 0.66666667 0.6 0.54545455]
mean value: 0.6072839212656146
key: train_jcc
value: [0.69849246 0.78362573 0.75 0.81875 0.8410596 0.84810127
0.77241379 0.71428571 0.76086957 0.76536313]
mean value: 0.7752961262875675
MCC on Blind test: 0.24
Accuracy on Blind test: 0.62
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.14341903 0.12807274 0.13190699 0.12588644 0.12666917 0.1267252
0.1269629 0.12768006 0.1299634 0.13459158]
mean value: 0.1301877498626709
key: score_time
value: [0.01633096 0.01517463 0.01501942 0.01492524 0.01512504 0.0148921
0.01571321 0.01498175 0.01585507 0.01626587]
mean value: 0.01542832851409912
key: test_mcc
value: [0.6505161 0.37799476 0.6505161 0.43855669 0.40881491 0.33113309
0.43589744 0.69388867 0.4555973 0.35143175]
mean value: 0.4794346794243034
key: train_mcc
value: [0.99210575 1. 0.98426071 0.99211795 0.99214326 0.99214326
1. 0.98430987 0.9921307 0.98430987]
mean value: 0.991352136995068
key: test_accuracy
value: [0.82758621 0.68965517 0.82758621 0.72413793 0.71428571 0.67857143
0.71428571 0.82142857 0.71428571 0.67857143]
mean value: 0.739039408866995
key: train_accuracy
value: [0.99607843 1. 0.99215686 0.99607843 0.99609375 0.99609375
1. 0.9921875 0.99609375 0.9921875 ]
mean value: 0.9956969975490196
key: test_fscore
value: [0.84848485 0.70967742 0.84848485 0.76470588 0.76470588 0.74285714
0.71428571 0.8 0.69230769 0.70967742]
mean value: 0.7595186849835807
key: train_fscore
value: [0.99644128 1. 0.9929078 0.99641577 0.99644128 0.99644128
1. 0.99295775 0.99646643 0.99295775]
mean value: 0.9961029339497282
key: test_precision
value: [0.82352941 0.73333333 0.82352941 0.72222222 0.72222222 0.68421053
0.76923077 1. 0.81818182 0.6875 ]
mean value: 0.7783959715035567
key: train_precision
value: [0.9929078 1. 0.98591549 1. 0.9929078 0.9929078
1. 0.98601399 0.99295775 0.98601399]
mean value: 0.9929624615719911
key: test_recall
value: [0.875 0.6875 0.875 0.8125 0.8125 0.8125
0.66666667 0.66666667 0.6 0.73333333]
mean value: 0.7541666666666667
key: train_recall
value: [1. 1. 1. 0.99285714 1. 1.
1. 1. 1. 1. ]
mean value: 0.9992857142857143
key: test_roc_auc
value: [0.82211538 0.68990385 0.82211538 0.71394231 0.69791667 0.65625
0.71794872 0.83333333 0.72307692 0.67435897]
mean value: 0.7350961538461538
key: train_roc_auc
value: [0.99565217 1. 0.99130435 0.99642857 0.99568966 0.99568966
1. 0.99130435 0.99565217 0.99130435]
mean value: 0.9953025273077747
key: test_jcc
value: [0.73684211 0.55 0.73684211 0.61904762 0.61904762 0.59090909
0.55555556 0.66666667 0.52941176 0.55 ]
mean value: 0.615432252645875
key: train_jcc
value: [0.9929078 1. 0.98591549 0.99285714 0.9929078 0.9929078
1. 0.98601399 0.99295775 0.98601399]
mean value: 0.9922481758577054
MCC on Blind test: 0.48
Accuracy on Blind test: 0.74
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.04842234 0.05589557 0.07119823 0.05455518 0.04850173 0.06984067
0.06670141 0.06200743 0.05922127 0.05085707]
mean value: 0.05872008800506592
key: score_time
value: [0.01725745 0.0303607 0.02712083 0.02353597 0.02750874 0.03366137
0.03804755 0.0257113 0.02761889 0.02240229]
mean value: 0.02732250690460205
key: test_mcc
value: [0.58173077 0.51675233 0.61653391 0.58145719 0.5625 0.55943093
0.50128041 0.69388867 0.66151858 0.64450339]
mean value: 0.591959617459089
key: train_mcc
value: [0.98430913 0.97629123 0.98430913 0.97657181 0.96065003 0.98438167
0.97664764 0.97664764 0.95392353 0.97664764]
mean value: 0.9750379441587029
key: test_accuracy
value: [0.79310345 0.75862069 0.79310345 0.79310345 0.78571429 0.78571429
0.75 0.82142857 0.82142857 0.82142857]
mean value: 0.7923645320197045
key: train_accuracy
value: [0.99215686 0.98823529 0.99215686 0.98823529 0.98046875 0.9921875
0.98828125 0.98828125 0.9765625 0.98828125]
mean value: 0.9874846813725491
key: test_fscore
value: [0.8125 0.77419355 0.78571429 0.82352941 0.8125 0.82352941
0.75862069 0.8 0.81481481 0.82758621]
mean value: 0.8032988368997334
key: train_fscore
value: [0.99280576 0.98924731 0.99280576 0.98916968 0.98207885 0.99280576
0.98924731 0.98924731 0.97826087 0.98924731]
mean value: 0.9884915911200943
key: test_precision
value: [0.8125 0.8 0.91666667 0.77777778 0.8125 0.77777778
0.78571429 1. 0.91666667 0.85714286]
mean value: 0.8456746031746032
key: train_precision
value: [1. 0.99280576 1. 1. 0.98561151 1.
1. 1. 1. 1. ]
mean value: 0.9978417266187051
key: test_recall
value: [0.8125 0.75 0.6875 0.875 0.8125 0.875
0.73333333 0.66666667 0.73333333 0.8 ]
mean value: 0.7745833333333333
key: train_recall
value: [0.98571429 0.98571429 0.98571429 0.97857143 0.97857143 0.98571429
0.9787234 0.9787234 0.95744681 0.9787234 ]
mean value: 0.9793617021276596
key: test_roc_auc
value: [0.79086538 0.75961538 0.80528846 0.78365385 0.78125 0.77083333
0.75128205 0.83333333 0.82820513 0.82307692]
mean value: 0.7927403846153847
key: train_roc_auc
value: [0.99285714 0.98850932 0.99285714 0.98928571 0.98066502 0.99285714
0.9893617 0.9893617 0.9787234 0.9893617 ]
mean value: 0.9883839994896169
key: test_jcc
value: [0.68421053 0.63157895 0.64705882 0.7 0.68421053 0.7
0.61111111 0.66666667 0.6875 0.70588235]
mean value: 0.6718218954248366
key: train_jcc
value: [0.98571429 0.9787234 0.98571429 0.97857143 0.96478873 0.98571429
0.9787234 0.9787234 0.95744681 0.9787234 ]
mean value: 0.9772843443640566
MCC on Blind test: 0.51
Accuracy on Blind test: 0.75
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.06892204 0.04836154 0.03385496 0.03343391 0.08310866 0.0448885
0.03518319 0.07023501 0.04377747 0.04133487]
mean value: 0.05031001567840576
key: score_time
value: [0.0234127 0.01284838 0.01282406 0.01289606 0.02261782 0.01286912
0.01278472 0.02182031 0.01295614 0.01290345]
mean value: 0.015793275833129884
key: test_mcc
value: [0.29458249 0.37799476 0.13968442 0.36720991 0.41666667 0.5625
0.20282899 0.71743483 0.36232865 0.35228194]
mean value: 0.3793512667922323
key: train_mcc
value: [0.99210575 0.99210575 0.98416149 1. 0.98433579 0.99214326
0.9921307 0.9921307 0.98430987 0.98430987]
mean value: 0.9897733186760725
key: test_accuracy
value: [0.65517241 0.68965517 0.5862069 0.68965517 0.71428571 0.78571429
0.60714286 0.85714286 0.67857143 0.67857143]
mean value: 0.6942118226600985
key: train_accuracy
value: [0.99607843 0.99607843 0.99215686 1. 0.9921875 0.99609375
0.99609375 0.99609375 0.9921875 0.9921875 ]
mean value: 0.9949157475490196
key: test_fscore
value: [0.72222222 0.70967742 0.68421053 0.74285714 0.75 0.8125
0.66666667 0.875 0.74285714 0.72727273]
mean value: 0.743326384754653
key: train_fscore
value: [0.99644128 0.99644128 0.99285714 1. 0.9929078 0.99644128
0.99646643 0.99646643 0.99295775 0.99295775]
mean value: 0.9953937142840512
key: test_precision
value: [0.65 0.73333333 0.59090909 0.68421053 0.75 0.8125
0.61111111 0.82352941 0.65 0.66666667]
mean value: 0.6972260140100698
key: train_precision
value: [0.9929078 0.9929078 0.99285714 1. 0.98591549 0.9929078
0.99295775 0.99295775 0.98601399 0.98601399]
mean value: 0.9915439505055927
key: test_recall
value: [0.8125 0.6875 0.8125 0.8125 0.75 0.8125
0.73333333 0.93333333 0.86666667 0.8 ]
mean value: 0.8020833333333334
key: train_recall
value: [1. 1. 0.99285714 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9992857142857143
key: test_roc_auc
value: [0.63701923 0.68990385 0.56009615 0.67548077 0.70833333 0.78125
0.5974359 0.85128205 0.66410256 0.66923077]
mean value: 0.6834134615384615
key: train_roc_auc
value: [0.99565217 0.99565217 0.99208075 1. 0.99137931 0.99568966
0.99565217 0.99565217 0.99130435 0.99130435]
mean value: 0.9944367102163204
key: test_jcc
value: [0.56521739 0.55 0.52 0.59090909 0.6 0.68421053
0.5 0.77777778 0.59090909 0.57142857]
mean value: 0.5950452448644669
key: train_jcc
value: [0.9929078 0.9929078 0.9858156 1. 0.98591549 0.9929078
0.99295775 0.99295775 0.98601399 0.98601399]
mean value: 0.9908397965035664
MCC on Blind test: 0.25
Accuracy on Blind test: 0.64
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.47307181 0.44717479 0.45278549 0.45123696 0.45926499 0.45636559
0.45906496 0.45645404 0.44945335 0.45685482]
mean value: 0.45617268085479734
key: score_time
value: [0.00952506 0.00908566 0.00928783 0.0093286 0.009516 0.00935721
0.00914192 0.00926089 0.01024389 0.00966859]
mean value: 0.009441566467285157
key: test_mcc
value: [0.6505161 0.51675233 0.58173077 0.6505161 0.40881491 0.64019064
0.50128041 0.72307692 0.57948718 0.51681139]
mean value: 0.5769176744958047
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.82758621 0.75862069 0.79310345 0.82758621 0.71428571 0.82142857
0.75 0.85714286 0.78571429 0.75 ]
mean value: 0.7885467980295566
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.84848485 0.77419355 0.8125 0.84848485 0.76470588 0.85714286
0.75862069 0.85714286 0.78571429 0.74074074]
mean value: 0.8047730558105648
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.82352941 0.8 0.8125 0.82352941 0.72222222 0.78947368
0.78571429 0.92307692 0.84615385 0.83333333]
mean value: 0.8159533118240548
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.875 0.75 0.8125 0.875 0.8125 0.9375
0.73333333 0.8 0.73333333 0.66666667]
mean value: 0.7995833333333333
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.82211538 0.75961538 0.79086538 0.82211538 0.69791667 0.80208333
0.75128205 0.86153846 0.78974359 0.75641026]
mean value: 0.7853685897435897
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.73684211 0.63157895 0.68421053 0.73684211 0.61904762 0.75
0.61111111 0.75 0.64705882 0.58823529]
mean value: 0.6754926532016315
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.6
Accuracy on Blind test: 0.8
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.02638793 0.02276492 0.02334905 0.02273154 0.03356719 0.02284646
0.02296495 0.02272868 0.02332139 0.02315092]
mean value: 0.024381303787231447
key: score_time
value: [0.01213121 0.01963043 0.01413655 0.01541519 0.01413393 0.01488996
0.01563334 0.01552153 0.01470804 0.01829457]
mean value: 0.01544947624206543
key: test_mcc
value: [0.39546094 0.36894943 0.221332 0.14470719 0.09128709 0.4
0.12687831 0.5859606 0.20380987 0.12403473]
mean value: 0.2662420159139888
key: train_mcc
value: [0.63202573 0.5986817 0.66534784 0.65201286 0.61444446 0.60121238
0.66594163 0.57250836 0.57922054 0.57922054]
mean value: 0.616061603996789
key: test_accuracy
value: [0.68965517 0.68965517 0.62068966 0.5862069 0.57142857 0.67857143
0.57142857 0.78571429 0.60714286 0.57142857]
mean value: 0.6371921182266009
key: train_accuracy
value: [0.79607843 0.77647059 0.81568627 0.80784314 0.78515625 0.77734375
0.81640625 0.76171875 0.765625 0.765625 ]
mean value: 0.7867953431372549
key: test_fscore
value: [0.76923077 0.72727273 0.68571429 0.66666667 0.66666667 0.7804878
0.68421053 0.82352941 0.68571429 0.66666667]
mean value: 0.7156159810890612
key: train_fscore
value: [0.84337349 0.83086053 0.85626911 0.85106383 0.8358209 0.83086053
0.85714286 0.82215743 0.8245614 0.8245614 ]
mean value: 0.8376671499247365
key: test_precision
value: [0.65217391 0.70588235 0.63157895 0.6 0.6 0.64
0.56521739 0.73684211 0.6 0.57142857]
mean value: 0.6303123281349152
key: train_precision
value: [0.72916667 0.7106599 0.7486631 0.74074074 0.71794872 0.7106599
0.75 0.6980198 0.70149254 0.70149254]
mean value: 0.7208843900521782
key: test_recall
value: [0.9375 0.75 0.75 0.75 0.75 1.
0.86666667 0.93333333 0.8 0.8 ]
mean value: 0.83375
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.66105769 0.68269231 0.60576923 0.56730769 0.54166667 0.625
0.54871795 0.77435897 0.59230769 0.55384615]
mean value: 0.615272435897436
key: train_roc_auc
value: [0.77391304 0.75217391 0.79565217 0.78695652 0.76293103 0.75431034
0.79565217 0.73478261 0.73913043 0.73913043]
mean value: 0.7634632683658171
key: test_jcc
value: [0.625 0.57142857 0.52173913 0.5 0.5 0.64
0.52 0.7 0.52173913 0.5 ]
mean value: 0.5599906832298136
key: train_jcc
value: [0.72916667 0.7106599 0.7486631 0.74074074 0.71794872 0.7106599
0.75 0.6980198 0.70149254 0.70149254]
mean value: 0.7208843900521782
MCC on Blind test: 0.14
Accuracy on Blind test: 0.59
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02379489 0.01429152 0.02167296 0.03507352 0.01420116 0.01453209
0.02825117 0.03548551 0.034168 0.03530765]
mean value: 0.02567784786224365
key: score_time
value: [0.01208448 0.01182032 0.02128649 0.01187086 0.01207566 0.01173615
0.02239275 0.02398872 0.02039075 0.02078676]
mean value: 0.016843295097351073
key: test_mcc
value: [0.36894943 0.45455066 0.51308782 0.6505161 0.48553038 0.33113309
0.21483446 0.73929609 0.51681139 0.4241768 ]
mean value: 0.4698886211837859
key: train_mcc
value: [0.8097547 0.80188999 0.78596805 0.80974419 0.78701889 0.81887846
0.81910397 0.79449635 0.82618954 0.77076024]
mean value: 0.8023804385252125
key: test_accuracy
value: [0.68965517 0.72413793 0.75862069 0.82758621 0.75 0.67857143
0.60714286 0.85714286 0.75 0.71428571]
mean value: 0.7357142857142858
key: train_accuracy
value: [0.90588235 0.90196078 0.89411765 0.90588235 0.89453125 0.91015625
0.91015625 0.8984375 0.9140625 0.88671875]
mean value: 0.9021905637254902
key: test_fscore
value: [0.72727273 0.73333333 0.8 0.84848485 0.78787879 0.74285714
0.62068966 0.88235294 0.74074074 0.75 ]
mean value: 0.7633610176916465
key: train_fscore
value: [0.91549296 0.91103203 0.90526316 0.91489362 0.90526316 0.91756272
0.91756272 0.90909091 0.92307692 0.8989547 ]
mean value: 0.9118192903056238
key: test_precision
value: [0.70588235 0.78571429 0.73684211 0.82352941 0.76470588 0.68421053
0.64285714 0.78947368 0.83333333 0.70588235]
mean value: 0.7472431077694236
key: train_precision
value: [0.90277778 0.90780142 0.88965517 0.9084507 0.88965517 0.92086331
0.92753623 0.89655172 0.91034483 0.88356164]
mean value: 0.9037197982066763
key: test_recall
value: [0.75 0.6875 0.875 0.875 0.8125 0.8125
0.6 1. 0.66666667 0.8 ]
mean value: 0.7879166666666667
key: train_recall
value: [0.92857143 0.91428571 0.92142857 0.92142857 0.92142857 0.91428571
0.90780142 0.92198582 0.93617021 0.91489362]
mean value: 0.9202279635258358
key: test_roc_auc
value: [0.68269231 0.72836538 0.74519231 0.82211538 0.73958333 0.65625
0.60769231 0.84615385 0.75641026 0.70769231]
mean value: 0.7292147435897436
key: train_roc_auc
value: [0.90341615 0.90062112 0.89114907 0.90419255 0.89174877 0.90972906
0.91042245 0.89577552 0.91156337 0.88353377]
mean value: 0.9002151811632177
key: test_jcc
value: [0.57142857 0.57894737 0.66666667 0.73684211 0.65 0.59090909
0.45 0.78947368 0.58823529 0.6 ]
mean value: 0.6222502781016713
key: train_jcc
value: [0.84415584 0.83660131 0.82692308 0.84313725 0.82692308 0.84768212
0.84768212 0.83333333 0.85714286 0.8164557 ]
mean value: 0.8380036685182819
MCC on Blind test: 0.31
Accuracy on Blind test: 0.66
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.13301873 0.25432467 0.27289152 0.35769796 0.24618363 0.25176811
0.28279519 0.18110466 0.1442101 0.2438333 ]
mean value: 0.23678278923034668
key: score_time
value: [0.01179385 0.02128005 0.02014399 0.0181601 0.02356052 0.02306724
0.01739192 0.02378941 0.02485228 0.02287745]
mean value: 0.020691680908203124
key: test_mcc
value: [0.43855669 0.51675233 0.51308782 0.5943331 0.57054433 0.25819889
0.21483446 0.73929609 0.64450339 0.4241768 ]
mean value: 0.4914283896819048
key: train_mcc
value: [0.69116162 0.68292978 0.70648469 0.65069859 0.67620784 0.70049261
0.81910397 0.79449635 0.68431304 0.77076024]
mean value: 0.7176648721292616
key: test_accuracy
value: [0.72413793 0.75862069 0.75862069 0.79310345 0.78571429 0.64285714
0.60714286 0.85714286 0.82142857 0.71428571]
mean value: 0.7463054187192119
key: train_accuracy
value: [0.84705882 0.84313725 0.85490196 0.82745098 0.83984375 0.8515625
0.91015625 0.8984375 0.84375 0.88671875]
mean value: 0.8603017769607844
key: test_fscore
value: [0.76470588 0.77419355 0.8 0.83333333 0.83333333 0.70588235
0.62068966 0.88235294 0.82758621 0.75 ]
mean value: 0.7792077253593317
key: train_fscore
value: [0.86597938 0.86206897 0.87108014 0.84722222 0.85714286 0.86428571
0.91756272 0.90909091 0.86394558 0.8989547 ]
mean value: 0.8757333195153447
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_cd_7030.py:115: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_cd_7030.py:118: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
key: test_precision
value: [0.72222222 0.8 0.73684211 0.75 0.75 0.66666667
0.64285714 0.78947368 0.85714286 0.70588235]
mean value: 0.7421087031303749
key: train_precision
value: [0.83443709 0.83333333 0.85034014 0.82432432 0.83673469 0.86428571
0.92753623 0.89655172 0.83006536 0.88356164]
mean value: 0.858117024730279
key: test_recall
value: [0.8125 0.75 0.875 0.9375 0.9375 0.75 0.6 1. 0.8 0.8 ]
mean value: 0.82625
key: train_recall
value: [0.9 0.89285714 0.89285714 0.87142857 0.87857143 0.86428571
0.90780142 0.92198582 0.90070922 0.91489362]
mean value: 0.8945390070921986
key: test_roc_auc
value: [0.71394231 0.75961538 0.74519231 0.77644231 0.76041667 0.625
0.60769231 0.84615385 0.82307692 0.70769231]
mean value: 0.7365224358974359
key: train_roc_auc
value: [0.84130435 0.83773292 0.8507764 0.82267081 0.83583744 0.85024631
0.91042245 0.89577552 0.83731113 0.88353377]
mean value: 0.8565611077440003
key: test_jcc
value: [0.61904762 0.63157895 0.66666667 0.71428571 0.71428571 0.54545455
0.45 0.78947368 0.70588235 0.6 ]
mean value: 0.6436675244260384
key: train_jcc
value: [0.76363636 0.75757576 0.77160494 0.73493976 0.75 0.76100629
0.84768212 0.83333333 0.76047904 0.8164557 ]
mean value: 0.7796713298485378
MCC on Blind test: 0.36
Accuracy on Blind test: 0.69
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.03288984 0.03522205 0.03412771 0.03215051 0.03374076 0.03229523
0.03486323 0.03421307 0.0346148 0.03472567]
mean value: 0.033884286880493164
key: score_time
value: [0.01216006 0.01787066 0.01389241 0.01186514 0.01227212 0.01199508
0.01420188 0.01425934 0.0143292 0.01416445]
mean value: 0.013701033592224122
key: test_mcc
value: [0.31814238 0.37796447 0.48333333 0.68826048 0.6778302 0.48333333
0.61608311 0.69203857 0.4184137 0.4184137 ]
mean value: 0.5173813291351835
key: train_mcc
value: [0.75722013 0.72864578 0.74385734 0.70836501 0.71529889 0.73015914
0.71535695 0.70820669 0.74383139 0.73679947]
mean value: 0.7287740786080976
key: test_accuracy
value: [0.65625 0.6875 0.74193548 0.83870968 0.83870968 0.74193548
0.80645161 0.83870968 0.70967742 0.70967742]
mean value: 0.7569556451612903
key: train_accuracy
value: [0.87857143 0.86428571 0.87188612 0.85409253 0.85765125 0.86476868
0.85765125 0.85409253 0.87188612 0.8683274 ]
mean value: 0.8643213014743264
key: test_fscore
value: [0.68571429 0.70588235 0.75 0.85714286 0.84848485 0.75
0.78571429 0.84848485 0.68965517 0.68965517]
mean value: 0.7610733823309888
key: train_fscore
value: [0.87943262 0.86524823 0.87234043 0.85512367 0.85714286 0.86131387
0.85915493 0.85409253 0.87323944 0.87017544]
mean value: 0.8647264008747467
key: test_precision
value: [0.63157895 0.66666667 0.75 0.78947368 0.82352941 0.75
0.84615385 0.77777778 0.71428571 0.71428571]
mean value: 0.7463751762513372
key: train_precision
value: [0.87323944 0.85915493 0.86619718 0.84615385 0.85714286 0.88059701
0.85314685 0.85714286 0.86713287 0.86111111]
mean value: 0.8621018956051539
key: test_recall
value: [0.75 0.75 0.75 0.9375 0.875 0.75
0.73333333 0.93333333 0.66666667 0.66666667]
mean value: 0.78125
key: train_recall
value: [0.88571429 0.87142857 0.87857143 0.86428571 0.85714286 0.84285714
0.86524823 0.85106383 0.87943262 0.87943262]
mean value: 0.8675177304964539
key: test_roc_auc
value: [0.65625 0.6875 0.74166667 0.83541667 0.8375 0.74166667
0.80416667 0.84166667 0.70833333 0.70833333]
mean value: 0.75625
key: train_roc_auc
value: [0.87857143 0.86428571 0.87190983 0.85412867 0.85764944 0.86469098
0.85762411 0.85410334 0.87185917 0.86828774]
mean value: 0.8643110435663628
key: test_jcc
value: [0.52173913 0.54545455 0.6 0.75 0.73684211 0.6
0.64705882 0.73684211 0.52631579 0.52631579]
mean value: 0.6190568288892424
key: train_jcc
value: [0.78481013 0.7625 0.77358491 0.74691358 0.75 0.75641026
0.75308642 0.74534161 0.775 0.77018634]
mean value: 0.7617833238963472
MCC on Blind test: 0.38
Accuracy on Blind test: 0.69
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
List of models:
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.77032852 0.97276163 0.80582404 0.79673815 0.8783679 0.79550815
0.79311538 0.96027112 0.78945112 0.9483645 ]
mean value: 0.8510730504989624
key: score_time
value: [0.01441407 0.01222777 0.01214099 0.01213002 0.01198173 0.01428342
0.01189327 0.01310682 0.01192641 0.01187658]
mean value: 0.012598109245300294
key: test_mcc
value: [0.31311215 0.25 0.54812195 0.68826048 0.6778302 0.37616262
0.61608311 0.69203857 0.48333333 0.48527095]
mean value: 0.5130213359522333
key: train_mcc
value: [0.91465912 0.64450339 0.68018255 0.6728996 0.99290744 1.
0.64434879 0.6442069 0.66585571 0.69434748]
mean value: 0.755391096954088
key: test_accuracy
value: [0.65625 0.625 0.77419355 0.83870968 0.83870968 0.67741935
0.80645161 0.83870968 0.74193548 0.74193548]
mean value: 0.7539314516129032
key: train_accuracy
value: [0.95714286 0.82142857 0.83985765 0.83629893 0.99644128 1.
0.82206406 0.82206406 0.83274021 0.84697509]
mean value: 0.8775012709710218
key: test_fscore
value: [0.66666667 0.625 0.78787879 0.85714286 0.84848485 0.73684211
0.78571429 0.84848485 0.73333333 0.71428571]
mean value: 0.76038334472545
key: train_fscore
value: [0.95774648 0.82758621 0.84210526 0.83802817 0.99641577 1.
0.82517483 0.82142857 0.83623693 0.85017422]
mean value: 0.8794896434980269
key: test_precision
value: [0.64705882 0.625 0.76470588 0.78947368 0.82352941 0.63636364
0.84615385 0.77777778 0.73333333 0.76923077]
mean value: 0.7412627164716948
key: train_precision
value: [0.94444444 0.8 0.82758621 0.82638889 1. 1.
0.8137931 0.82733813 0.82191781 0.83561644]
mean value: 0.8697085019749906
key: test_recall
value: [0.6875 0.625 0.8125 0.9375 0.875 0.875
0.73333333 0.93333333 0.73333333 0.66666667]
mean value: 0.7879166666666666
key: train_recall
value: [0.97142857 0.85714286 0.85714286 0.85 0.99285714 1.
0.83687943 0.81560284 0.85106383 0.86524823]
mean value: 0.8897365754812563
key: test_roc_auc
value: [0.65625 0.625 0.77291667 0.83541667 0.8375 0.67083333
0.80416667 0.84166667 0.74166667 0.73958333]
mean value: 0.7525
key: train_roc_auc
value: [0.95714286 0.82142857 0.83991895 0.83634752 0.99642857 1.
0.82201114 0.82208713 0.83267477 0.84690983]
mean value: 0.8774949341438704
key: test_jcc
value: [0.5 0.45454545 0.65 0.75 0.73684211 0.58333333
0.64705882 0.73684211 0.57894737 0.55555556]
mean value: 0.6193124745911124
key: train_jcc
value: [0.91891892 0.70588235 0.72727273 0.72121212 0.99285714 1.
0.70238095 0.6969697 0.71856287 0.73939394]
mean value: 0.7923450726198172
MCC on Blind test: 0.42
Accuracy on Blind test: 0.71
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01640129 0.01214981 0.01048613 0.01034427 0.01015806 0.01013517
0.01000237 0.01033115 0.01018715 0.01006889]
mean value: 0.011026430130004882
key: score_time
value: [0.01300478 0.00980377 0.00967383 0.00959897 0.00941086 0.00940037
0.009377 0.00936294 0.00931811 0.00939512]
mean value: 0.009834575653076171
key: test_mcc
value: [0.46056619 0.06579517 0.49612132 0.37616262 0.55777335 0.61608311
0.54812195 0.58316015 0.48954403 0.37191715]
mean value: 0.45652450481312273
key: train_mcc
value: [0.49419142 0.48038446 0.53031544 0.47188822 0.5319049 0.53856487
0.53744808 0.52073939 0.49113636 0.54241467]
mean value: 0.5138987824814698
key: test_accuracy
value: [0.71875 0.53125 0.74193548 0.67741935 0.74193548 0.80645161
0.77419355 0.77419355 0.74193548 0.67741935]
mean value: 0.7185483870967742
key: train_accuracy
value: [0.73571429 0.725 0.75800712 0.72953737 0.76156584 0.76156584
0.76868327 0.75088968 0.7366548 0.76512456]
mean value: 0.7492742755465175
key: test_fscore
value: [0.75675676 0.59459459 0.77777778 0.73684211 0.8 0.82352941
0.75862069 0.8 0.75 0.70588235]
mean value: 0.7504003688753342
key: train_fscore
value: [0.77018634 0.76595745 0.78205128 0.75641026 0.78032787 0.78594249
0.77192982 0.78125 0.76875 0.78846154]
mean value: 0.7751267044561957
key: test_precision
value: [0.66666667 0.52380952 0.7 0.63636364 0.66666667 0.77777778
0.78571429 0.7 0.70588235 0.63157895]
mean value: 0.6794459857308155
key: train_precision
value: [0.68131868 0.66666667 0.70930233 0.68604651 0.72121212 0.71098266
0.76388889 0.69832402 0.68715084 0.71929825]
mean value: 0.7044190960204428
key: test_recall
value: [0.875 0.6875 0.875 0.875 1. 0.875
0.73333333 0.93333333 0.8 0.8 ]
mean value: 0.8454166666666667
key: train_recall
value: [0.88571429 0.9 0.87142857 0.84285714 0.85 0.87857143
0.78014184 0.88652482 0.87234043 0.87234043]
mean value: 0.8639918946301925
key: test_roc_auc
value: [0.71875 0.53125 0.7375 0.67083333 0.73333333 0.80416667
0.77291667 0.77916667 0.74375 0.68125 ]
mean value: 0.7172916666666667
key: train_roc_auc
value: [0.73571429 0.725 0.75840932 0.72993921 0.76187943 0.76198075
0.76864235 0.75040527 0.73617021 0.76474164]
mean value: 0.7492882472137791
key: test_jcc
value: [0.60869565 0.42307692 0.63636364 0.58333333 0.66666667 0.7
0.61111111 0.66666667 0.6 0.54545455]
mean value: 0.6041368534846796
key: train_jcc
value: [0.62626263 0.62068966 0.64210526 0.60824742 0.63978495 0.64736842
0.62857143 0.64102564 0.62436548 0.65079365]
mean value: 0.632921453718676
MCC on Blind test: 0.35
Accuracy on Blind test: 0.68
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01051235 0.01030493 0.01025033 0.01022387 0.01026034 0.01030755
0.01042366 0.01025248 0.01031613 0.01030803]
mean value: 0.010315966606140137
key: score_time
value: [0.00944209 0.00935793 0.0093224 0.00918984 0.00931573 0.00929213
0.00936365 0.0089705 0.00935435 0.00958991]
mean value: 0.00931985378265381
key: test_mcc
value: [0.19088543 0.438357 0.55 0.4365267 0.61608311 0.4184137
0.48527095 0.5612264 0.35983579 0.55 ]
mean value: 0.46065990788811395
key: train_mcc
value: [0.57858619 0.54486237 0.52431066 0.54736197 0.51744233 0.52431066
0.55262901 0.52460395 0.53764274 0.5252232 ]
mean value: 0.5376973074762675
key: test_accuracy
value: [0.59375 0.71875 0.77419355 0.70967742 0.80645161 0.70967742
0.74193548 0.77419355 0.67741935 0.77419355]
mean value: 0.7280241935483871
key: train_accuracy
value: [0.78928571 0.77142857 0.76156584 0.77224199 0.75800712 0.76156584
0.77580071 0.76156584 0.76868327 0.76156584]
mean value: 0.7681710726995424
key: test_fscore
value: [0.62857143 0.72727273 0.77419355 0.75675676 0.82352941 0.72727273
0.71428571 0.78787879 0.6875 0.77419355]
mean value: 0.7401454650577042
key: train_fscore
value: [0.79003559 0.78082192 0.76816609 0.78231293 0.76551724 0.76816609
0.78350515 0.77133106 0.77351916 0.77288136]
mean value: 0.7756256583831929
key: test_precision
value: [0.57894737 0.70588235 0.8 0.66666667 0.77777778 0.70588235
0.76923077 0.72222222 0.64705882 0.75 ]
mean value: 0.7123668333730253
key: train_precision
value: [0.78723404 0.75 0.74496644 0.74675325 0.74 0.74496644
0.76 0.74342105 0.76027397 0.74025974]
mean value: 0.7517874940706537
key: test_recall
value: [0.6875 0.75 0.75 0.875 0.875 0.75
0.66666667 0.86666667 0.73333333 0.8 ]
mean value: 0.7754166666666666
key: train_recall
value: [0.79285714 0.81428571 0.79285714 0.82142857 0.79285714 0.79285714
0.80851064 0.80141844 0.78723404 0.80851064]
mean value: 0.8012816616008105
key: test_roc_auc
value: [0.59375 0.71875 0.775 0.70416667 0.80416667 0.70833333
0.73958333 0.77708333 0.67916667 0.775 ]
mean value: 0.7275
key: train_roc_auc
value: [0.78928571 0.77142857 0.7616768 0.77241641 0.7581307 0.7616768
0.77568389 0.76142351 0.76861702 0.76139818]
mean value: 0.7681737588652482
key: test_jcc
value: [0.45833333 0.57142857 0.63157895 0.60869565 0.7 0.57142857
0.55555556 0.65 0.52380952 0.63157895]
mean value: 0.5902409102466311
key: train_jcc
value: [0.65294118 0.64044944 0.62359551 0.6424581 0.62011173 0.62359551
0.6440678 0.62777778 0.63068182 0.62983425]
mean value: 0.6335513105024437
MCC on Blind test: 0.32
Accuracy on Blind test: 0.66
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00951195 0.0097332 0.0096612 0.00966334 0.00966907 0.00986385
0.00983644 0.01014018 0.0087738 0.00984406]
mean value: 0.00966970920562744
key: score_time
value: [0.01151824 0.01224685 0.01311278 0.01185679 0.01232314 0.01154304
0.01158881 0.012007 0.01123166 0.01146412]
mean value: 0.011889243125915527
key: test_mcc
value: [0.38729833 0.12598816 0.16878989 0.29166667 0.15899721 0.28870546
0.22630095 0.55 0.35445878 0.28870546]
mean value: 0.2840910904397822
key: train_mcc
value: [0.5872142 0.60790321 0.62988855 0.58726379 0.58726379 0.60170267
0.64819964 0.5803804 0.59462628 0.62347871]
mean value: 0.6047921251531794
key: test_accuracy
value: [0.6875 0.5625 0.58064516 0.64516129 0.58064516 0.64516129
0.61290323 0.77419355 0.67741935 0.64516129]
mean value: 0.6411290322580645
key: train_accuracy
value: [0.79285714 0.80357143 0.81494662 0.79359431 0.79359431 0.80071174
0.82206406 0.79003559 0.79715302 0.8113879 ]
mean value: 0.8019916115912558
key: test_fscore
value: [0.72222222 0.53333333 0.55172414 0.64516129 0.60606061 0.66666667
0.53846154 0.77419355 0.64285714 0.62068966]
mean value: 0.6301370141414636
key: train_fscore
value: [0.8 0.80836237 0.81428571 0.79432624 0.79432624 0.8028169
0.83221477 0.79442509 0.80139373 0.816609 ]
mean value: 0.8058760044273121
key: test_precision
value: [0.65 0.57142857 0.61538462 0.66666667 0.58823529 0.64705882
0.63636364 0.75 0.69230769 0.64285714]
mean value: 0.6460302442655383
key: train_precision
value: [0.77333333 0.78911565 0.81428571 0.78873239 0.78873239 0.79166667
0.78980892 0.78082192 0.78767123 0.7972973 ]
mean value: 0.7901465514456293
key: test_recall
value: [0.8125 0.5 0.5 0.625 0.625 0.6875
0.46666667 0.8 0.6 0.6 ]
mean value: 0.6216666666666667
key: train_recall
value: [0.82857143 0.82857143 0.81428571 0.8 0.8 0.81428571
0.87943262 0.80851064 0.81560284 0.83687943]
mean value: 0.8226139817629179
key: test_roc_auc
value: [0.6875 0.5625 0.58333333 0.64583333 0.57916667 0.64375
0.60833333 0.775 0.675 0.64375 ]
mean value: 0.6404166666666667
key: train_roc_auc
value: [0.79285714 0.80357143 0.81494428 0.79361702 0.79361702 0.80075988
0.82185917 0.7899696 0.79708713 0.81129686]
mean value: 0.8019579533941237
key: test_jcc
value: [0.56521739 0.36363636 0.38095238 0.47619048 0.43478261 0.5
0.36842105 0.63157895 0.47368421 0.45 ]
mean value: 0.46444634313055366
key: train_jcc
value: [0.66666667 0.67836257 0.68674699 0.65882353 0.65882353 0.67058824
0.71264368 0.65895954 0.66860465 0.69005848]
mean value: 0.6750277868263664
MCC on Blind test: 0.14
Accuracy on Blind test: 0.57
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01648831 0.01415229 0.01382375 0.01463223 0.0138309 0.01456952
0.01394272 0.01430678 0.01460671 0.01483178]
mean value: 0.014518499374389648
key: score_time
value: [0.01045418 0.00998831 0.00985622 0.00994062 0.00994372 0.00987506
0.00997186 0.0103848 0.01011252 0.01010108]
mean value: 0.010062837600708007
key: test_mcc
value: [0.40451992 0.438357 0.48333333 0.68826048 0.6125 0.4184137
0.55 0.69203857 0.48333333 0.43041423]
mean value: 0.5201170568656933
key: train_mcc
value: [0.73799581 0.72864578 0.72953394 0.74385734 0.70292136 0.75103885
0.71640396 0.68010159 0.7083207 0.71704623]
mean value: 0.7215865560453796
key: test_accuracy
value: [0.6875 0.71875 0.74193548 0.83870968 0.80645161 0.70967742
0.77419355 0.83870968 0.74193548 0.70967742]
mean value: 0.7567540322580645
key: train_accuracy
value: [0.86785714 0.86428571 0.86476868 0.87188612 0.85053381 0.87544484
0.85765125 0.83985765 0.85409253 0.85765125]
mean value: 0.8604028978139299
key: test_fscore
value: [0.73684211 0.70967742 0.75 0.85714286 0.8125 0.72727273
0.77419355 0.84848485 0.73333333 0.72727273]
mean value: 0.7676719566511587
key: train_fscore
value: [0.87285223 0.86524823 0.86428571 0.87234043 0.85517241 0.87364621
0.86206897 0.84320557 0.85614035 0.8630137 ]
mean value: 0.8627973813561808
key: test_precision
value: [0.63636364 0.73333333 0.75 0.78947368 0.8125 0.70588235
0.75 0.77777778 0.73333333 0.66666667]
mean value: 0.735533078462645
key: train_precision
value: [0.8410596 0.85915493 0.86428571 0.86619718 0.82666667 0.88321168
0.83892617 0.82876712 0.84722222 0.83443709]
mean value: 0.8489928381208813
key: test_recall
value: [0.875 0.6875 0.75 0.9375 0.8125 0.75
0.8 0.93333333 0.73333333 0.8 ]
mean value: 0.8079166666666666
key: train_recall
value: [0.90714286 0.87142857 0.86428571 0.87857143 0.88571429 0.86428571
0.88652482 0.85815603 0.86524823 0.89361702]
mean value: 0.8774974670719351
key: test_roc_auc
value: [0.6875 0.71875 0.74166667 0.83541667 0.80625 0.70833333
0.775 0.84166667 0.74166667 0.7125 ]
mean value: 0.7568750000000001
key: train_roc_auc
value: [0.86785714 0.86428571 0.86476697 0.87190983 0.85065856 0.87540527
0.85754813 0.8397923 0.85405268 0.8575228 ]
mean value: 0.8603799392097264
key: test_jcc
value: [0.58333333 0.55 0.6 0.75 0.68421053 0.57142857
0.63157895 0.73684211 0.57894737 0.57142857]
mean value: 0.6257769423558898
key: train_jcc
value: [0.77439024 0.7625 0.76100629 0.77358491 0.74698795 0.77564103
0.75757576 0.72891566 0.74846626 0.75903614]
mean value: 0.7588104238792632
MCC on Blind test: 0.36
Accuracy on Blind test: 0.69
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.1177218 1.43766284 1.54514217 1.15275574 1.26740909 1.14019227
1.27580643 1.12184954 1.25379801 1.11662126]
mean value: 1.2428959131240844
key: score_time
value: [0.01441884 0.01461029 0.02055955 0.01454592 0.01471639 0.01474786
0.01493645 0.01513934 0.0150795 0.0150938 ]
mean value: 0.015384793281555176
key: test_mcc
value: [0.31311215 0.44539933 0.54812195 0.6310315 0.6125 0.4184137
0.35416667 0.74896053 0.42321607 0.48954403]
mean value: 0.4984465941438147
key: train_mcc
value: [0.98581488 0.97142857 0.9929078 0.9929078 0.98586555 0.99290744
0.978869 0.97867167 0.98586412 0.978869 ]
mean value: 0.9844105844205777
key: test_accuracy
value: [0.65625 0.71875 0.77419355 0.80645161 0.80645161 0.70967742
0.67741935 0.87096774 0.70967742 0.74193548]
mean value: 0.7471774193548387
key: train_accuracy
value: [0.99285714 0.98571429 0.99644128 0.99644128 0.99288256 0.99644128
0.98932384 0.98932384 0.99288256 0.98932384]
mean value: 0.9921631926792069
key: test_fscore
value: [0.66666667 0.68965517 0.78787879 0.83333333 0.8125 0.72727273
0.66666667 0.875 0.66666667 0.75 ]
mean value: 0.7475640020898642
key: train_fscore
value: [0.9929078 0.98571429 0.99644128 0.99644128 0.9929078 0.99641577
0.98947368 0.98939929 0.99295775 0.98947368]
mean value: 0.992213262962421
key: test_precision
value: [0.64705882 0.76923077 0.76470588 0.75 0.8125 0.70588235
0.66666667 0.82352941 0.75 0.70588235]
mean value: 0.7395456259426848
key: train_precision
value: [0.98591549 0.98571429 0.9929078 0.9929078 0.98591549 1.
0.97916667 0.98591549 0.98601399 0.97916667]
mean value: 0.9873623686771724
key: test_recall
value: [0.6875 0.625 0.8125 0.9375 0.8125 0.75
0.66666667 0.93333333 0.6 0.8 ]
mean value: 0.7625
key: train_recall
value: [1. 0.98571429 1. 1. 1. 0.99285714
1. 0.9929078 1. 1. ]
mean value: 0.9971479229989868
key: test_roc_auc
value: [0.65625 0.71875 0.77291667 0.80208333 0.80625 0.70833333
0.67708333 0.87291667 0.70625 0.74375 ]
mean value: 0.7464583333333333
key: train_roc_auc
value: [0.99285714 0.98571429 0.9964539 0.9964539 0.9929078 0.99642857
0.98928571 0.98931104 0.99285714 0.98928571]
mean value: 0.9921555217831813
key: test_jcc
value: [0.5 0.52631579 0.65 0.71428571 0.68421053 0.57142857
0.5 0.77777778 0.5 0.6 ]
mean value: 0.6024018379281537
key: train_jcc
value: [0.98591549 0.97183099 0.9929078 0.9929078 0.98591549 0.99285714
0.97916667 0.97902098 0.98601399 0.97916667]
mean value: 0.9845703015893307
MCC on Blind test: 0.35
Accuracy on Blind test: 0.68
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.03034997 0.01905537 0.02076721 0.0199306 0.02213931 0.0209415
0.02394128 0.02460599 0.02585983 0.0239253 ]
mean value: 0.023151636123657227
key: score_time
value: [0.01116729 0.00906777 0.00881743 0.00873852 0.00879312 0.00883675
0.00890684 0.00884533 0.00888515 0.00888705]
mean value: 0.009094524383544921
key: test_mcc
value: [0.438357 0.625 0.35416667 0.6778302 0.6778302 0.71269665
0.48333333 0.42083333 0.42083333 0.48954403]
mean value: 0.5300424751832439
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.71875 0.8125 0.67741935 0.83870968 0.83870968 0.83870968
0.74193548 0.70967742 0.70967742 0.74193548]
mean value: 0.7628024193548387
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.72727273 0.8125 0.6875 0.84848485 0.84848485 0.86486486
0.73333333 0.70967742 0.70967742 0.75 ]
mean value: 0.7691795461150299
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.70588235 0.8125 0.6875 0.82352941 0.82352941 0.76190476
0.73333333 0.6875 0.6875 0.70588235]
mean value: 0.742906162464986
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 0.8125 0.6875 0.875 0.875 1.
0.73333333 0.73333333 0.73333333 0.8 ]
mean value: 0.8
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.71875 0.8125 0.67708333 0.8375 0.8375 0.83333333
0.74166667 0.71041667 0.71041667 0.74375 ]
mean value: 0.7622916666666667
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.57142857 0.68421053 0.52380952 0.73684211 0.73684211 0.76190476
0.57894737 0.55 0.55 0.6 ]
mean value: 0.6293984962406015
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.47
Accuracy on Blind test: 0.74
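The "mean value" lines are the arithmetic mean of the ten per-fold values printed above them. A minimal sketch, using the Decision Tree `test_mcc` and `train_mcc` values copied from this log, recomputes the means and the train-minus-test gap; the helper name `summarise` is hypothetical and not part of the original script.

```python
from statistics import fmean

# Per-fold values copied from the Decision Tree block of this log.
test_mcc = [0.438357, 0.625, 0.35416667, 0.6778302, 0.6778302,
            0.71269665, 0.48333333, 0.42083333, 0.42083333, 0.48954403]
train_mcc = [1.0] * 10


def summarise(train, test):
    """Return (mean train, mean test, train-minus-test gap) for one metric.

    Hypothetical helper for illustration; not taken from the original script.
    """
    mean_train = fmean(train)
    mean_test = fmean(test)
    return mean_train, mean_test, mean_train - mean_test


mean_train, mean_test, gap = summarise(train_mcc, test_mcc)
print(f"train={mean_train:.3f} test={mean_test:.3f} gap={gap:.3f}")
```

A large gap between a perfect training score and a much lower cross-validation score, as here, is the usual sign that an unpruned tree has memorised the training folds.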
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.10765457 0.10504031 0.1088562 0.10588574 0.10681224 0.10710359
0.11069465 0.10770369 0.10900497 0.10556507]
mean value: 0.10743210315704346
key: score_time
value: [0.01778412 0.01823831 0.0179069 0.017802 0.01791954 0.0193615
0.0178206 0.01882005 0.0178678 0.01819921]
mean value: 0.01817200183868408
key: test_mcc
value: [0.19088543 0.50395263 0.55 0.6125 0.87866878 0.29069387
0.54812195 0.67916667 0.42083333 0.61608311]
mean value: 0.5290905772900235
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.59375 0.75 0.77419355 0.80645161 0.93548387 0.64516129
0.77419355 0.83870968 0.70967742 0.80645161]
mean value: 0.7634072580645161
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.62857143 0.73333333 0.77419355 0.8125 0.93333333 0.68571429
0.75862069 0.83870968 0.70967742 0.78571429]
mean value: 0.7660368001483129
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.57894737 0.78571429 0.8 0.8125 1. 0.63157895
0.78571429 0.8125 0.6875 0.84615385]
mean value: 0.7740608733371891
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.6875 0.6875 0.75 0.8125 0.875 0.75
0.73333333 0.86666667 0.73333333 0.73333333]
mean value: 0.7629166666666667
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.59375 0.75 0.775 0.80625 0.9375 0.64166667
0.77291667 0.83958333 0.71041667 0.80416667]
mean value: 0.763125
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.45833333 0.57894737 0.63157895 0.68421053 0.875 0.52173913
0.61111111 0.72222222 0.55 0.64705882]
mean value: 0.6280201462736125
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.36
Accuracy on Blind test: 0.69
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01097584 0.01011252 0.00978446 0.0105629 0.01060104 0.01052165
0.01064515 0.01009536 0.01080298 0.01068449]
mean value: 0.010478639602661132
key: score_time
value: [0.00919628 0.00954986 0.0095408 0.00925446 0.00952196 0.00908875
0.00936627 0.0090847 0.00936127 0.00946689]
mean value: 0.009343123435974121
key: test_mcc
value: [0.12909944 0.12598816 0.10687275 0.61608311 0.35416667 0.35416667
0.29844172 0.48954403 0.09139077 0.225 ]
mean value: 0.27907533194958034
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.5625 0.5625 0.5483871 0.80645161 0.67741935 0.67741935
0.64516129 0.74193548 0.5483871 0.61290323]
mean value: 0.6383064516129032
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.61111111 0.53333333 0.5 0.82352941 0.6875 0.6875
0.56 0.75 0.46153846 0.6 ]
mean value: 0.6214512317747612
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.55 0.57142857 0.58333333 0.77777778 0.6875 0.6875
0.7 0.70588235 0.54545455 0.6 ]
mean value: 0.6408876580935404
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.6875 0.5 0.4375 0.875 0.6875 0.6875
0.46666667 0.8 0.4 0.6 ]
mean value: 0.6141666666666666
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.5625 0.5625 0.55208333 0.80416667 0.67708333 0.67708333
0.63958333 0.74375 0.54375 0.6125 ]
mean value: 0.6375000000000001
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.44 0.36363636 0.33333333 0.7 0.52380952 0.52380952
0.38888889 0.6 0.3 0.42857143]
mean value: 0.4602049062049062
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.16
Accuracy on Blind test: 0.58
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.49366856 1.48023534 1.56707311 1.49572015 1.5158124 1.51820087
1.50707102 1.48903179 1.50217819 1.47849536]
mean value: 1.5047486782073975
key: score_time
value: [0.09201431 0.09436655 0.09239435 0.0984962 0.09627295 0.09926844
0.0995295 0.09666634 0.09262013 0.09160447]
mean value: 0.09532332420349121
key: test_mcc
value: [0.44539933 0.51639778 0.6125 0.67916667 0.67916667 0.48527095
0.68826048 0.74896053 0.48527095 0.74166667]
mean value: 0.6082060015726044
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.71875 0.75 0.80645161 0.83870968 0.83870968 0.74193548
0.83870968 0.87096774 0.74193548 0.87096774]
mean value: 0.8017137096774194
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.74285714 0.71428571 0.8125 0.83870968 0.83870968 0.76470588
0.81481481 0.875 0.71428571 0.86666667]
mean value: 0.7982535290101703
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.68421053 0.83333333 0.8125 0.86666667 0.86666667 0.72222222
0.91666667 0.82352941 0.76923077 0.86666667]
mean value: 0.8161692929533487
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.8125 0.625 0.8125 0.8125 0.8125 0.8125
0.73333333 0.93333333 0.66666667 0.86666667]
mean value: 0.78875
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.71875 0.75 0.80625 0.83958333 0.83958333 0.73958333
0.83541667 0.87291667 0.73958333 0.87083333]
mean value: 0.80125
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.59090909 0.55555556 0.68421053 0.72222222 0.72222222 0.61904762
0.6875 0.77777778 0.55555556 0.76470588]
mean value: 0.6679706451958773
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.47
Accuracy on Blind test: 0.74
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.87081504 0.92136621 1.01795077 0.89831853 0.90612841 0.91984773
0.87696695 0.91162896 0.8944664 0.97775722]
mean value: 0.9195246219635009
key: score_time
value: [0.25071716 0.24670219 0.24876046 0.23476171 0.26595569 0.19230556
0.12388849 0.24812293 0.25460219 0.21983552]
mean value: 0.22856519222259522
key: test_mcc
value: [0.56360186 0.51639778 0.6778302 0.6778302 0.6125 0.54812195
0.68826048 0.74896053 0.29166667 0.68826048]
mean value: 0.6013430149607514
key: train_mcc
value: [0.91437902 0.88571429 0.91467803 0.89344886 0.9219233 0.9219233
0.90044081 0.90749278 0.90044081 0.89325701]
mean value: 0.9053698202560589
key: test_accuracy
value: [0.78125 0.75 0.83870968 0.83870968 0.80645161 0.77419355
0.83870968 0.87096774 0.64516129 0.83870968]
mean value: 0.7982862903225807
key: train_accuracy
value: [0.95714286 0.94285714 0.95729537 0.94661922 0.96085409 0.96085409
0.95017794 0.95373665 0.95017794 0.94661922]
mean value: 0.9526334519572954
key: test_fscore
value: [0.78787879 0.71428571 0.84848485 0.84848485 0.8125 0.78787879
0.81481481 0.875 0.64516129 0.81481481]
mean value: 0.7949303906965197
key: train_fscore
value: [0.95744681 0.94285714 0.95683453 0.94699647 0.96113074 0.96113074
0.95070423 0.9540636 0.95070423 0.94699647]
mean value: 0.9528864955647521
key: test_precision
value: [0.76470588 0.83333333 0.82352941 0.82352941 0.8125 0.76470588
0.91666667 0.82352941 0.625 0.91666667]
mean value: 0.8104166666666667
key: train_precision
value: [0.95070423 0.94285714 0.96376812 0.93706294 0.95104895 0.95104895
0.94405594 0.95070423 0.94405594 0.94366197]
mean value: 0.947896840860711
key: test_recall
value: [0.8125 0.625 0.875 0.875 0.8125 0.8125
0.73333333 0.93333333 0.66666667 0.73333333]
mean value: 0.7879166666666666
key: train_recall
value: [0.96428571 0.94285714 0.95 0.95714286 0.97142857 0.97142857
0.95744681 0.95744681 0.95744681 0.95035461]
mean value: 0.9579837892603851
key: test_roc_auc
value: [0.78125 0.75 0.8375 0.8375 0.80625 0.77291667
0.83541667 0.87291667 0.64583333 0.83541667]
mean value: 0.7975
key: train_roc_auc
value: [0.95714286 0.94285714 0.9572695 0.94665653 0.96089159 0.96089159
0.95015198 0.9537234 0.95015198 0.94660588]
mean value: 0.9526342451874367
key: test_jcc
value: [0.65 0.55555556 0.73684211 0.73684211 0.68421053 0.65
0.6875 0.77777778 0.47619048 0.6875 ]
mean value: 0.6642418546365915
key: train_jcc
value: [0.91836735 0.89189189 0.91724138 0.89932886 0.92517007 0.92517007
0.90604027 0.91216216 0.90604027 0.89932886]
mean value: 0.9100741171391153
MCC on Blind test: 0.51
Accuracy on Blind test: 0.76
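The `test_jcc` values track `test_fscore` closely. For a binary problem the Jaccard index J and the F1 score F are linked by J = F / (2 - F). The sketch below checks that identity against the Random Forest2 per-fold values copied from this log. It assumes that `test_jcc` is the binary Jaccard score and `test_fscore` the binary F1 for the same positive class, which the log itself does not state.

```python
def jaccard_from_f1(f1):
    """Convert a binary F1 score to the equivalent Jaccard index."""
    return f1 / (2.0 - f1)


# Per-fold values copied from the Random Forest2 block of this log.
test_fscore = [0.78787879, 0.71428571, 0.84848485, 0.84848485, 0.8125,
               0.78787879, 0.81481481, 0.875, 0.64516129, 0.81481481]
test_jcc = [0.65, 0.55555556, 0.73684211, 0.73684211, 0.68421053,
            0.65, 0.6875, 0.77777778, 0.47619048, 0.6875]

derived = [jaccard_from_f1(f) for f in test_fscore]
# Largest disagreement between the derived and the logged Jaccard values.
max_abs_diff = max(abs(d - j) for d, j in zip(derived, test_jcc))
print(f"largest |derived - logged| = {max_abs_diff:.2e}")
```

Because one metric is a monotone function of the other, ranking models by `test_jcc` or by `test_fscore` within a fold gives the same order under these assumptions.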
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02584529 0.00938845 0.00952554 0.00988483 0.00979161 0.00954032
0.01143622 0.01072431 0.01068521 0.01063943]
mean value: 0.01174612045288086
key: score_time
value: [0.01011896 0.00875497 0.00878572 0.0086236 0.00927162 0.00859261
0.01007676 0.00962615 0.00960374 0.00962329]
mean value: 0.00930774211883545
key: test_mcc
value: [0.19088543 0.438357 0.55 0.4365267 0.61608311 0.4184137
0.48527095 0.5612264 0.35983579 0.55 ]
mean value: 0.46065990788811395
key: train_mcc
value: [0.57858619 0.54486237 0.52431066 0.54736197 0.51744233 0.52431066
0.55262901 0.52460395 0.53764274 0.5252232 ]
mean value: 0.5376973074762675
key: test_accuracy
value: [0.59375 0.71875 0.77419355 0.70967742 0.80645161 0.70967742
0.74193548 0.77419355 0.67741935 0.77419355]
mean value: 0.7280241935483871
key: train_accuracy
value: [0.78928571 0.77142857 0.76156584 0.77224199 0.75800712 0.76156584
0.77580071 0.76156584 0.76868327 0.76156584]
mean value: 0.7681710726995424
key: test_fscore
value: [0.62857143 0.72727273 0.77419355 0.75675676 0.82352941 0.72727273
0.71428571 0.78787879 0.6875 0.77419355]
mean value: 0.7401454650577042
key: train_fscore
value: [0.79003559 0.78082192 0.76816609 0.78231293 0.76551724 0.76816609
0.78350515 0.77133106 0.77351916 0.77288136]
mean value: 0.7756256583831929
key: test_precision
value: [0.57894737 0.70588235 0.8 0.66666667 0.77777778 0.70588235
0.76923077 0.72222222 0.64705882 0.75 ]
mean value: 0.7123668333730253
key: train_precision
value: [0.78723404 0.75 0.74496644 0.74675325 0.74 0.74496644
0.76 0.74342105 0.76027397 0.74025974]
mean value: 0.7517874940706537
key: test_recall
value: [0.6875 0.75 0.75 0.875 0.875 0.75
0.66666667 0.86666667 0.73333333 0.8 ]
mean value: 0.7754166666666666
key: train_recall
value: [0.79285714 0.81428571 0.79285714 0.82142857 0.79285714 0.79285714
0.80851064 0.80141844 0.78723404 0.80851064]
mean value: 0.8012816616008105
key: test_roc_auc
value: [0.59375 0.71875 0.775 0.70416667 0.80416667 0.70833333
0.73958333 0.77708333 0.67916667 0.775 ]
mean value: 0.7275
key: train_roc_auc
value: [0.78928571 0.77142857 0.7616768 0.77241641 0.7581307 0.7616768
0.77568389 0.76142351 0.76861702 0.76139818]
mean value: 0.7681737588652482
key: test_jcc
value: [0.45833333 0.57142857 0.63157895 0.60869565 0.7 0.57142857
0.55555556 0.65 0.52380952 0.63157895]
mean value: 0.5902409102466311
key: train_jcc
value: [0.65294118 0.64044944 0.62359551 0.6424581 0.62011173 0.62359551
0.6440678 0.62777778 0.63068182 0.62983425]
mean value: 0.6335513105024437
MCC on Blind test: 0.32
Accuracy on Blind test: 0.66
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.18638062 0.06737065 0.07680464 0.07340765 0.0783298 0.07052732
0.07229161 0.07644987 0.07185864 0.08797336]
mean value: 0.08613941669464112
key: score_time
value: [0.01080513 0.01068473 0.01079202 0.01073265 0.01104116 0.01060367
0.01058578 0.01099658 0.01063633 0.0119226 ]
mean value: 0.010880064964294434
key: test_mcc
value: [0.50395263 0.82717019 0.55 0.80833333 0.6125 0.74166667
0.6125 0.6778302 0.48333333 0.74689528]
mean value: 0.6564181640197154
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.75 0.90625 0.77419355 0.90322581 0.80645161 0.87096774
0.80645161 0.83870968 0.74193548 0.87096774]
mean value: 0.8269153225806452
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.73333333 0.89655172 0.77419355 0.90322581 0.8125 0.875
0.8 0.82758621 0.73333333 0.85714286]
mean value: 0.8212866809682716
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.78571429 1. 0.8 0.93333333 0.8125 0.875
0.8 0.85714286 0.73333333 0.92307692]
mean value: 0.8520100732600733
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.6875 0.8125 0.75 0.875 0.8125 0.875
0.8 0.8 0.73333333 0.8 ]
mean value: 0.7945833333333333
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.75 0.90625 0.775 0.90416667 0.80625 0.87083333
0.80625 0.8375 0.74166667 0.86875 ]
mean value: 0.8266666666666667
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.57894737 0.8125 0.63157895 0.82352941 0.68421053 0.77777778
0.66666667 0.70588235 0.57894737 0.75 ]
mean value: 0.7010040419676643
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.56
Accuracy on Blind test: 0.78
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.04203153 0.05769587 0.06261158 0.06438947 0.03616428 0.06777334
0.03340292 0.05865383 0.03727388 0.05256343]
mean value: 0.05125601291656494
key: score_time
value: [0.01514578 0.02427006 0.0214963 0.01218319 0.01978111 0.01229453
0.02300239 0.01197886 0.01196885 0.02074003]
mean value: 0.017286109924316406
key: test_mcc
value: [0.38729833 0.31814238 0.54812195 0.48333333 0.61608311 0.49612132
0.55573827 0.6125 0.55573827 0.23939495]
mean value: 0.48124719335387184
key: train_mcc
value: [0.87877321 0.87877321 0.90044081 0.85801159 0.87197933 0.8647923
0.87954398 0.83632219 0.87902736 0.85798288]
mean value: 0.870564685545068
key: test_accuracy
value: [0.6875 0.65625 0.77419355 0.74193548 0.80645161 0.74193548
0.77419355 0.80645161 0.77419355 0.61290323]
mean value: 0.7376008064516129
key: train_accuracy
value: [0.93928571 0.93928571 0.95017794 0.92882562 0.93594306 0.93238434
0.93950178 0.91814947 0.93950178 0.92882562]
mean value: 0.9351881037112354
key: test_fscore
value: [0.72222222 0.62068966 0.78787879 0.75 0.82352941 0.77777778
0.74074074 0.8 0.74074074 0.64705882]
mean value: 0.7410638159826801
key: train_fscore
value: [0.93992933 0.93862816 0.94964029 0.92957746 0.93617021 0.93238434
0.94076655 0.91814947 0.93950178 0.93006993]
mean value: 0.9354817520572338
key: test_precision
value: [0.65 0.69230769 0.76470588 0.75 0.77777778 0.7
0.83333333 0.8 0.83333333 0.57894737]
mean value: 0.7380405387526131
key: train_precision
value: [0.93006993 0.94890511 0.95652174 0.91666667 0.92957746 0.92907801
0.92465753 0.92142857 0.94285714 0.91724138]
mean value: 0.9317003552171846
key: test_recall
value: [0.8125 0.5625 0.8125 0.75 0.875 0.875
0.66666667 0.8 0.66666667 0.73333333]
mean value: 0.7554166666666666
key: train_recall
value: [0.95 0.92857143 0.94285714 0.94285714 0.94285714 0.93571429
0.95744681 0.91489362 0.93617021 0.94326241]
mean value: 0.9394630192502533
key: test_roc_auc
value: [0.6875 0.65625 0.77291667 0.74166667 0.80416667 0.7375
0.77083333 0.80625 0.77083333 0.61666667]
mean value: 0.7364583333333333
key: train_roc_auc
value: [0.93928571 0.93928571 0.95015198 0.92887538 0.93596758 0.93239615
0.93943769 0.91816109 0.93951368 0.92877406]
mean value: 0.9351849037487335
key: test_jcc
value: [0.56521739 0.45 0.65 0.6 0.7 0.63636364
0.58823529 0.66666667 0.58823529 0.47826087]
mean value: 0.5922979152135163
key: train_jcc
value: [0.88666667 0.88435374 0.90410959 0.86842105 0.88 0.87333333
0.88815789 0.84868421 0.88590604 0.86928105]
mean value: 0.8788913574452522
MCC on Blind test: 0.36
Accuracy on Blind test: 0.68
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01258373 0.0108614 0.00933671 0.0089016 0.00908589 0.00921488
0.00913262 0.00914311 0.00911927 0.00905776]
mean value: 0.00964369773864746
key: score_time
value: [0.01155806 0.00890446 0.00873089 0.00843573 0.00856686 0.00853968
0.00856066 0.00847149 0.0085392 0.0085187 ]
mean value: 0.008882570266723632
key: test_mcc
value: [ 0.38729833 -0.06262243 0.48333333 0.29069387 0.57461167 0.35416667
0.76594169 0.58316015 0.67916667 0.4184137 ]
mean value: 0.4474163646749253
key: train_mcc
value: [0.5161854 0.50871556 0.50376414 0.46836906 0.48948125 0.56024518
0.51919225 0.52460395 0.47487913 0.56002251]
mean value: 0.5125458434467222
key: test_accuracy
value: [0.6875 0.46875 0.74193548 0.64516129 0.77419355 0.67741935
0.87096774 0.77419355 0.83870968 0.70967742]
mean value: 0.7188508064516129
key: train_accuracy
value: [0.75714286 0.75357143 0.75088968 0.73309609 0.74377224 0.77935943
0.75800712 0.76156584 0.7366548 0.77935943]
mean value: 0.7553418912048805
key: test_fscore
value: [0.72222222 0.4516129 0.75 0.68571429 0.81081081 0.6875
0.84615385 0.8 0.83870968 0.68965517]
mean value: 0.728237891796012
key: train_fscore
value: [0.76712329 0.7628866 0.76027397 0.7440273 0.75342466 0.7862069
0.77181208 0.77133106 0.74829932 0.78767123]
mean value: 0.7653056407214348
key: test_precision
value: [0.65 0.46666667 0.75 0.63157895 0.71428571 0.6875
1. 0.7 0.8125 0.71428571]
mean value: 0.7126817042606516
key: train_precision
value: [0.73684211 0.73509934 0.73026316 0.7124183 0.72368421 0.76
0.73248408 0.74342105 0.71895425 0.7615894 ]
mean value: 0.7354755893490372
key: test_recall
value: [0.8125 0.4375 0.75 0.75 0.9375 0.6875
0.73333333 0.93333333 0.86666667 0.66666667]
mean value: 0.7575
key: train_recall
value: [0.8 0.79285714 0.79285714 0.77857143 0.78571429 0.81428571
0.81560284 0.80141844 0.78014184 0.81560284]
mean value: 0.7977051671732522
key: test_roc_auc
value: [0.6875 0.46875 0.74166667 0.64166667 0.76875 0.67708333
0.86666667 0.77916667 0.83958333 0.70833333]
mean value: 0.7179166666666666
key: train_roc_auc
value: [0.75714286 0.75357143 0.7510385 0.73325735 0.74392097 0.77948328
0.75780142 0.76142351 0.73649949 0.77922999]
mean value: 0.7553368794326241
key: test_jcc
value: [0.56521739 0.29166667 0.6 0.52173913 0.68181818 0.52380952
0.73333333 0.66666667 0.72222222 0.52631579]
mean value: 0.5832788905729409
key: train_jcc
value: [0.62222222 0.61666667 0.61325967 0.5923913 0.6043956 0.64772727
0.6284153 0.62777778 0.59782609 0.64971751]
mean value: 0.620039941827292
MCC on Blind test: 0.33
Accuracy on Blind test: 0.67
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01375198 0.01595616 0.01475716 0.01581788 0.01899552 0.01607966
0.01525974 0.01696014 0.01642799 0.01958275]
mean value: 0.01635890007019043
key: score_time
value: [0.00951004 0.01106167 0.01103091 0.01145887 0.01153207 0.01150727
0.01147628 0.0115335 0.01149893 0.01153302]
mean value: 0.011214256286621094
key: test_mcc
value: [0.56360186 0.37796447 0.6778302 0.6681531 0.61925228 0.55
0.49612132 0.71807033 0.4184137 0.37191715]
mean value: 0.5461324430638016
key: train_mcc
value: [0.66793226 0.78643686 0.73077387 0.58944915 0.79807813 0.75269046
0.64551514 0.70167408 0.75139391 0.7486327 ]
mean value: 0.7172576566417463
key: test_accuracy
value: [0.78125 0.6875 0.83870968 0.80645161 0.80645161 0.77419355
0.74193548 0.83870968 0.70967742 0.67741935]
mean value: 0.7662298387096774
key: train_accuracy
value: [0.82142857 0.89285714 0.86476868 0.76156584 0.89679715 0.8683274
0.80782918 0.84697509 0.87544484 0.86120996]
mean value: 0.8497203863751907
key: test_fscore
value: [0.77419355 0.66666667 0.84848485 0.76923077 0.8 0.77419355
0.69230769 0.85714286 0.68965517 0.70588235]
mean value: 0.7577757455961998
key: train_fscore
value: [0.79338843 0.89051095 0.86805556 0.68837209 0.89056604 0.85258964
0.775 0.85808581 0.87364621 0.87774295]
mean value: 0.8367957671081703
key: test_precision
value: [0.8 0.71428571 0.82352941 1. 0.85714286 0.8
0.81818182 0.75 0.71428571 0.63157895]
mean value: 0.7909004463029231
key: train_precision
value: [0.94117647 0.91044776 0.84459459 0.98666667 0.944 0.96396396
0.93939394 0.80246914 0.88970588 0.78651685]
mean value: 0.9008935268489424
key: test_recall
value: [0.75 0.625 0.875 0.625 0.75 0.75
0.6 1. 0.66666667 0.8 ]
mean value: 0.7441666666666666
key: train_recall
value: [0.68571429 0.87142857 0.89285714 0.52857143 0.84285714 0.76428571
0.65957447 0.92198582 0.85815603 0.9929078 ]
mean value: 0.8018338399189463
key: test_roc_auc
value: [0.78125 0.6875 0.8375 0.8125 0.80833333 0.775
0.7375 0.84375 0.70833333 0.68125 ]
mean value: 0.7672916666666667
key: train_roc_auc
value: [0.82142857 0.89285714 0.86486829 0.76073961 0.89660588 0.86795846
0.80835866 0.84670719 0.87550659 0.86073961]
mean value: 0.8495770010131712
key: test_jcc
value: [0.63157895 0.5 0.73684211 0.625 0.66666667 0.63157895
0.52941176 0.75 0.52631579 0.54545455]
mean value: 0.6142848766300778
key: train_jcc
value: [0.65753425 0.80263158 0.76687117 0.5248227 0.80272109 0.74305556
0.63265306 0.75144509 0.77564103 0.78212291]
mean value: 0.7239498408791925
MCC on Blind test: 0.43
Accuracy on Blind test: 0.69
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.02493834 0.01682949 0.01681733 0.01732183 0.01644254 0.01673317
0.01562119 0.0168426 0.01635194 0.01678419]
mean value: 0.01746826171875
key: score_time
value: [0.01164889 0.0114696 0.01156235 0.01149487 0.01146197 0.01149297
0.01148677 0.01147938 0.01151204 0.01378536]
mean value: 0.011739420890808105
key: test_mcc
value: [0.25819889 0.31311215 0.37191715 0.60910959 0.6778302 0.4184137
0.55777335 0.57104024 0.24910095 0.37191715]
mean value: 0.43984133753776633
key: train_mcc
value: [0.68877552 0.80732823 0.70217171 0.64971048 0.76419699 0.70901046
0.62520923 0.56207051 0.42657096 0.73440191]
mean value: 0.6669446002434639
key: test_accuracy
value: [0.625 0.65625 0.67741935 0.77419355 0.83870968 0.70967742
0.74193548 0.74193548 0.58064516 0.67741935]
mean value: 0.7023185483870967
key: train_accuracy
value: [0.83214286 0.90357143 0.83274021 0.80427046 0.87900356 0.85409253
0.78647687 0.74733096 0.65480427 0.85765125]
mean value: 0.8152084392475851
key: test_fscore
value: [0.57142857 0.66666667 0.64285714 0.82051282 0.84848485 0.72727273
0.63636364 0.78947368 0.68292683 0.70588235]
mean value: 0.7091869280006409
key: train_fscore
value: [0.80658436 0.90459364 0.8 0.83282675 0.87022901 0.84981685
0.73451327 0.7965616 0.74406332 0.87261146]
mean value: 0.8211800275313914
key: test_precision
value: [0.66666667 0.64705882 0.75 0.69565217 0.82352941 0.70588235
1. 0.65217391 0.53846154 0.63157895]
mean value: 0.7111003827688442
key: train_precision
value: [0.95145631 0.8951049 0.98947368 0.72486772 0.93442623 0.87218045
0.97647059 0.66826923 0.59243697 0.79190751]
mean value: 0.8396593603744082
key: test_recall
value: [0.5 0.6875 0.5625 1. 0.875 0.75
0.46666667 1. 0.93333333 0.8 ]
mean value: 0.7575000000000001
key: train_recall
value: [0.7 0.91428571 0.67142857 0.97857143 0.81428571 0.82857143
0.58865248 0.9858156 1. 0.97163121]
mean value: 0.8453242147922999
key: test_roc_auc
value: [0.625 0.65625 0.68125 0.76666667 0.8375 0.70833333
0.73333333 0.75 0.59166667 0.68125 ]
mean value: 0.703125
key: train_roc_auc
value: [0.83214286 0.90357143 0.83216819 0.80488855 0.87877406 0.85400203
0.78718338 0.74647923 0.65357143 0.85724417]
mean value: 0.8150025329280648
key: test_jcc
value: [0.4 0.5 0.47368421 0.69565217 0.73684211 0.57142857
0.46666667 0.65217391 0.51851852 0.54545455]
mean value: 0.5560420704814297
key: train_jcc
value: [0.67586207 0.82580645 0.66666667 0.71354167 0.77027027 0.7388535
0.58041958 0.66190476 0.59243697 0.7740113 ]
mean value: 0.6999773243916024
MCC on Blind test: 0.38
Accuracy on Blind test: 0.66
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.16071701 0.13766336 0.13898635 0.13800216 0.13920808 0.13892508
0.1380024 0.13671398 0.1370995 0.13823891]
mean value: 0.14035568237304688
key: score_time
value: [0.01500583 0.01504397 0.0150857 0.01502109 0.015064 0.01523542
0.01508236 0.01495957 0.0150423 0.01509142]
mean value: 0.015063166618347168
key: test_mcc
value: [0.56360186 0.44539933 0.61608311 0.67916667 0.55 0.74166667
0.54812195 0.54812195 0.48333333 0.61608311]
mean value: 0.5791577998148137
key: train_mcc
value: [0.99288247 0.97859639 0.9929078 0.97867167 0.9929078 0.9929078
1. 1. 0.99290744 1. ]
mean value: 0.9921781379034591
key: test_accuracy
value: [0.78125 0.71875 0.80645161 0.83870968 0.77419355 0.87096774
0.77419355 0.77419355 0.74193548 0.80645161]
mean value: 0.7887096774193548
key: train_accuracy
value: [0.99642857 0.98928571 0.99644128 0.98932384 0.99644128 0.99644128
1. 1. 0.99644128 1. ]
mean value: 0.9960803253685816
key: test_fscore
value: [0.78787879 0.68965517 0.82352941 0.83870968 0.77419355 0.875
0.75862069 0.75862069 0.73333333 0.78571429]
mean value: 0.7825255596221703
key: train_fscore
value: [0.99641577 0.98924731 0.99644128 0.98924731 0.99644128 0.99644128
1. 1. 0.99646643 1. ]
mean value: 0.9960700668777009
key: test_precision
value: [0.76470588 0.76923077 0.77777778 0.86666667 0.8 0.875
0.78571429 0.78571429 0.73333333 0.84615385]
mean value: 0.8004296846943906
key: train_precision
value: [1. 0.99280576 0.9929078 0.99280576 0.9929078 0.9929078
1. 1. 0.99295775 1. ]
mean value: 0.995729266152556
key: test_recall
value: [0.8125 0.625 0.875 0.8125 0.75 0.875
0.73333333 0.73333333 0.73333333 0.73333333]
mean value: 0.7683333333333333
key: train_recall
value: [0.99285714 0.98571429 1. 0.98571429 1. 1.
1. 1. 1. 1. ]
mean value: 0.9964285714285714
key: test_roc_auc
value: [0.78125 0.71875 0.80416667 0.83958333 0.775 0.87083333
0.77291667 0.77291667 0.74166667 0.80416667]
mean value: 0.788125
key: train_roc_auc
value: [0.99642857 0.98928571 0.9964539 0.98931104 0.9964539 0.9964539
1. 1. 0.99642857 1. ]
mean value: 0.9960815602836879
key: test_jcc
value: [0.65 0.52631579 0.7 0.72222222 0.63157895 0.77777778
0.61111111 0.61111111 0.57894737 0.64705882]
mean value: 0.6456123151014792
key: train_jcc
value: [0.99285714 0.9787234 0.9929078 0.9787234 0.9929078 0.9929078
1. 1. 0.99295775 1. ]
mean value: 0.9921985102101973
MCC on Blind test: 0.51
Accuracy on Blind test: 0.76
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.05436349 0.06020713 0.06400919 0.07298946 0.06242132 0.05749798
0.06266236 0.07693434 0.07352424 0.07245946]
mean value: 0.06570689678192139
key: score_time
value: [0.02195573 0.02486134 0.03126049 0.02224779 0.0243659 0.02393746
0.02083087 0.02240777 0.04192734 0.01869297]
mean value: 0.02524876594543457
key: test_mcc
value: [0.56360186 0.72374686 0.42083333 0.87866878 0.67916667 0.6778302
0.48527095 0.48333333 0.48333333 0.55573827]
mean value: 0.5951523594158408
key: train_mcc
value: [0.97879618 0.97182532 0.9716269 0.95738969 0.95767878 0.9716269
0.9716269 0.97162977 0.965028 0.97887218]
mean value: 0.9696100628980616
key: test_accuracy
value: [0.78125 0.84375 0.70967742 0.93548387 0.83870968 0.83870968
0.74193548 0.74193548 0.74193548 0.77419355]
mean value: 0.7947580645161291
key: train_accuracy
value: [0.98928571 0.98571429 0.98576512 0.97864769 0.97864769 0.98576512
0.98576512 0.98576512 0.98220641 0.98932384]
mean value: 0.9846886120996441
key: test_fscore
value: [0.77419355 0.81481481 0.70967742 0.93333333 0.83870968 0.84848485
0.71428571 0.73333333 0.73333333 0.74074074]
mean value: 0.7840906763487409
key: train_fscore
value: [0.98916968 0.98550725 0.98561151 0.97841727 0.97826087 0.98561151
0.98591549 0.98571429 0.98194946 0.98924731]
mean value: 0.984540462778581
key: test_precision
value: [0.8 1. 0.73333333 1. 0.86666667 0.82352941
0.76923077 0.73333333 0.73333333 0.83333333]
mean value: 0.8292760180995475
key: train_precision
value: [1. 1. 0.99275362 0.98550725 0.99264706 0.99275362
0.97902098 0.99280576 1. 1. ]
mean value: 0.9935488285993815
key: test_recall
value: [0.75 0.6875 0.6875 0.875 0.8125 0.875
0.66666667 0.73333333 0.73333333 0.66666667]
mean value: 0.74875
key: train_recall
value: [0.97857143 0.97142857 0.97857143 0.97142857 0.96428571 0.97857143
0.9929078 0.9787234 0.96453901 0.9787234 ]
mean value: 0.975775075987842
key: test_roc_auc
value: [0.78125 0.84375 0.71041667 0.9375 0.83958333 0.8375
0.73958333 0.74166667 0.74166667 0.77083333]
mean value: 0.794375
key: train_roc_auc
value: [0.98928571 0.98571429 0.98573961 0.97862209 0.97859676 0.98573961
0.98573961 0.98579027 0.9822695 0.9893617 ]
mean value: 0.9846859169199594
key: test_jcc
value: [0.63157895 0.6875 0.55 0.875 0.72222222 0.73684211
0.55555556 0.57894737 0.57894737 0.58823529]
mean value: 0.650482886136911
key: train_jcc
value: [0.97857143 0.97142857 0.97163121 0.95774648 0.95744681 0.97163121
0.97222222 0.97183099 0.96453901 0.9787234 ]
mean value: 0.9695771318216628
MCC on Blind test: 0.51
Accuracy on Blind test: 0.75
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.05900478 0.05510592 0.04103112 0.07902026 0.08206344 0.0536263
0.03792715 0.04140687 0.09598207 0.07943511]
mean value: 0.06246030330657959
key: score_time
value: [0.02397704 0.0130322 0.0130856 0.02284026 0.02174592 0.0130434
0.01309609 0.01705003 0.02606058 0.02308249]
mean value: 0.01870136260986328
key: test_mcc
value: [0.19088543 0.25 0.48333333 0.48954403 0.43041423 0.22630095
0.29960206 0.80833333 0.42083333 0.42083333]
mean value: 0.40200800428633815
key: train_mcc
value: [0.98571429 0.98571429 0.99290744 0.98576494 0.9929078 0.98576494
0.99290744 0.98576494 0.99290744 0.98576494]
mean value: 0.9886118480130845
key: test_accuracy
value: [0.59375 0.625 0.74193548 0.74193548 0.70967742 0.61290323
0.64516129 0.90322581 0.70967742 0.70967742]
mean value: 0.6992943548387097
key: train_accuracy
value: [0.99285714 0.99285714 0.99644128 0.99288256 0.99644128 0.99288256
0.99644128 0.99288256 0.99644128 0.99288256]
mean value: 0.9943009659379767
key: test_fscore
value: [0.62857143 0.625 0.75 0.73333333 0.68965517 0.66666667
0.66666667 0.90322581 0.70967742 0.70967742]
mean value: 0.7082473912813179
key: train_fscore
value: [0.99285714 0.99285714 0.99641577 0.99285714 0.99644128 0.99285714
0.99646643 0.9929078 0.99646643 0.9929078 ]
mean value: 0.9943034088204372
key: test_precision
value: [0.57894737 0.625 0.75 0.78571429 0.76923077 0.6
0.61111111 0.875 0.6875 0.6875 ]
mean value: 0.6970003534477218
key: train_precision
value: [0.99285714 0.99285714 1. 0.99285714 0.9929078 0.99285714
0.99295775 0.9929078 0.99295775 0.9929078 ]
mean value: 0.9936067468641637
key: test_recall
value: [0.6875 0.625 0.75 0.6875 0.625 0.75
0.73333333 0.93333333 0.73333333 0.73333333]
mean value: 0.7258333333333333
key: train_recall
value: [0.99285714 0.99285714 0.99285714 0.99285714 1. 0.99285714
1. 0.9929078 1. 0.9929078 ]
mean value: 0.9950101317122594
key: test_roc_auc
value: [0.59375 0.625 0.74166667 0.74375 0.7125 0.60833333
0.64791667 0.90416667 0.71041667 0.71041667]
mean value: 0.6997916666666667
key: train_roc_auc
value: [0.99285714 0.99285714 0.99642857 0.99288247 0.9964539 0.99288247
0.99642857 0.99288247 0.99642857 0.99288247]
mean value: 0.9942983789260386
key: test_jcc
value: [0.45833333 0.45454545 0.6 0.57894737 0.52631579 0.5
0.5 0.82352941 0.55 0.55 ]
mean value: 0.554167135753823
key: train_jcc
value: [0.9858156 0.9858156 0.99285714 0.9858156 0.9929078 0.9858156
0.99295775 0.98591549 0.99295775 0.98591549]
mean value: 0.988677383449634
MCC on Blind test: 0.25
Accuracy on Blind test: 0.63
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.53291035 0.51292086 0.51562953 0.50803471 0.51853633 0.51748157
0.52107215 0.5132122 0.51350141 0.51845551]
mean value: 0.5171754598617554
key: score_time
value: [0.01048017 0.00943899 0.00925374 0.00941324 0.00959206 0.01012707
0.01015234 0.0094924 0.00954914 0.0099051 ]
mean value: 0.009740424156188966
key: test_mcc
value: [0.625 0.69991324 0.6125 0.74166667 0.6125 0.6778302
0.55573827 0.54812195 0.48333333 0.61608311]
mean value: 0.6172686782122512
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.8125 0.84375 0.80645161 0.87096774 0.80645161 0.83870968
0.77419355 0.77419355 0.74193548 0.80645161]
mean value: 0.8075604838709677
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.8125 0.82758621 0.8125 0.875 0.8125 0.84848485
0.74074074 0.75862069 0.73333333 0.78571429]
mean value: 0.8006980104824932
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.8125 0.92307692 0.8125 0.875 0.8125 0.82352941
0.83333333 0.78571429 0.73333333 0.84615385]
mean value: 0.8257641133376428
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.8125 0.75 0.8125 0.875 0.8125 0.875
0.66666667 0.73333333 0.73333333 0.73333333]
mean value: 0.7804166666666666
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.8125 0.84375 0.80625 0.87083333 0.80625 0.8375
0.77083333 0.77291667 0.74166667 0.80416667]
mean value: 0.8066666666666666
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.68421053 0.70588235 0.68421053 0.77777778 0.68421053 0.73684211
0.58823529 0.61111111 0.57894737 0.64705882]
mean value: 0.6698486412108703
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.53
Accuracy on Blind test: 0.76
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.04052234 0.02345443 0.02418399 0.02380633 0.02393818 0.02422571
0.02493715 0.02415442 0.02463698 0.02459621]
mean value: 0.0258455753326416
key: score_time
value: [0.01229191 0.01268578 0.01460195 0.01458263 0.01477909 0.01717997
0.01701856 0.0148387 0.01225853 0.01511025]
mean value: 0.014534735679626464
key: test_mcc
value: [0.26967994 0.31814238 0.42083333 0.42083333 0.35445878 0.46159086
0.48954403 0.69203857 0.23012754 0.19266866]
mean value: 0.38499174308021394
key: train_mcc
value: [0.89802651 0.78050971 0.93120324 0.85402471 0.72348814 0.95136724
0.76926429 0.83529602 0.74016312 0.78697838]
mean value: 0.8270321372218536
key: test_accuracy
value: [0.625 0.65625 0.70967742 0.70967742 0.67741935 0.70967742
0.74193548 0.83870968 0.61290323 0.58064516]
mean value: 0.6861895161290322
key: train_accuracy
value: [0.94642857 0.87857143 0.96441281 0.92170819 0.84341637 0.97508897
0.87188612 0.91103203 0.85409253 0.88256228]
mean value: 0.9049199288256228
key: test_fscore
value: [0.68421053 0.68571429 0.70967742 0.70967742 0.70588235 0.76923077
0.75 0.84848485 0.625 0.64864865]
mean value: 0.7136526270045196
key: train_fscore
value: [0.94915254 0.89171975 0.96551724 0.92715232 0.86419753 0.97560976
0.88679245 0.91856678 0.87306502 0.8952381 ]
mean value: 0.9147011472610135
key: test_precision
value: [0.59090909 0.63157895 0.73333333 0.73333333 0.66666667 0.65217391
0.70588235 0.77777778 0.58823529 0.54545455]
mean value: 0.662534525494547
key: train_precision
value: [0.90322581 0.8045977 0.93333333 0.86419753 0.76086957 0.95238095
0.79661017 0.84939759 0.77472527 0.81034483]
mean value: 0.8449682751561366
key: test_recall
value: [0.8125 0.75 0.6875 0.6875 0.75 0.9375
0.8 0.93333333 0.66666667 0.8 ]
mean value: 0.7825
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.625 0.65625 0.71041667 0.71041667 0.675 0.70208333
0.74375 0.84166667 0.61458333 0.5875 ]
mean value: 0.6866666666666666
key: train_roc_auc
value: [0.94642857 0.87857143 0.96453901 0.92198582 0.84397163 0.9751773
0.87142857 0.91071429 0.85357143 0.88214286]
mean value: 0.9048530901722391
key: test_jcc
value: [0.52 0.52173913 0.55 0.55 0.54545455 0.625
0.6 0.73684211 0.45454545 0.48 ]
mean value: 0.5583581235697941
key: train_jcc
value: [0.90322581 0.8045977 0.93333333 0.86419753 0.76086957 0.95238095
0.79661017 0.84939759 0.77472527 0.81034483]
mean value: 0.8449682751561366
MCC on Blind test: 0.16
Accuracy on Blind test: 0.59
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02280617 0.01443529 0.01433778 0.01460624 0.01442766 0.03603244
0.03605747 0.03566337 0.03552485 0.03572798]
mean value: 0.025961923599243163
key: score_time
value: [0.01775074 0.01187015 0.01184273 0.01180935 0.0117538 0.02118683
0.02436876 0.02252245 0.02214885 0.02132249]
mean value: 0.01765761375427246
key: test_mcc
value: [0.31311215 0.37796447 0.54812195 0.6125 0.6778302 0.4184137
0.61608311 0.82285074 0.48527095 0.48333333]
mean value: 0.5355480607549098
key: train_mcc
value: [0.83590622 0.82890983 0.80785208 0.78681467 0.79361702 0.82917933
0.80080045 0.78647416 0.83680633 0.77951762]
mean value: 0.808587771026312
key: test_accuracy
value: [0.65625 0.6875 0.77419355 0.80645161 0.83870968 0.70967742
0.80645161 0.90322581 0.74193548 0.74193548]
mean value: 0.766633064516129
key: train_accuracy
value: [0.91785714 0.91428571 0.90391459 0.89323843 0.89679715 0.91459075
0.90035587 0.89323843 0.91814947 0.88967972]
mean value: 0.9042107269954245
key: test_fscore
value: [0.66666667 0.66666667 0.78787879 0.8125 0.84848485 0.72727273
0.78571429 0.90909091 0.71428571 0.73333333]
mean value: 0.7651893939393939
key: train_fscore
value: [0.91872792 0.91304348 0.90391459 0.8943662 0.89679715 0.91428571
0.9 0.89361702 0.91986063 0.89122807]
mean value: 0.9045840767326006
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_cd_7030.py:136: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_cd_7030.py:139: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: test_precision
value: [0.64705882 0.71428571 0.76470588 0.8125 0.82352941 0.70588235
0.84615385 0.83333333 0.76923077 0.73333333]
mean value: 0.7650013466925232
key: train_precision
value: [0.90909091 0.92647059 0.90070922 0.88194444 0.89361702 0.91428571
0.90647482 0.89361702 0.90410959 0.88194444]
mean value: 0.9012263772097134
key: test_recall
value: [0.6875 0.625 0.8125 0.8125 0.875 0.75
0.73333333 1. 0.66666667 0.73333333]
mean value: 0.7695833333333333
key: train_recall
value: [0.92857143 0.9 0.90714286 0.90714286 0.9 0.91428571
0.89361702 0.89361702 0.93617021 0.90070922]
mean value: 0.9081256332320162
key: test_roc_auc
value: [0.65625 0.6875 0.77291667 0.80625 0.8375 0.70833333
0.80416667 0.90625 0.73958333 0.74166667]
mean value: 0.7660416666666667
key: train_roc_auc
value: [0.91785714 0.91428571 0.90392604 0.89328774 0.89680851 0.91458967
0.90037994 0.89323708 0.91808511 0.88964032]
mean value: 0.904209726443769
key: test_jcc
value: [0.5 0.5 0.65 0.68421053 0.73684211 0.57142857
0.64705882 0.83333333 0.55555556 0.57894737]
mean value: 0.6257376283846872
key: train_jcc
value: [0.8496732 0.84 0.82467532 0.8089172 0.81290323 0.84210526
0.81818182 0.80769231 0.8516129 0.80379747]
mean value: 0.8259558711160642
MCC on Blind test: 0.33
Accuracy on Blind test: 0.67
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.25096083 0.24789667 0.25337172 0.24919391 0.26491046 0.25964808
0.25687599 0.33584547 0.25059628 0.25279713]
mean value: 0.2622096538543701
key: score_time
value: [0.0240047 0.02082849 0.01891971 0.02080178 0.02297473 0.02297497
0.02118254 0.02063465 0.02154756 0.02372384]
mean value: 0.021759295463562013
key: test_mcc
value: [0.38729833 0.37796447 0.54812195 0.6125 0.6778302 0.48527095
0.61608311 0.69203857 0.48333333 0.48527095]
mean value: 0.5365711870728687
key: train_mcc
value: [0.68683657 0.82890983 0.68713898 0.78681467 0.68682877 0.69395613
0.65160982 0.65141613 0.70131788 0.72326575]
mean value: 0.7098094526353244
key: test_accuracy
value: [0.6875 0.6875 0.77419355 0.80645161 0.83870968 0.74193548
0.80645161 0.83870968 0.74193548 0.74193548]
mean value: 0.7665322580645161
key: train_accuracy
value: [0.84285714 0.91428571 0.84341637 0.89323843 0.84341637 0.84697509
0.82562278 0.82562278 0.85053381 0.86120996]
mean value: 0.8547178444331469
key: test_fscore
value: [0.72222222 0.66666667 0.78787879 0.8125 0.84848485 0.76470588
0.78571429 0.84848485 0.73333333 0.71428571]
mean value: 0.7684276589423649
key: train_fscore
value: [0.84722222 0.91304348 0.84507042 0.8943662 0.84285714 0.84587814
0.82926829 0.82437276 0.85314685 0.8650519 ]
mean value: 0.8560277408059859
key: test_precision
value: [0.65 0.71428571 0.76470588 0.8125 0.82352941 0.72222222
0.84615385 0.77777778 0.73333333 0.76923077]
mean value: 0.761373895712131
key: train_precision
value: [0.82432432 0.92647059 0.83333333 0.88194444 0.84285714 0.84892086
0.81506849 0.83333333 0.84137931 0.84459459]
mean value: 0.8492226427927332
key: test_recall
value: [0.8125 0.625 0.8125 0.8125 0.875 0.8125
0.73333333 0.93333333 0.73333333 0.66666667]
mean value: 0.7816666666666666
key: train_recall
value: [0.87142857 0.9 0.85714286 0.90714286 0.84285714 0.84285714
0.84397163 0.81560284 0.86524823 0.88652482]
mean value: 0.8632776089159068
key: test_roc_auc
value: [0.6875 0.6875 0.77291667 0.80625 0.8375 0.73958333
0.80416667 0.84166667 0.74166667 0.73958333]
mean value: 0.7658333333333334
key: train_roc_auc
value: [0.84285714 0.91428571 0.84346505 0.89328774 0.84341439 0.84696049
0.82555724 0.82565856 0.85048126 0.86111955]
mean value: 0.854708713272543
key: test_jcc
value: [0.56521739 0.5 0.65 0.68421053 0.73684211 0.61904762
0.64705882 0.73684211 0.57894737 0.55555556]
mean value: 0.6273721494700092
key: train_jcc
value: [0.73493976 0.84 0.73170732 0.8089172 0.72839506 0.73291925
0.70833333 0.70121951 0.74390244 0.76219512]
mean value: 0.749252899645239
MCC on Blind test: 0.41
Accuracy on Blind test: 0.71
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.02953863 0.03505015 0.03531361 0.03359294 0.0331347 0.03402448
0.03351879 0.03452086 0.04119611 0.02893472]
mean value: 0.0338824987411499
key: score_time
value: [0.01170063 0.0165534 0.01207328 0.01180673 0.0122366 0.01206207
0.01204228 0.01204181 0.01195955 0.01177216]
mean value: 0.012424850463867187
key: test_mcc
value: [0.438357 0.438357 0.48333333 0.6778302 0.48527095 0.48333333
0.68826048 0.69203857 0.48527095 0.42083333]
mean value: 0.5292885147430821
key: train_mcc
value: [0.72144698 0.72864578 0.75089924 0.70836501 0.70111973 0.71556015
0.70106383 0.70836501 0.71535695 0.75089924]
mean value: 0.7201721914786785
key: test_accuracy
value: [0.71875 0.71875 0.74193548 0.83870968 0.74193548 0.74193548
0.83870968 0.83870968 0.74193548 0.70967742]
mean value: 0.7631048387096775
key: train_accuracy
value: [0.86071429 0.86428571 0.87544484 0.85409253 0.85053381 0.85765125
0.85053381 0.85409253 0.85765125 0.87544484]
mean value: 0.8600444839857652
key: test_fscore
value: [0.72727273 0.70967742 0.75 0.84848485 0.76470588 0.75
0.81481481 0.84848485 0.71428571 0.70967742]
mean value: 0.7637403674405572
key: train_fscore
value: [0.86120996 0.86524823 0.87455197 0.85512367 0.84892086 0.85507246
0.85106383 0.85304659 0.85915493 0.87632509]
mean value: 0.859971760736446
key: test_precision
value: [0.70588235 0.73333333 0.75 0.82352941 0.72222222 0.75
0.91666667 0.77777778 0.76923077 0.6875 ]
mean value: 0.7636142533936652
key: train_precision
value: [0.85815603 0.85915493 0.87769784 0.84615385 0.85507246 0.86764706
0.85106383 0.86231884 0.85314685 0.87323944]
mean value: 0.8603651128551885
key: test_recall
value: [0.75 0.6875 0.75 0.875 0.8125 0.75
0.73333333 0.93333333 0.66666667 0.73333333]
mean value: 0.7691666666666667
key: train_recall
value: [0.86428571 0.87142857 0.87142857 0.86428571 0.84285714 0.84285714
0.85106383 0.84397163 0.86524823 0.87943262]
mean value: 0.8596859169199595
key: test_roc_auc
value: [0.71875 0.71875 0.74166667 0.8375 0.73958333 0.74166667
0.83541667 0.84166667 0.73958333 0.71041667]
mean value: 0.7625
key: train_roc_auc
value: [0.86071429 0.86428571 0.8754306 0.85412867 0.85050659 0.85759878
0.85053191 0.85412867 0.85762411 0.8754306 ]
mean value: 0.8600379939209727
key: test_jcc
value: [0.57142857 0.55 0.6 0.73684211 0.61904762 0.6
0.6875 0.73684211 0.55555556 0.55 ]
mean value: 0.6207215956558062
key: train_jcc
value: [0.75625 0.7625 0.77707006 0.74691358 0.7375 0.74683544
0.74074074 0.74375 0.75308642 0.77987421]
mean value: 0.7544520461309461
MCC on Blind test: 0.36
Accuracy on Blind test: 0.69
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.92847991 0.77967024 0.77496982 0.87230301 0.79569674 0.93908501
0.77330136 0.76536059 0.965801 0.76891041]
mean value: 0.8363578081130981
key: score_time
value: [0.01461387 0.01185727 0.01436329 0.01196814 0.01442528 0.01206875
0.01191068 0.01198006 0.01440215 0.01197433]
mean value: 0.012956380844116211
key: test_mcc
value: [0.12909944 0.37796447 0.4184137 0.74689528 0.48527095 0.4184137
0.74689528 0.63696156 0.46159086 0.48333333]
mean value: 0.4904838594649526
key: train_mcc
value: [0.89342711 0.64390928 0.80070922 0.63786232 0.77937079 0.67973658
0.57566395 0.63712378 0.9929078 0.77938197]
mean value: 0.7420092792860252
key: test_accuracy
value: [0.5625 0.6875 0.70967742 0.87096774 0.74193548 0.70967742
0.87096774 0.80645161 0.70967742 0.74193548]
mean value: 0.7411290322580645
key: train_accuracy
value: [0.94642857 0.82142857 0.90035587 0.81850534 0.88967972 0.83985765
0.78647687 0.81850534 0.99644128 0.88967972]
mean value: 0.8707358922216574
key: test_fscore
value: [0.61111111 0.66666667 0.72727273 0.88235294 0.76470588 0.72727273
0.85714286 0.82352941 0.60869565 0.73333333]
mean value: 0.7402083310267453
key: train_fscore
value: [0.94736842 0.82638889 0.9 0.82229965 0.88888889 0.83985765
0.7972973 0.82105263 0.99644128 0.88967972]
mean value: 0.8729274426961431
key: test_precision
value: [0.55 0.71428571 0.70588235 0.83333333 0.72222222 0.70588235
0.92307692 0.73684211 0.875 0.73333333]
mean value: 0.7499858337397037
key: train_precision
value: [0.93103448 0.80405405 0.9 0.80272109 0.89208633 0.83687943
0.76129032 0.8125 1. 0.89285714]
mean value: 0.8633422854245202
key: test_recall
value: [0.6875 0.625 0.75 0.9375 0.8125 0.75
0.8 0.93333333 0.46666667 0.73333333]
mean value: 0.7495833333333334
key: train_recall
value: [0.96428571 0.85 0.9 0.84285714 0.88571429 0.84285714
0.83687943 0.82978723 0.9929078 0.88652482]
mean value: 0.8831813576494427
key: test_roc_auc
value: [0.5625 0.6875 0.70833333 0.86875 0.73958333 0.70833333
0.86875 0.81041667 0.70208333 0.74166667]
mean value: 0.7397916666666666
key: train_roc_auc
value: [0.94642857 0.82142857 0.90035461 0.81859169 0.88966565 0.83986829
0.78629686 0.81846505 0.9964539 0.88969098]
mean value: 0.8707244174265452
key: test_jcc
value: [0.44 0.5 0.57142857 0.78947368 0.61904762 0.57142857
0.75 0.7 0.4375 0.57894737]
mean value: 0.595782581453634
key: train_jcc
value: [0.9 0.70414201 0.81818182 0.69822485 0.8 0.72392638
0.66292135 0.69642857 0.9929078 0.80128205]
mean value: 0.7798014834898911
MCC on Blind test: 0.42
Accuracy on Blind test: 0.71
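Each metric above is printed as a "key / value / mean value" triple, with the per-fold array sometimes wrapping onto a second line. A small parser sketch for collecting these triples into a dictionary follows. The function name parse_cv_blocks is hypothetical, and the sample lines are copied from the test_mcc block above.

```python
def parse_cv_blocks(lines):
    """Collect 'key / value / mean value' triples from log lines into a dict."""
    results = {}
    key = None
    buffer = None  # accumulates a bracketed array that may span several lines
    for raw in lines:
        line = raw.strip()
        if line.startswith("key:"):
            key = line.split(":", 1)[1].strip()
            results[key] = {"values": [], "mean": None}
        elif line.startswith("mean value:") and key is not None:
            results[key]["mean"] = float(line.split(":", 1)[1])
        elif line.startswith("value:") and key is not None:
            buffer = line.split(":", 1)[1]
        elif buffer is not None:
            buffer += " " + line
        # The closing bracket marks the end of the per-fold array.
        if buffer is not None and "]" in buffer:
            tokens = buffer.replace("[", " ").replace("]", " ").split()
            results[key]["values"] = [float(tok) for tok in tokens]
            buffer = None
    return results


sample = [
    "key: test_mcc",
    "value: [0.12909944 0.37796447 0.4184137 0.74689528 0.48527095 0.4184137",
    "0.74689528 0.63696156 0.46159086 0.48333333]",
    "mean value: 0.4904838594649526",
]
parsed = parse_cv_blocks(sample)
print(parsed["test_mcc"]["mean"], len(parsed["test_mcc"]["values"]))
```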
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01343632 0.01282287 0.00962019 0.00936198 0.00911808 0.00918531
0.00927401 0.00933409 0.0091629 0.00921702]
mean value: 0.010053277015686035
key: score_time
value: [0.01177359 0.00929523 0.00898552 0.0086925 0.00870514 0.00878668
0.00875974 0.00862837 0.00862598 0.00870895]
mean value: 0.009096169471740722
key: test_mcc
value: [0.40451992 0.19088543 0.29844172 0.48527095 0.46159086 0.42321607
0.48527095 0.53006813 0.55 0.42083333]
mean value: 0.4250097354571631
key: train_mcc
value: [0.47497405 0.48038446 0.52044165 0.4622929 0.44775571 0.54142123
0.52339686 0.51032449 0.45970151 0.49577468]
mean value: 0.4916467544879524
key: test_accuracy
value: [0.6875 0.59375 0.64516129 0.74193548 0.70967742 0.70967742
0.74193548 0.74193548 0.77419355 0.70967742]
mean value: 0.7055443548387097
key: train_accuracy
value: [0.72857143 0.725 0.7544484 0.72597865 0.71886121 0.76512456
0.76156584 0.74733096 0.72241993 0.74377224]
mean value: 0.7393073207930859
key: test_fscore
value: [0.73684211 0.62857143 0.7027027 0.76470588 0.76923077 0.74285714
0.71428571 0.77777778 0.77419355 0.70967742]
mean value: 0.732084449078357
key: train_fscore
value: [0.76100629 0.76595745 0.77669903 0.75080906 0.74433657 0.78571429
0.76655052 0.77602524 0.75471698 0.76623377]
mean value: 0.7648049188632132
key: test_precision
value: [0.63636364 0.57894737 0.61904762 0.72222222 0.65217391 0.68421053
0.76923077 0.66666667 0.75 0.6875 ]
mean value: 0.6766362721311234
key: train_precision
value: [0.67977528 0.66666667 0.71005917 0.68639053 0.68047337 0.7202381
0.75342466 0.69886364 0.6779661 0.70658683]
mean value: 0.6980444341666818
key: test_recall
value: [0.875 0.6875 0.8125 0.8125 0.9375 0.8125
0.66666667 0.93333333 0.8 0.73333333]
mean value: 0.8070833333333334
key: train_recall
value: [0.86428571 0.9 0.85714286 0.82857143 0.82142857 0.86428571
0.78014184 0.87234043 0.85106383 0.83687943]
mean value: 0.8476139817629179
key: test_roc_auc
value: [0.6875 0.59375 0.63958333 0.73958333 0.70208333 0.70625
0.73958333 0.74791667 0.775 0.71041667]
mean value: 0.7041666666666666
key: train_roc_auc
value: [0.72857143 0.725 0.75481256 0.72634245 0.71922492 0.76547619
0.76149949 0.7468845 0.72196049 0.74343972]
mean value: 0.7393211752786222
key: test_jcc
value: [0.58333333 0.45833333 0.54166667 0.61904762 0.625 0.59090909
0.55555556 0.63636364 0.63157895 0.55 ]
mean value: 0.5791788182577656
key: train_jcc
value: [0.6142132 0.62068966 0.63492063 0.60103627 0.59278351 0.64705882
0.62146893 0.63402062 0.60606061 0.62105263]
mean value: 0.6193304868926621
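The test_fscore and test_jcc rows are not independent. For a binary classifier, the Jaccard index J and the F1 score F computed on the same predictions satisfy J = F / (2 - F). A short check against the Gaussian NB fold values printed above follows, assuming test_fscore is F1 and test_jcc is the Jaccard score (the log does not name the scorers explicitly).

```python
import math

# First three per-fold values copied from the Gaussian NB block above.
test_fscore = [0.73684211, 0.62857143, 0.7027027]
test_jcc = [0.58333333, 0.45833333, 0.54166667]


def jaccard_from_f1(f1):
    """Convert an F1 score to the Jaccard index for the same predictions."""
    return f1 / (2.0 - f1)


derived = [jaccard_from_f1(f) for f in test_fscore]
for f, j_log, j_calc in zip(test_fscore, test_jcc, derived):
    print(f"F1={f:.8f}  logged J={j_log:.8f}  derived J={j_calc:.8f}")
```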
MCC on Blind test: 0.28
Accuracy on Blind test: 0.65
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.00932527 0.00929976 0.00981283 0.00938082 0.01028895 0.00962377
0.0106163 0.00959873 0.00959492 0.0094955 ]
mean value: 0.009703683853149413
key: score_time
value: [0.00870562 0.00866961 0.00908709 0.00865245 0.00882196 0.00876093
0.00897956 0.00915599 0.00876236 0.00909448]
mean value: 0.008869004249572755
key: test_mcc
value: [0.12909944 0.50395263 0.6125 0.4184137 0.51837044 0.55
0.4184137 0.44824996 0.35983579 0.6125 ]
mean value: 0.4571335673572818
key: train_mcc
value: [0.56603562 0.52531582 0.53306083 0.57396568 0.5248687 0.56166311
0.53352641 0.52595275 0.56002251 0.54704118]
mean value: 0.5451452600049477
key: test_accuracy
value: [0.5625 0.75 0.80645161 0.70967742 0.74193548 0.77419355
0.70967742 0.70967742 0.67741935 0.80645161]
mean value: 0.7247983870967742
key: train_accuracy
value: [0.78214286 0.76071429 0.76512456 0.78647687 0.76156584 0.77935943
0.76512456 0.76156584 0.77935943 0.77224199]
mean value: 0.7713675648195221
key: test_fscore
value: [0.61111111 0.73333333 0.8125 0.72727273 0.78947368 0.77419355
0.68965517 0.74285714 0.6875 0.8 ]
mean value: 0.7367896719585731
key: train_fscore
value: [0.79037801 0.77441077 0.7755102 0.79166667 0.76975945 0.78911565
0.77852349 0.77441077 0.78767123 0.78378378]
mean value: 0.7815230029466407
key: test_precision
value: [0.55 0.78571429 0.8125 0.70588235 0.68181818 0.8
0.71428571 0.65 0.64705882 0.8 ]
mean value: 0.714725935828877
key: train_precision
value: [0.7615894 0.73248408 0.74025974 0.77027027 0.74172185 0.75324675
0.7388535 0.73717949 0.7615894 0.7483871 ]
mean value: 0.7485581589599934
key: test_recall
value: [0.6875 0.6875 0.8125 0.75 0.9375 0.75
0.66666667 0.86666667 0.73333333 0.8 ]
mean value: 0.7691666666666667
key: train_recall
value: [0.82142857 0.82142857 0.81428571 0.81428571 0.8 0.82857143
0.82269504 0.81560284 0.81560284 0.82269504]
mean value: 0.8176595744680851
key: test_roc_auc
value: [0.5625 0.75 0.80625 0.70833333 0.73541667 0.775
0.70833333 0.71458333 0.67916667 0.80625 ]
mean value: 0.7245833333333334
key: train_roc_auc
value: [0.78214286 0.76071429 0.76529889 0.78657548 0.76170213 0.77953394
0.76491895 0.76137285 0.77922999 0.7720618 ]
mean value: 0.771355116514691
key: test_jcc
value: [0.44 0.57894737 0.68421053 0.57142857 0.65217391 0.63157895
0.52631579 0.59090909 0.52380952 0.66666667]
mean value: 0.5866040397436278
key: train_jcc
value: [0.65340909 0.63186813 0.63333333 0.65517241 0.62569832 0.65168539
0.63736264 0.63186813 0.64971751 0.64444444]
mean value: 0.641455941498394
MCC on Blind test: 0.35
Accuracy on Blind test: 0.68
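Keys such as fit_time, score_time, test_mcc and train_mcc have the shape of the dictionary that sklearn.model_selection.cross_validate returns when it is given a dict of scorers and return_train_score=True. A minimal sketch on synthetic data follows. The scorer names, the 10-fold stratified splitter and the data are assumptions inferred from the log output, not the script that produced it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the real feature table (assumption).
X, y = make_classification(n_samples=300, n_features=20, random_state=42)

# Dict keys become the suffixes of the test_* / train_* result keys.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "jcc": "jaccard",
}

pipe = Pipeline(steps=[("prep", MinMaxScaler()), ("model", BernoulliNB())])
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(
    pipe, X, y, cv=cv, scoring=scoring, return_train_score=True
)

# Print in the same key / value / mean value layout used in this log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```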
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00917029 0.00897074 0.00953841 0.01000524 0.00936627 0.01002192
0.01007271 0.01017094 0.00896072 0.00896215]
mean value: 0.009523940086364747
key: score_time
value: [0.011343 0.01217747 0.01493287 0.01539063 0.01501703 0.01552916
0.01544309 0.01240277 0.01429534 0.01415873]
mean value: 0.014069008827209472
key: test_mcc
value: [0.19738551 0.19738551 0.1784296 0.29166667 0.09283444 0.35416667
0.22630095 0.43041423 0.4184137 0.36121114]
mean value: 0.27482084138853546
key: train_mcc
value: [0.5503511 0.55068879 0.5873797 0.55159836 0.56592685 0.58718338
0.59672888 0.5873797 0.55168747 0.57313743]
mean value: 0.5702061665452179
key: test_accuracy
value: [0.59375 0.59375 0.58064516 0.64516129 0.5483871 0.67741935
0.61290323 0.70967742 0.70967742 0.67741935]
mean value: 0.6348790322580645
key: train_accuracy
value: [0.775 0.775 0.79359431 0.77580071 0.78291815 0.79359431
0.79715302 0.79359431 0.77580071 0.78647687]
mean value: 0.7848932384341637
key: test_fscore
value: [0.64864865 0.51851852 0.51851852 0.64516129 0.58823529 0.6875
0.53846154 0.72727273 0.68965517 0.61538462]
mean value: 0.6177356323658587
key: train_fscore
value: [0.77894737 0.7804878 0.78985507 0.77419355 0.77978339 0.79285714
0.80677966 0.7972028 0.77894737 0.79020979]
mean value: 0.7869263947359504
key: test_precision
value: [0.57142857 0.63636364 0.63636364 0.66666667 0.55555556 0.6875
0.63636364 0.66666667 0.71428571 0.72727273]
mean value: 0.6498466810966811
key: train_precision
value: [0.76551724 0.76190476 0.80147059 0.77697842 0.78832117 0.79285714
0.77272727 0.7862069 0.77083333 0.77931034]
mean value: 0.7796127166965825
key: test_recall
value: [0.75 0.4375 0.4375 0.625 0.625 0.6875
0.46666667 0.8 0.66666667 0.53333333]
mean value: 0.6029166666666667
key: train_recall
value: [0.79285714 0.8 0.77857143 0.77142857 0.77142857 0.79285714
0.84397163 0.80851064 0.78723404 0.80141844]
mean value: 0.7948277608915907
key: test_roc_auc
value: [0.59375 0.59375 0.58541667 0.64583333 0.54583333 0.67708333
0.60833333 0.7125 0.70833333 0.67291667]
mean value: 0.634375
key: train_roc_auc
value: [0.775 0.775 0.79354103 0.77578521 0.78287741 0.79359169
0.79698582 0.79354103 0.77575988 0.78642351]
mean value: 0.7848505572441743
key: test_jcc
value: [0.48 0.35 0.35 0.47619048 0.41666667 0.52380952
0.36842105 0.57142857 0.52631579 0.44444444]
mean value: 0.45072765246449453
key: train_jcc
value: [0.63793103 0.64 0.65269461 0.63157895 0.63905325 0.65680473
0.67613636 0.6627907 0.63793103 0.65317919]
mean value: 0.6488099867340289
MCC on Blind test: 0.13
Accuracy on Blind test: 0.56
Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01388001 0.01374197 0.01386142 0.01398516 0.01397181 0.01414371
0.01401997 0.01408243 0.01399183 0.01399088]
mean value: 0.013966917991638184
key: score_time
value: [0.01013136 0.0099895 0.00989628 0.01076245 0.009938 0.01009417
0.00990272 0.00999641 0.01002407 0.01006293]
mean value: 0.010079789161682128
key: test_mcc
value: [0.26967994 0.438357 0.54812195 0.6778302 0.36121114 0.4184137
0.68826048 0.69203857 0.48954403 0.48333333]
mean value: 0.5066790355877698
key: train_mcc
value: [0.68599434 0.75001913 0.73708513 0.73015914 0.68713898 0.7388473
0.71590892 0.71561775 0.72244174 0.74385734]
mean value: 0.7227069780548997
key: test_accuracy
value: [0.625 0.71875 0.77419355 0.83870968 0.67741935 0.70967742
0.83870968 0.83870968 0.74193548 0.74193548]
mean value: 0.7505040322580645
key: train_accuracy
value: [0.84285714 0.875 0.8683274 0.86476868 0.84341637 0.8683274
0.85765125 0.85765125 0.86120996 0.87188612]
mean value: 0.8611095577020844
key: test_fscore
value: [0.68421053 0.70967742 0.78787879 0.84848485 0.72222222 0.72727273
0.81481481 0.84848485 0.75 0.73333333]
mean value: 0.762637952816221
key: train_fscore
value: [0.84507042 0.87455197 0.86545455 0.86131387 0.84507042 0.86245353
0.86111111 0.85611511 0.86120996 0.87142857]
mean value: 0.8603779516928948
key: test_precision
value: [0.59090909 0.73333333 0.76470588 0.82352941 0.65 0.70588235
0.91666667 0.77777778 0.70588235 0.73333333]
mean value: 0.7402020202020202
key: train_precision
value: [0.83333333 0.87769784 0.88148148 0.88059701 0.83333333 0.89922481
0.84353741 0.86861314 0.86428571 0.87769784]
mean value: 0.8659801920666141
key: test_recall
value: [0.8125 0.6875 0.8125 0.875 0.8125 0.75
0.73333333 0.93333333 0.8 0.73333333]
mean value: 0.795
key: train_recall
value: [0.85714286 0.87142857 0.85 0.84285714 0.85714286 0.82857143
0.87943262 0.84397163 0.85815603 0.86524823]
mean value: 0.8553951367781155
key: test_roc_auc
value: [0.625 0.71875 0.77291667 0.8375 0.67291667 0.70833333
0.83541667 0.84166667 0.74375 0.74166667]
mean value: 0.7497916666666666
key: train_roc_auc
value: [0.84285714 0.875 0.86826241 0.86469098 0.84346505 0.86818642
0.85757345 0.8577001 0.86122087 0.87190983]
mean value: 0.8610866261398177
key: test_jcc
value: [0.52 0.55 0.65 0.73684211 0.56521739 0.57142857
0.6875 0.73684211 0.6 0.57894737]
mean value: 0.6196777541680287
key: train_jcc
value: [0.73170732 0.77707006 0.76282051 0.75641026 0.73170732 0.75816993
0.75609756 0.74842767 0.75625 0.7721519 ]
mean value: 0.7550812534377662
MCC on Blind test: 0.39
Accuracy on Blind test: 0.7
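The gap between train and test scores in the SVM block above can be checked directly from the logged fold values. This is a minimal sketch, not the code that produced this log: it assumes the per-fold values are copied by hand from the test_mcc and train_mcc entries, and the variable names are illustrative.

```python
from statistics import mean

# Per-fold MCC values copied from the SVM block of this log.
test_mcc = [0.26967994, 0.438357, 0.54812195, 0.6778302, 0.36121114,
            0.4184137, 0.68826048, 0.69203857, 0.48954403, 0.48333333]
train_mcc = [0.68599434, 0.75001913, 0.73708513, 0.73015914, 0.68713898,
             0.7388473, 0.71590892, 0.71561775, 0.72244174, 0.74385734]

mean_test = mean(test_mcc)    # corresponds to the logged "mean value" for test_mcc
mean_train = mean(train_mcc)  # corresponds to the logged "mean value" for train_mcc
gap = mean_train - mean_test  # a large positive gap suggests overfitting

print(f"test_mcc mean:  {mean_test:.4f}")
print(f"train_mcc mean: {mean_train:.4f}")
print(f"train - test:   {gap:.4f}")
```

The same comparison applies to any other key pair in the log (for example test_accuracy and train_accuracy); the blind-test MCC of 0.39 is lower again than the cross-validated test mean.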
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.23191261 1.13450933 1.40175247 1.27486777 1.25217056 1.13011384
1.22160625 1.20618701 1.18603587 1.25448847]
mean value: 1.2293644189834594
key: score_time
value: [0.01473403 0.01438618 0.01480508 0.01451826 0.01454473 0.01558375
0.0150125 0.0203166 0.01496863 0.01508808]
mean value: 0.015395784378051757
key: test_mcc
value: [0.25197632 0.56360186 0.54812195 0.6125 0.67916667 0.48527095
0.29166667 0.69203857 0.35416667 0.61925228]
mean value: 0.5097761926765115
key: train_mcc
value: [0.98571429 0.97152771 0.99290744 0.99290744 0.98576494 0.99290744
0.99290744 0.98576494 0.98576494 0.99290744]
mean value: 0.987907404747804
key: test_accuracy
value: [0.625 0.78125 0.77419355 0.80645161 0.83870968 0.74193548
0.64516129 0.83870968 0.67741935 0.80645161]
mean value: 0.7535282258064516
key: train_accuracy
value: [0.99285714 0.98571429 0.99644128 0.99644128 0.99288256 0.99644128
0.99644128 0.99288256 0.99288256 0.99644128]
mean value: 0.993942552109812
key: test_fscore
value: [0.64705882 0.77419355 0.78787879 0.8125 0.83870968 0.76470588
0.64516129 0.84848485 0.66666667 0.8125 ]
mean value: 0.7597859525041688
key: train_fscore
value: [0.99285714 0.98561151 0.99641577 0.99641577 0.99285714 0.99641577
0.99646643 0.9929078 0.9929078 0.99646643]
mean value: 0.9939321573361302
key: test_precision
value: [0.61111111 0.8 0.76470588 0.8125 0.86666667 0.72222222
0.625 0.77777778 0.66666667 0.76470588]
mean value: 0.7411356209150327
key: train_precision
value: [0.99285714 0.99275362 1. 1. 0.99285714 1.
0.99295775 0.9929078 0.9929078 0.99295775]
mean value: 0.9950199004697318
key: test_recall
value: [0.6875 0.75 0.8125 0.8125 0.8125 0.8125
0.66666667 0.93333333 0.66666667 0.86666667]
mean value: 0.7820833333333334
key: train_recall
value: [0.99285714 0.97857143 0.99285714 0.99285714 0.99285714 0.99285714
1. 0.9929078 0.9929078 1. ]
mean value: 0.9928672745694023
key: test_roc_auc
value: [0.625 0.78125 0.77291667 0.80625 0.83958333 0.73958333
0.64583333 0.84166667 0.67708333 0.80833333]
mean value: 0.75375
key: train_roc_auc
value: [0.99285714 0.98571429 0.99642857 0.99642857 0.99288247 0.99642857
0.99642857 0.99288247 0.99288247 0.99642857]
mean value: 0.993936170212766
key: test_jcc
value: [0.47826087 0.63157895 0.65 0.68421053 0.72222222 0.61904762
0.47619048 0.73684211 0.5 0.68421053]
mean value: 0.6182563292288693
key: train_jcc
value: [0.9858156 0.97163121 0.99285714 0.99285714 0.9858156 0.99285714
0.99295775 0.98591549 0.98591549 0.99295775]
mean value: 0.9879580318792186
MCC on Blind test: 0.38
Accuracy on Blind test: 0.69
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.02856255 0.02037883 0.02034307 0.01903844 0.02007008 0.01872396
0.01928687 0.01898861 0.02181172 0.02207255]
mean value: 0.02092766761779785
key: score_time
value: [0.01176023 0.00898933 0.00875068 0.00902557 0.00892735 0.00873947
0.00879765 0.00876856 0.00897741 0.00882697]
mean value: 0.009156322479248047
key: test_mcc
value: [0.44539933 0.81409158 0.44824996 0.6778302 0.48333333 0.71269665
0.48527095 0.61925228 0.48954403 0.42083333]
mean value: 0.5596501652767185
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.71875 0.90625 0.70967742 0.83870968 0.74193548 0.83870968
0.74193548 0.80645161 0.74193548 0.70967742]
mean value: 0.7754032258064516
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.74285714 0.90322581 0.66666667 0.84848485 0.75 0.86486486
0.71428571 0.8125 0.75 0.70967742]
mean value: 0.7762562462965689
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.68421053 0.93333333 0.81818182 0.82352941 0.75 0.76190476
0.76923077 0.76470588 0.70588235 0.6875 ]
mean value: 0.7698478856025296
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.8125 0.875 0.5625 0.875 0.75 1.
0.66666667 0.86666667 0.8 0.73333333]
mean value: 0.7941666666666667
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.71875 0.90625 0.71458333 0.8375 0.74166667 0.83333333
0.73958333 0.80833333 0.74375 0.71041667]
mean value: 0.7754166666666666
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.59090909 0.82352941 0.5 0.73684211 0.6 0.76190476
0.55555556 0.68421053 0.6 0.55 ]
mean value: 0.6402951451713061
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.39
Accuracy on Blind test: 0.7
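The metric keys logged above (mcc, accuracy, fscore, precision, recall, roc_auc, jcc) can all be derived from one binary confusion matrix. The sketch below uses the standard formulas and is not taken from the script that wrote this log. The counts are an assumption: they are inferred from the first Decision Tree fold (32 samples, 16 positives) and reproduce that fold's logged test values. Treating roc_auc as balanced accuracy assumes the score was computed from hard 0/1 predictions, which the logged values are consistent with but do not prove.

```python
from math import sqrt

def binary_metrics(tp, fp, fn, tn):
    """Return common binary-classification scores from confusion-matrix counts.

    Assumes at least one predicted positive and one actual positive,
    so precision and recall are defined.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "fscore": 2 * precision * recall / (precision + recall),
        "precision": precision,
        "recall": recall,
        # With hard 0/1 predictions, ROC AUC reduces to balanced accuracy.
        "roc_auc": (recall + specificity) / 2,
        "jcc": tp / (tp + fp + fn),
    }

# Hypothetical counts inferred from the first Decision Tree fold (32 samples).
scores = binary_metrics(tp=13, fp=6, fn=3, tn=10)
for name, value in scores.items():
    print(f"{name}: {value:.8f}")
```

A train score of 1.0 on every key, as logged for the tree-based models, corresponds to fp = fn = 0 on the training folds.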
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.10739708 0.1049428 0.11385632 0.10848141 0.10530806 0.10459304
0.10556197 0.10774064 0.10547471 0.10502815]
mean value: 0.10683841705322265
key: score_time
value: [0.01761246 0.01935506 0.01902342 0.01758337 0.01857352 0.0177691
0.01770043 0.01812339 0.01769495 0.01809072]
mean value: 0.018152642250061034
key: test_mcc
value: [0.44539933 0.50395263 0.6125 0.55 0.67916667 0.29069387
0.54812195 0.61925228 0.35983579 0.35983579]
mean value: 0.4968758304596963
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.71875 0.75 0.80645161 0.77419355 0.83870968 0.64516129
0.77419355 0.80645161 0.67741935 0.67741935]
mean value: 0.746875
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.74285714 0.73333333 0.8125 0.77419355 0.83870968 0.68571429
0.75862069 0.8125 0.6875 0.6875 ]
mean value: 0.7533428677366386
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.68421053 0.78571429 0.8125 0.8 0.86666667 0.63157895
0.78571429 0.76470588 0.64705882 0.64705882]
mean value: 0.7425208241191213
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.8125 0.6875 0.8125 0.75 0.8125 0.75
0.73333333 0.86666667 0.73333333 0.73333333]
mean value: 0.7691666666666667
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.71875 0.75 0.80625 0.775 0.83958333 0.64166667
0.77291667 0.80833333 0.67916667 0.67916667]
mean value: 0.7470833333333333
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.59090909 0.57894737 0.68421053 0.63157895 0.72222222 0.52173913
0.61111111 0.68421053 0.52380952 0.52380952]
mean value: 0.6072547970717307
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.3
Accuracy on Blind test: 0.66
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.00956964 0.00989819 0.00956774 0.009583 0.00953579 0.010221
0.0096035 0.00991416 0.00965524 0.00966287]
mean value: 0.009721112251281739
key: score_time
value: [0.00921845 0.00870037 0.00887227 0.00867987 0.00864482 0.00938559
0.00907087 0.0090909 0.0087533 0.00865674]
mean value: 0.008907318115234375
key: test_mcc
value: [0.37796447 0.32897585 0.28870546 0.43041423 0.54812195 0.55
0.36121114 0.55573827 0.68826048 0.35416667]
mean value: 0.44835585149333823
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.6875 0.65625 0.64516129 0.70967742 0.77419355 0.77419355
0.67741935 0.77419355 0.83870968 0.67741935]
mean value: 0.7214717741935484
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.66666667 0.59259259 0.66666667 0.68965517 0.78787879 0.77419355
0.61538462 0.74074074 0.81481481 0.66666667]
mean value: 0.7015260272212441
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.71428571 0.72727273 0.64705882 0.76923077 0.76470588 0.8
0.72727273 0.83333333 0.91666667 0.66666667]
mean value: 0.7566493310610958
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.625 0.5 0.6875 0.625 0.8125 0.75
0.53333333 0.66666667 0.73333333 0.66666667]
mean value: 0.66
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.6875 0.65625 0.64375 0.7125 0.77291667 0.775
0.67291667 0.77083333 0.83541667 0.67708333]
mean value: 0.7204166666666667
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.5 0.42105263 0.5 0.52631579 0.65 0.63157895
0.44444444 0.58823529 0.6875 0.5 ]
mean value: 0.5449127106983144
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.09
Accuracy on Blind test: 0.55
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.53831029 1.45324683 1.46263885 1.45239115 1.44685459 1.45993972
1.44669032 1.45897341 1.45301151 1.46730351]
mean value: 1.4639360189437867
key: score_time
value: [0.09898853 0.09832716 0.09827876 0.09174919 0.09818649 0.09863758
0.09640765 0.09183216 0.0928421 0.0926671 ]
mean value: 0.09579167366027833
key: test_mcc
value: [0.625 0.51639778 0.6778302 0.74896053 0.67916667 0.48527095
0.68826048 0.69203857 0.42083333 0.49612132]
mean value: 0.6029879822988455
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.8125 0.75 0.83870968 0.87096774 0.83870968 0.74193548
0.83870968 0.83870968 0.70967742 0.74193548]
mean value: 0.7981854838709678
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.8125 0.71428571 0.84848485 0.86666667 0.83870968 0.76470588
0.81481481 0.84848485 0.70967742 0.69230769]
mean value: 0.791063756417172
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.8125 0.83333333 0.82352941 0.92857143 0.86666667 0.72222222
0.91666667 0.77777778 0.6875 0.81818182]
mean value: 0.8186949325184619
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.8125 0.625 0.875 0.8125 0.8125 0.8125
0.73333333 0.93333333 0.73333333 0.6 ]
mean value: 0.775
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.8125 0.75 0.8375 0.87291667 0.83958333 0.73958333
0.83541667 0.84166667 0.71041667 0.7375 ]
mean value: 0.7977083333333334
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
key: test_jcc
value: [0.68421053 0.55555556 0.73684211 0.76470588 0.72222222 0.61904762
0.6875 0.73684211 0.55 0.52941176]
mean value: 0.6586337780726326
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.43
Accuracy on Blind test: 0.72
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.89998579 1.01772881 0.89380383 0.91389751 0.90953159 0.89730668
0.92596483 0.93635964 0.91237187 1.07121158]
mean value: 0.9378162145614624
key: score_time
value: [0.24249458 0.22559667 0.13978267 0.23898888 0.2287128 0.2048564
0.23368526 0.22588706 0.22489595 0.18387413]
mean value: 0.21487743854522706
key: test_mcc
value: [0.438357 0.59215653 0.61608311 0.74166667 0.67916667 0.54812195
0.71269665 0.69203857 0.42083333 0.57461167]
mean value: 0.6015732146319642
key: train_mcc
value: [0.87859384 0.9 0.87902736 0.88611955 0.90768608 0.91467803
0.91467803 0.89325701 0.90749747 0.90035461]
mean value: 0.8981891969122482
key: test_accuracy
value: [0.71875 0.78125 0.80645161 0.87096774 0.83870968 0.77419355
0.83870968 0.83870968 0.70967742 0.77419355]
mean value: 0.7951612903225806
key: train_accuracy
value: [0.93928571 0.95 0.93950178 0.9430605 0.95373665 0.95729537
0.95729537 0.94661922 0.95373665 0.95017794]
mean value: 0.9490709201830199
key: test_fscore
value: [0.72727273 0.74074074 0.82352941 0.875 0.83870968 0.78787879
0.8 0.84848485 0.70967742 0.72 ]
mean value: 0.7871293612916004
key: train_fscore
value: [0.93950178 0.95 0.93950178 0.94285714 0.9540636 0.95683453
0.95774648 0.94699647 0.95373665 0.95035461]
mean value: 0.9491593048228071
key: test_precision
value: [0.70588235 0.90909091 0.77777778 0.875 0.86666667 0.76470588
1. 0.77777778 0.6875 0.9 ]
mean value: 0.8264401366607249
key: train_precision
value: [0.93617021 0.95 0.93617021 0.94285714 0.94405594 0.96376812
0.95104895 0.94366197 0.95714286 0.95035461]
mean value: 0.9475230018338903
key: test_recall
value: [0.75 0.625 0.875 0.875 0.8125 0.8125
0.66666667 0.93333333 0.73333333 0.6 ]
mean value: 0.7683333333333333
key: train_recall
value: [0.94285714 0.95 0.94285714 0.94285714 0.96428571 0.95
0.96453901 0.95035461 0.95035461 0.95035461]
mean value: 0.9508459979736575
key: test_roc_auc
value: [0.71875 0.78125 0.80416667 0.87083333 0.83958333 0.77291667
0.83333333 0.84166667 0.71041667 0.76875 ]
mean value: 0.7941666666666667
key: train_roc_auc
value: [0.93928571 0.95 0.93951368 0.94305978 0.95377406 0.9572695
0.9572695 0.94660588 0.95374873 0.9501773 ]
mean value: 0.9490704154002026
key: test_jcc
value: [0.57142857 0.58823529 0.7 0.77777778 0.72222222 0.65
0.66666667 0.73684211 0.55 0.5625 ]
mean value: 0.6525672637476043
key: train_jcc
value: [0.88590604 0.9047619 0.88590604 0.89189189 0.91216216 0.91724138
0.91891892 0.89932886 0.91156463 0.90540541]
mean value: 0.9033087227898283
MCC on Blind test: 0.48
Accuracy on Blind test: 0.74
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02231336 0.00941491 0.00936151 0.01038527 0.00926423 0.00929189
0.0093472 0.00929093 0.00926256 0.00975227]
mean value: 0.010768413543701172
key: score_time
value: [0.0153296 0.00867748 0.00866079 0.00852895 0.00856781 0.00855899
0.00863695 0.00885201 0.00861931 0.00857711]
mean value: 0.009300899505615235
key: test_mcc
value: [0.12909944 0.50395263 0.6125 0.4184137 0.51837044 0.55
0.4184137 0.44824996 0.35983579 0.6125 ]
mean value: 0.4571335673572818
key: train_mcc
value: [0.56603562 0.52531582 0.53306083 0.57396568 0.5248687 0.56166311
0.53352641 0.52595275 0.56002251 0.54704118]
mean value: 0.5451452600049477
key: test_accuracy
value: [0.5625 0.75 0.80645161 0.70967742 0.74193548 0.77419355
0.70967742 0.70967742 0.67741935 0.80645161]
mean value: 0.7247983870967742
key: train_accuracy
value: [0.78214286 0.76071429 0.76512456 0.78647687 0.76156584 0.77935943
0.76512456 0.76156584 0.77935943 0.77224199]
mean value: 0.7713675648195221
key: test_fscore
value: [0.61111111 0.73333333 0.8125 0.72727273 0.78947368 0.77419355
0.68965517 0.74285714 0.6875 0.8 ]
mean value: 0.7367896719585731
key: train_fscore
value: [0.79037801 0.77441077 0.7755102 0.79166667 0.76975945 0.78911565
0.77852349 0.77441077 0.78767123 0.78378378]
mean value: 0.7815230029466407
key: test_precision
value: [0.55 0.78571429 0.8125 0.70588235 0.68181818 0.8
0.71428571 0.65 0.64705882 0.8 ]
mean value: 0.714725935828877
key: train_precision
value: [0.7615894 0.73248408 0.74025974 0.77027027 0.74172185 0.75324675
0.7388535 0.73717949 0.7615894 0.7483871 ]
mean value: 0.7485581589599934
key: test_recall
value: [0.6875 0.6875 0.8125 0.75 0.9375 0.75
0.66666667 0.86666667 0.73333333 0.8 ]
mean value: 0.7691666666666667
key: train_recall
value: [0.82142857 0.82142857 0.81428571 0.81428571 0.8 0.82857143
0.82269504 0.81560284 0.81560284 0.82269504]
mean value: 0.8176595744680851
key: test_roc_auc
value: [0.5625 0.75 0.80625 0.70833333 0.73541667 0.775
0.70833333 0.71458333 0.67916667 0.80625 ]
mean value: 0.7245833333333334
key: train_roc_auc
value: [0.78214286 0.76071429 0.76529889 0.78657548 0.76170213 0.77953394
0.76491895 0.76137285 0.77922999 0.7720618 ]
mean value: 0.771355116514691
key: test_jcc
value: [0.44 0.57894737 0.68421053 0.57142857 0.65217391 0.63157895
0.52631579 0.59090909 0.52380952 0.66666667]
mean value: 0.5866040397436278
key: train_jcc
value: [0.65340909 0.63186813 0.63333333 0.65517241 0.62569832 0.65168539
0.63736264 0.63186813 0.64971751 0.64444444]
mean value: 0.641455941498394
MCC on Blind test: 0.35
Accuracy on Blind test: 0.68
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.09660125 0.07135081 0.09051609 0.06995201 0.07267976 0.07351184
0.08459163 0.07223678 0.07856154 0.14531898]
mean value: 0.08553206920623779
key: score_time
value: [0.01076746 0.01077509 0.01137328 0.01072907 0.01073861 0.01068258
0.01104283 0.01110196 0.01162791 0.01273251]
mean value: 0.01115713119506836
key: test_mcc
value: [0.438357 0.72374686 0.74166667 0.74689528 0.6125 0.80753845
0.74166667 0.48333333 0.42083333 0.74689528]
mean value: 0.6463432884113032
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.71875 0.84375 0.87096774 0.87096774 0.80645161 0.90322581
0.87096774 0.74193548 0.70967742 0.87096774]
mean value: 0.8207661290322581
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.70967742 0.81481481 0.875 0.88235294 0.8125 0.90909091
0.86666667 0.73333333 0.70967742 0.85714286]
mean value: 0.8170256360934729
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.73333333 1. 0.875 0.83333333 0.8125 0.88235294
0.86666667 0.73333333 0.6875 0.92307692]
mean value: 0.8347096530920061
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.6875 0.6875 0.875 0.9375 0.8125 0.9375
0.86666667 0.73333333 0.73333333 0.8 ]
mean value: 0.8070833333333334
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.71875 0.84375 0.87083333 0.86875 0.80625 0.90208333
0.87083333 0.74166667 0.71041667 0.86875 ]
mean value: 0.8202083333333333
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.55 0.6875 0.77777778 0.78947368 0.68421053 0.83333333
0.76470588 0.57894737 0.55 0.75 ]
mean value: 0.6965948572411421
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.63
Accuracy on Blind test: 0.81
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.03259182 0.06050849 0.03706932 0.06424642 0.06668234 0.056499
0.03153658 0.03218102 0.05929947 0.05167437]
mean value: 0.049228882789611815
key: score_time
value: [0.02350378 0.01192856 0.023247 0.02247643 0.02323031 0.01227117
0.01193428 0.01194763 0.02183199 0.01188064]
mean value: 0.017425179481506348
key: test_mcc
value: [0.46056619 0.19738551 0.35445878 0.5612264 0.61608311 0.49612132
0.4184137 0.61925228 0.48527095 0.35416667]
mean value: 0.4562944907398907
key: train_mcc
value: [0.82144953 0.89306221 0.85054967 0.85764944 0.85798288 0.86478545
0.8718845 0.83736545 0.87197933 0.8718845 ]
mean value: 0.8598592960239736
key: test_accuracy
value: [0.71875 0.59375 0.67741935 0.77419355 0.80645161 0.74193548
0.70967742 0.80645161 0.74193548 0.67741935]
mean value: 0.7247983870967741
key: train_accuracy
value: [0.91071429 0.94642857 0.9252669 0.92882562 0.92882562 0.93238434
0.93594306 0.91814947 0.93594306 0.93594306]
mean value: 0.9298423995932893
key: test_fscore
value: [0.75675676 0.51851852 0.70588235 0.75862069 0.82352941 0.77777778
0.68965517 0.8125 0.71428571 0.66666667]
mean value: 0.7224193060780282
key: train_fscore
value: [0.91103203 0.94584838 0.92473118 0.92857143 0.92753623 0.93189964
0.93617021 0.91636364 0.93571429 0.93617021]
mean value: 0.9294037236359098
key: test_precision
value: [0.66666667 0.63636364 0.66666667 0.84615385 0.77777778 0.7
0.71428571 0.76470588 0.76923077 0.66666667]
mean value: 0.7208517626164684
key: train_precision
value: [0.90780142 0.95620438 0.92805755 0.92857143 0.94117647 0.9352518
0.93617021 0.94029851 0.94244604 0.93617021]
mean value: 0.9352148025839478
key: test_recall
value: [0.875 0.4375 0.75 0.6875 0.875 0.875
0.66666667 0.86666667 0.66666667 0.66666667]
mean value: 0.7366666666666667
key: train_recall
value: [0.91428571 0.93571429 0.92142857 0.92857143 0.91428571 0.92857143
0.93617021 0.89361702 0.92907801 0.93617021]
mean value: 0.9237892603850051
key: test_roc_auc
value: [0.71875 0.59375 0.675 0.77708333 0.80416667 0.7375
0.70833333 0.80833333 0.73958333 0.67708333]
mean value: 0.7239583333333334
key: train_roc_auc
value: [0.91071429 0.94642857 0.92525329 0.92882472 0.92877406 0.93237082
0.93594225 0.91823708 0.93596758 0.93594225]
mean value: 0.9298454913880446
key: test_jcc
value: [0.60869565 0.35 0.54545455 0.61111111 0.7 0.63636364
0.52631579 0.68421053 0.55555556 0.5 ]
mean value: 0.5717706816448235
key: train_jcc
value: [0.83660131 0.89726027 0.86 0.86666667 0.86486486 0.87248322
0.88 0.84563758 0.87919463 0.88 ]
mean value: 0.8682708548935287
MCC on Blind test: 0.36
Accuracy on Blind test: 0.68
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01248503 0.0127666 0.00925589 0.00910306 0.00888062 0.00891542
0.00893378 0.00901175 0.00891209 0.00891852]
mean value: 0.00971827507019043
key: score_time
value: [0.01142454 0.0092175 0.00878191 0.00855398 0.00835586 0.00841069
0.00845838 0.00846076 0.0083797 0.00847077]
mean value: 0.008851408958435059
key: test_mcc
value: [0.38729833 0.12598816 0.35445878 0.35416667 0.57461167 0.28870546
0.6310315 0.63696156 0.61925228 0.4184137 ]
mean value: 0.43908881135182387
key: train_mcc
value: [0.46524806 0.47316995 0.46906706 0.46229203 0.44610424 0.50268922
0.46801866 0.46667052 0.44582343 0.48864808]
mean value: 0.46877312600272963
key: test_accuracy
value: [0.6875 0.5625 0.67741935 0.67741935 0.77419355 0.64516129
0.80645161 0.80645161 0.80645161 0.70967742]
mean value: 0.7153225806451613
key: train_accuracy
value: [0.73214286 0.73571429 0.73309609 0.72953737 0.72241993 0.75088968
0.73309609 0.73309609 0.72241993 0.74377224]
mean value: 0.7336184544992375
key: test_fscore
value: [0.72222222 0.53333333 0.70588235 0.6875 0.81081081 0.66666667
0.76923077 0.82352941 0.8125 0.68965517]
mean value: 0.7221330739383478
key: train_fscore
value: [0.74048443 0.74657534 0.74576271 0.74324324 0.73103448 0.75694444
0.74576271 0.74048443 0.73287671 0.75342466]
mean value: 0.7436593164635377
key: test_precision
value: [0.65 0.57142857 0.66666667 0.6875 0.71428571 0.64705882
0.90909091 0.73684211 0.76470588 0.71428571]
mean value: 0.7061864386903086
key: train_precision
value: [0.71812081 0.71710526 0.70967742 0.70512821 0.70666667 0.73648649
0.71428571 0.72297297 0.70860927 0.72847682]
mean value: 0.7167529626137138
key: test_recall
value: [0.8125 0.5 0.75 0.6875 0.9375 0.6875
0.66666667 0.93333333 0.86666667 0.66666667]
mean value: 0.7508333333333334
key: train_recall
value: [0.76428571 0.77857143 0.78571429 0.78571429 0.75714286 0.77857143
0.78014184 0.75886525 0.75886525 0.78014184]
mean value: 0.7728014184397163
key: test_roc_auc
value: [0.6875 0.5625 0.675 0.67708333 0.76875 0.64375
0.80208333 0.81041667 0.80833333 0.70833333]
mean value: 0.714375
key: train_roc_auc
value: [0.73214286 0.73571429 0.73328267 0.72973658 0.72254306 0.75098784
0.73292806 0.73300405 0.72228977 0.74364235]
mean value: 0.7336271529888551
key: test_jcc
value: [0.56521739 0.36363636 0.54545455 0.52380952 0.68181818 0.5
0.625 0.7 0.68421053 0.52631579]
mean value: 0.5715462321812436
key: train_jcc
value: [0.58791209 0.59562842 0.59459459 0.59139785 0.57608696 0.60893855
0.59459459 0.58791209 0.57837838 0.6043956 ]
mean value: 0.5919839116558032
MCC on Blind test: 0.31
Accuracy on Blind test: 0.66
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01262021 0.01605392 0.01505256 0.01697302 0.01637769 0.01547146
0.01523614 0.0172348 0.01650333 0.01944351]
mean value: 0.016096663475036622
key: score_time
value: [0.00845838 0.01096606 0.01121974 0.0114274 0.01142192 0.0114193
0.01137662 0.01137543 0.01142097 0.01138806]
mean value: 0.01104738712310791
key: test_mcc
value: [0.31814238 0.37796447 0.6125 0.55777335 0.372678 0.4770843
0.48527095 0.61608311 0.57461167 0.25389818]
mean value: 0.4646006411623224
key: train_mcc
value: [0.73914049 0.78579447 0.7083207 0.6231496 0.39685714 0.60003451
0.66383564 0.71092171 0.75796241 0.70017814]
mean value: 0.6686194811960497
key: test_accuracy
value: [0.65625 0.6875 0.80645161 0.74193548 0.61290323 0.70967742
0.74193548 0.80645161 0.77419355 0.61290323]
mean value: 0.7150201612903225
key: train_accuracy
value: [0.85714286 0.89285714 0.85409253 0.77935943 0.63701068 0.76512456
0.82562278 0.83985765 0.87544484 0.82918149]
mean value: 0.8155693950177936
key: test_fscore
value: [0.62068966 0.66666667 0.8125 0.8 0.4 0.64
0.71428571 0.78571429 0.72 0.66666667]
mean value: 0.6826522988505747
key: train_fscore
value: [0.83606557 0.89361702 0.85198556 0.81871345 0.42696629 0.69158879
0.80784314 0.81327801 0.86692015 0.85454545]
mean value: 0.7861523434278199
key: test_precision
value: [0.69230769 0.71428571 0.8125 0.66666667 1. 0.88888889
0.76923077 0.84615385 0.9 0.57142857]
mean value: 0.7861462148962148
key: train_precision
value: [0.98076923 0.88732394 0.86131387 0.69306931 1. 1.
0.90350877 0.98 0.93442623 0.74603175]
mean value: 0.8986443097444802
key: test_recall
value: [0.5625 0.625 0.8125 1. 0.25 0.5
0.66666667 0.73333333 0.6 0.8 ]
mean value: 0.655
key: train_recall
value: [0.72857143 0.9 0.84285714 1. 0.27142857 0.52857143
0.73049645 0.69503546 0.80851064 1. ]
mean value: 0.7505471124620061
key: test_roc_auc
value: [0.65625 0.6875 0.80625 0.73333333 0.625 0.71666667
0.73958333 0.80416667 0.76875 0.61875 ]
mean value: 0.715625
key: train_roc_auc
value: [0.85714286 0.89285714 0.85405268 0.78014184 0.63571429 0.76428571
0.82596251 0.84037487 0.87568389 0.82857143]
mean value: 0.8154787234042553
key: test_jcc
value: [0.45 0.5 0.68421053 0.66666667 0.25 0.47058824
0.55555556 0.64705882 0.5625 0.5 ]
mean value: 0.5286579807361541
key: train_jcc
value: [0.71830986 0.80769231 0.74213836 0.69306931 0.27142857 0.52857143
0.67763158 0.68531469 0.76510067 0.74603175]
mean value: 0.6635288519992544
MCC on Blind test: 0.39
Accuracy on Blind test: 0.7
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01545095 0.01672411 0.01582861 0.01625156 0.01635623 0.0165143
0.01512074 0.01825953 0.01661515 0.01824665]
mean value: 0.016536784172058106
key: score_time
value: [0.01169014 0.01137948 0.01162958 0.01137543 0.01138616 0.01170897
0.01159811 0.0119915 0.01221395 0.01147962]
mean value: 0.011645293235778809
key: test_mcc
value: [0.37796447 0.32897585 0.37191715 0.66057826 0.58316015 0.49612132
0.48333333 0.6681531 0.42083333 0.42321607]
mean value: 0.4814253040803977
key: train_mcc
value: [0.63245553 0.60648725 0.72898583 0.69608742 0.70724431 0.71414649
0.72646619 0.60723774 0.69071737 0.78942195]
mean value: 0.6899250078740438
key: test_accuracy
value: [0.65625 0.65625 0.67741935 0.80645161 0.77419355 0.74193548
0.74193548 0.80645161 0.70967742 0.70967742]
mean value: 0.7280241935483871
key: train_accuracy
value: [0.78571429 0.775 0.86120996 0.83629893 0.83985765 0.84697509
0.86120996 0.77580071 0.82562278 0.89323843]
mean value: 0.8300927808845958
key: test_fscore
value: [0.52173913 0.7027027 0.64285714 0.84210526 0.74074074 0.77777778
0.73333333 0.83333333 0.70967742 0.66666667]
mean value: 0.7170933510359213
key: train_fscore
value: [0.72727273 0.81415929 0.85057471 0.85443038 0.81327801 0.86261981
0.86868687 0.81524927 0.85106383 0.88888889]
mean value: 0.8346223782529265
key: test_precision
value: [0.85714286 0.61904762 0.75 0.72727273 0.90909091 0.7
0.73333333 0.71428571 0.6875 0.75 ]
mean value: 0.744767316017316
key: train_precision
value: [1. 0.69346734 0.91735537 0.76704545 0.97029703 0.78034682
0.82692308 0.695 0.74468085 0.93023256]
mean value: 0.8325348499768358
key: test_recall
value: [0.375 0.8125 0.5625 1. 0.625 0.875
0.73333333 1. 0.73333333 0.6 ]
mean value: 0.7316666666666667
key: train_recall
value: [0.57142857 0.98571429 0.79285714 0.96428571 0.7 0.96428571
0.91489362 0.9858156 0.9929078 0.85106383]
mean value: 0.8723252279635259
key: test_roc_auc
value: [0.65625 0.65625 0.68125 0.8 0.77916667 0.7375
0.74166667 0.8125 0.71041667 0.70625 ]
mean value: 0.728125
key: train_roc_auc
value: [0.78571429 0.775 0.86096758 0.83675279 0.8393617 0.84739108
0.86101824 0.77505066 0.82502533 0.89338906]
mean value: 0.8299670719351571
key: test_jcc
value: [0.35294118 0.54166667 0.47368421 0.72727273 0.58823529 0.63636364
0.57894737 0.71428571 0.55 0.5 ]
mean value: 0.5663396794124348
key: train_jcc
value: [0.57142857 0.68656716 0.74 0.74585635 0.68531469 0.75842697
0.76785714 0.68811881 0.74074074 0.8 ]
mean value: 0.7184310436284728
MCC on Blind test: 0.32
Accuracy on Blind test: 0.66
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.14749765 0.12956071 0.130409 0.13197517 0.13010478 0.13024807
0.12997556 0.1305058 0.13536215 0.13055086]
mean value: 0.13261897563934327
key: score_time
value: [0.01478624 0.01482654 0.01473641 0.01463675 0.0148015 0.01491022
0.01483297 0.01472735 0.01480532 0.01476169]
mean value: 0.01478250026702881
key: test_mcc
value: [0.56360186 0.48653363 0.74689528 0.6778302 0.63696156 0.6778302
0.48333333 0.35445878 0.29166667 0.48527095]
mean value: 0.5404382465644691
key: train_mcc
value: [1. 0.99288247 1. 0.98586555 0.96443769 0.98586412
0.98586412 0.99290744 1. 1. ]
mean value: 0.9907821400761553
key: test_accuracy
value: [0.78125 0.71875 0.87096774 0.83870968 0.80645161 0.83870968
0.74193548 0.67741935 0.64516129 0.74193548]
mean value: 0.7661290322580645
key: train_accuracy
value: [1. 0.99642857 1. 0.99288256 0.98220641 0.99288256
0.99288256 0.99644128 1. 1. ]
mean value: 0.9953723945094052
key: test_fscore
value: [0.77419355 0.64 0.88235294 0.84848485 0.78571429 0.84848485
0.73333333 0.64285714 0.64516129 0.71428571]
mean value: 0.7514867953046321
key: train_fscore
value: [1. 0.99644128 1. 0.9929078 0.98220641 0.99280576
0.99295775 0.99646643 1. 1. ]
mean value: 0.9953785421221143
key: test_precision
value: [0.8 0.88888889 0.83333333 0.82352941 0.91666667 0.82352941
0.73333333 0.69230769 0.625 0.76923077]
mean value: 0.7905819507290095
key: train_precision
value: [1. 0.9929078 1. 0.98591549 0.9787234 1.
0.98601399 0.99295775 1. 1. ]
mean value: 0.9936518431124365
key: test_recall
value: [0.75 0.5 0.9375 0.875 0.6875 0.875
0.73333333 0.6 0.66666667 0.66666667]
mean value: 0.7291666666666666
key: train_recall
value: [1. 1. 1. 1. 0.98571429 0.98571429
1. 1. 1. 1. ]
mean value: 0.9971428571428571
key: test_roc_auc
value: [0.78125 0.71875 0.86875 0.8375 0.81041667 0.8375
0.74166667 0.675 0.64583333 0.73958333]
mean value: 0.765625
key: train_roc_auc
value: [1. 0.99642857 1. 0.9929078 0.98221884 0.99285714
0.99285714 0.99642857 1. 1. ]
mean value: 0.9953698074974671
key: test_jcc
value: [0.63157895 0.47058824 0.78947368 0.73684211 0.64705882 0.73684211
0.57894737 0.47368421 0.47619048 0.55555556]
mean value: 0.6096761511622193
key: train_jcc
value: [1. 0.9929078 1. 0.98591549 0.96503497 0.98571429
0.98601399 0.99295775 1. 1. ]
mean value: 0.9908544277618296
MCC on Blind test: 0.41
Accuracy on Blind test: 0.71
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.05952263 0.08039355 0.07478786 0.0584228 0.06083059 0.06444645
0.05043173 0.07103467 0.06729746 0.07202792]
mean value: 0.06591956615447998
key: score_time
value: [0.01850533 0.03880453 0.02360177 0.02254486 0.02347612 0.02948332
0.01844764 0.03052831 0.02998972 0.02165484]
mean value: 0.02570364475250244
key: test_mcc
value: [0.438357 0.75592895 0.43041423 0.80833333 0.6778302 0.61608311
0.61608311 0.48333333 0.54812195 0.66057826]
mean value: 0.6035063490310029
key: train_mcc
value: [0.95802308 0.97879618 0.9439125 0.9716269 0.95039023 0.93652752
0.97152989 0.95078573 0.97192667 0.95768728]
mean value: 0.9591205983699052
key: test_accuracy
value: [0.71875 0.875 0.70967742 0.90322581 0.83870968 0.80645161
0.80645161 0.74193548 0.77419355 0.80645161]
mean value: 0.7980846774193548
key: train_accuracy
value: [0.97857143 0.98928571 0.97153025 0.98576512 0.97508897 0.96797153
0.98576512 0.97508897 0.98576512 0.97864769]
mean value: 0.9793479918657855
key: test_fscore
value: [0.70967742 0.86666667 0.68965517 0.90322581 0.84848485 0.82352941
0.78571429 0.73333333 0.75862069 0.75 ]
mean value: 0.7868907633839257
key: train_fscore
value: [0.97810219 0.98916968 0.97080292 0.98561151 0.97472924 0.96727273
0.9858156 0.97472924 0.98561151 0.97841727]
mean value: 0.9790261886213206
key: test_precision
value: [0.73333333 0.92857143 0.76923077 0.93333333 0.82352941 0.77777778
0.84615385 0.73333333 0.78571429 1. ]
mean value: 0.8330977519212813
key: train_precision
value: [1. 1. 0.99253731 0.99275362 0.98540146 0.98518519
0.9858156 0.99264706 1. 0.99270073]
mean value: 0.9927040973247857
key: test_recall
value: [0.6875 0.8125 0.625 0.875 0.875 0.875
0.73333333 0.73333333 0.73333333 0.6 ]
mean value: 0.755
key: train_recall
value: [0.95714286 0.97857143 0.95 0.97857143 0.96428571 0.95
0.9858156 0.95744681 0.97163121 0.96453901]
mean value: 0.9658004052684903
key: test_roc_auc
value: [0.71875 0.875 0.7125 0.90416667 0.8375 0.80416667
0.80416667 0.74166667 0.77291667 0.8 ]
mean value: 0.7970833333333334
key: train_roc_auc
value: [0.97857143 0.98928571 0.9714539 0.98573961 0.97505066 0.9679078
0.98576494 0.97515198 0.9858156 0.97869807]
mean value: 0.9793439716312057
key: test_jcc
value: [0.55 0.76470588 0.52631579 0.82352941 0.73684211 0.7
0.64705882 0.57894737 0.61111111 0.6 ]
mean value: 0.6538510491916064
key: train_jcc
value: [0.95714286 0.97857143 0.94326241 0.97163121 0.95070423 0.93661972
0.97202797 0.95070423 0.97163121 0.95774648]
mean value: 0.9590041728324616
MCC on Blind test: 0.56
Accuracy on Blind test: 0.78
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.06940794 0.07174492 0.07724261 0.07404685 0.07390547 0.07745314
0.07874274 0.04722691 0.05125523 0.1250515 ]
mean value: 0.0746077299118042
key: score_time
value: [0.0226953 0.02435803 0.02252817 0.02285171 0.03218961 0.02162409
0.03372359 0.01344872 0.01349521 0.02339959]
mean value: 0.023031401634216308
key: test_mcc
value: [0.19088543 0.44539933 0.55 0.61925228 0.42083333 0.35445878
0.35983579 0.69203857 0.61925228 0.54812195]
mean value: 0.4800077745753996
key: train_mcc
value: [0.98571429 0.98571429 0.98586412 0.98576494 0.98576494 0.98576494
0.98576494 0.98576494 0.98576494 0.98576494]
mean value: 0.9857647305833265
key: test_accuracy
value: [0.59375 0.71875 0.77419355 0.80645161 0.70967742 0.67741935
0.67741935 0.83870968 0.80645161 0.77419355]
mean value: 0.7377016129032258
key: train_accuracy
value: [0.99285714 0.99285714 0.99288256 0.99288256 0.99288256 0.99288256
0.99288256 0.99288256 0.99288256 0.99288256]
mean value: 0.9928774783934926
key: test_fscore
value: [0.62857143 0.68965517 0.77419355 0.8 0.70967742 0.70588235
0.6875 0.84848485 0.8125 0.75862069]
mean value: 0.7415085459808355
key: train_fscore
value: [0.99285714 0.99285714 0.99280576 0.99285714 0.99285714 0.99285714
0.9929078 0.9929078 0.9929078 0.9929078 ]
mean value: 0.9928722675355157
key: test_precision
value: [0.57894737 0.76923077 0.8 0.85714286 0.73333333 0.66666667
0.64705882 0.77777778 0.76470588 0.78571429]
mean value: 0.7380577764169095
key: train_precision
value: [0.99285714 0.99285714 1. 0.99285714 0.99285714 0.99285714
0.9929078 0.9929078 0.9929078 0.9929078 ]
mean value: 0.9935916919959473
key: test_recall
value: [0.6875 0.625 0.75 0.75 0.6875 0.75
0.73333333 0.93333333 0.86666667 0.73333333]
mean value: 0.7516666666666667
key: train_recall
value: [0.99285714 0.99285714 0.98571429 0.99285714 0.99285714 0.99285714
0.9929078 0.9929078 0.9929078 0.9929078 ]
mean value: 0.9921631205673759
key: test_roc_auc
value: [0.59375 0.71875 0.775 0.80833333 0.71041667 0.675
0.67916667 0.84166667 0.80833333 0.77291667]
mean value: 0.7383333333333334
key: train_roc_auc
value: [0.99285714 0.99285714 0.99285714 0.99288247 0.99288247 0.99288247
0.99288247 0.99288247 0.99288247 0.99288247]
mean value: 0.9928748733535968
key: test_jcc
value: [0.45833333 0.52631579 0.63157895 0.66666667 0.55 0.54545455
0.52380952 0.73684211 0.68421053 0.61111111]
mean value: 0.5934322548796233
key: train_jcc
value: [0.9858156 0.9858156 0.98571429 0.9858156 0.9858156 0.9858156
0.98591549 0.98591549 0.98591549 0.98591549]
mean value: 0.9858454271729669
MCC on Blind test: 0.27
Accuracy on Blind test: 0.64
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.50104094 0.4917357 0.48631215 0.49081039 0.50207019 0.4933908
0.49305654 0.48927522 0.47748899 0.47835588]
mean value: 0.4903536796569824
key: score_time
value: [0.0100379 0.0117836 0.00951767 0.01037145 0.01037979 0.00925756
0.00933361 0.0092392 0.00934649 0.00907397]
mean value: 0.00983412265777588
key: test_mcc
value: [0.625 0.82717019 0.55 0.80753845 0.61925228 0.6778302
0.61608311 0.48333333 0.42083333 0.61608311]
mean value: 0.6243124021507616
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.8125 0.90625 0.77419355 0.90322581 0.80645161 0.83870968
0.80645161 0.74193548 0.70967742 0.80645161]
mean value: 0.8105846774193548
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.8125 0.89655172 0.77419355 0.90909091 0.8 0.84848485
0.78571429 0.73333333 0.70967742 0.78571429]
mean value: 0.8055260354217528
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.8125 1. 0.8 0.88235294 0.85714286 0.82352941
0.84615385 0.73333333 0.6875 0.84615385]
mean value: 0.828866623572506
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.8125 0.8125 0.75 0.9375 0.75 0.875
0.73333333 0.73333333 0.73333333 0.73333333]
mean value: 0.7870833333333334
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.8125 0.90625 0.775 0.90208333 0.80833333 0.8375
0.80416667 0.74166667 0.71041667 0.80416667]
mean value: 0.8102083333333333
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.68421053 0.8125 0.63157895 0.83333333 0.66666667 0.73684211
0.64705882 0.57894737 0.55 0.64705882]
mean value: 0.6788196594427245
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.58
Accuracy on Blind test: 0.79
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.02253461 0.02282524 0.02330494 0.03614402 0.02317977 0.02384901
0.02774358 0.02345324 0.02367496 0.02446938]
mean value: 0.025117874145507812
key: score_time
value: [0.01177716 0.01243234 0.01277757 0.01222444 0.01561308 0.01716805
0.01649523 0.01491165 0.01481533 0.01486301]
mean value: 0.014307785034179687
key: test_mcc
value: [0.44539933 0.37796447 0.48527095 0.48333333 0.48333333 0.50596443
0.55 0.71807033 0.23012754 0.25389818]
mean value: 0.453336189415535
key: train_mcc
value: [1. 0.95060645 1. 0.98586555 0.94460323 1.
0.97192106 0.96501929 0.89833485 0.95816272]
mean value: 0.9674513135125098
key: test_accuracy
value: [0.71875 0.6875 0.74193548 0.74193548 0.74193548 0.70967742
0.77419355 0.83870968 0.61290323 0.61290323]
mean value: 0.7180443548387097
key: train_accuracy
value: [1. 0.975 1. 0.99288256 0.97153025 1.
0.98576512 0.98220641 0.94661922 0.97864769]
mean value: 0.9832651245551601
key: test_fscore
value: [0.74285714 0.70588235 0.76470588 0.75 0.75 0.7804878
0.77419355 0.85714286 0.625 0.66666667]
mean value: 0.741693625522593
key: train_fscore
value: [1. 0.9754386 1. 0.9929078 0.97222222 1.
0.98601399 0.9825784 0.94949495 0.97916667]
mean value: 0.9837822619520036
key: test_precision
value: [0.68421053 0.66666667 0.72222222 0.75 0.75 0.64
0.75 0.75 0.58823529 0.57142857]
mean value: 0.6872763280750896
key: train_precision
value: [1. 0.95862069 1. 0.98591549 0.94594595 1.
0.97241379 0.96575342 0.90384615 0.95918367]
mean value: 0.9691679173635389
key: test_recall
value: [0.8125 0.75 0.8125 0.75 0.75 1.
0.8 1. 0.66666667 0.8 ]
mean value: 0.8141666666666667
key: train_recall
value: [1. 0.99285714 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9992857142857143
key: test_roc_auc
value: [0.71875 0.6875 0.73958333 0.74166667 0.74166667 0.7
0.775 0.84375 0.61458333 0.61875 ]
mean value: 0.718125
key: train_roc_auc
value: [1. 0.975 1. 0.9929078 0.97163121 1.
0.98571429 0.98214286 0.94642857 0.97857143]
mean value: 0.9832396149949342
key: test_jcc
value: [0.59090909 0.54545455 0.61904762 0.6 0.6 0.64
0.63157895 0.75 0.45454545 0.5 ]
mean value: 0.5931535657325131
key: train_jcc
value: [1. 0.95205479 1. 0.98591549 0.94594595 1.
0.97241379 0.96575342 0.90384615 0.95918367]
mean value: 0.9685113278500764
MCC on Blind test: 0.14
Accuracy on Blind test: 0.59
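Note on how output of this shape can be produced: the key/value/mean blocks match the dictionary returned by scikit-learn's cross_validate when it is given a dict of scorers and return_train_score=True. The sketch below is a guess at that setup, not the author's script. The toy data, the three column names picked from the logged Index lists, the choice of RidgeClassifier, the fold settings, and the scorer definitions are all assumptions. roc_auc is computed here from hard predictions because the logged test_roc_auc values coincide with test_accuracy on balanced folds, which suggests (but does not prove) that approach.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy data standing in for the real feature table.
rng = np.random.default_rng(42)
n_rows = 120
X = pd.DataFrame({
    "ligand_distance": rng.normal(size=n_rows),
    "rsa": rng.uniform(size=n_rows),
    "ss_class": rng.choice(["helix", "strand", "loop"], size=n_rows),
})
y = np.tile([0, 1], n_rows // 2)  # balanced binary labels

# Same preprocessing layout as the logged Pipeline repr.
prep = ColumnTransformer(
    remainder="passthrough",
    transformers=[
        ("num", MinMaxScaler(), ["ligand_distance", "rsa"]),
        ("cat", OneHotEncoder(), ["ss_class"]),
    ],
)
pipe = Pipeline(steps=[("prep", prep), ("model", RidgeClassifier(random_state=42))])

# Scorer names chosen so the result keys read test_mcc, train_mcc, and so on.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": make_scorer(roc_auc_score),  # assumption: scored on hard labels
    "jcc": "jaccard",
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring, return_train_score=True)

for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```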
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02304816 0.05340958 0.04013824 0.02969098 0.01440239 0.01423526
0.01434231 0.02990079 0.03520942 0.03498721]
mean value: 0.028936433792114257
key: score_time
value: [0.01924634 0.01247501 0.034832 0.01187229 0.01180601 0.01168656
0.01202822 0.02046657 0.02182174 0.02312374]
mean value: 0.017935848236083983
key: test_mcc
value: [0.12909944 0.31814238 0.61608311 0.6778302 0.61608311 0.54812195
0.61608311 0.67916667 0.4184137 0.48954403]
mean value: 0.5108567730282921
key: train_mcc
value: [0.79287737 0.82890983 0.80799639 0.82226276 0.78654305 0.82991071
0.77955111 0.8442081 0.80105406 0.82208713]
mean value: 0.8115400507144125
key: test_accuracy
value: [0.5625 0.65625 0.80645161 0.83870968 0.80645161 0.77419355
0.80645161 0.83870968 0.70967742 0.74193548]
mean value: 0.754133064516129
key: train_accuracy
value: [0.89642857 0.91428571 0.90391459 0.91103203 0.89323843 0.91459075
0.88967972 0.92170819 0.90035587 0.91103203]
mean value: 0.9056265887137773
key: test_fscore
value: [0.61111111 0.62068966 0.82352941 0.84848485 0.82352941 0.78787879
0.78571429 0.83870968 0.68965517 0.75 ]
mean value: 0.7579302361724007
key: train_fscore
value: [0.89605735 0.91304348 0.90252708 0.91166078 0.89208633 0.91240876
0.88888889 0.92028986 0.89928058 0.91103203]
mean value: 0.9047275117158565
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_cd_7030.py:156: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_cd_7030.py:159: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: test_precision
value: [0.55 0.69230769 0.77777778 0.82352941 0.77777778 0.76470588
0.84615385 0.8125 0.71428571 0.70588235]
mean value: 0.7464920455361632
key: train_precision
value: [0.89928058 0.92647059 0.91240876 0.9020979 0.89855072 0.93283582
0.89855072 0.94074074 0.91240876 0.91428571]
mean value: 0.913763030931828
key: test_recall
value: [0.6875 0.5625 0.875 0.875 0.875 0.8125
0.73333333 0.86666667 0.66666667 0.8 ]
mean value: 0.7754166666666666
key: train_recall
value: [0.89285714 0.9 0.89285714 0.92142857 0.88571429 0.89285714
0.87943262 0.90070922 0.88652482 0.90780142]
mean value: 0.8960182370820668
key: test_roc_auc
value: [0.5625 0.65625 0.80416667 0.8375 0.80416667 0.77291667
0.80416667 0.83958333 0.70833333 0.74375 ]
mean value: 0.7533333333333333
key: train_roc_auc
value: [0.89642857 0.91428571 0.90387538 0.9110689 0.89321175 0.91451368
0.88971631 0.92178318 0.90040527 0.91104357]
mean value: 0.9056332320162107
key: test_jcc
value: [0.44 0.45 0.7 0.73684211 0.7 0.65
0.64705882 0.72222222 0.52631579 0.6 ]
mean value: 0.6172438940488476
key: train_jcc
value: [0.81168831 0.84 0.82236842 0.83766234 0.80519481 0.83892617
0.8 0.85234899 0.81699346 0.83660131]
mean value: 0.8261783814625151
MCC on Blind test: 0.36
Accuracy on Blind test: 0.69
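Note on the SettingWithCopyWarning raised in this block at pnca_cd_7030.py lines 156 and 159: pandas emits it when sort_values(..., inplace=True) is called on a frame that was itself derived by slicing another frame. A common way to avoid it is to take an explicit copy of the slice and reassign the sorted result. The sketch below assumes ros_CT and ros_BT are column subsets of one results table. The table, its column layout, and its values are hypothetical, since the script itself is not part of this log.

```python
import pandas as pd

# Hypothetical results table; the real frames are built elsewhere in the script.
results = pd.DataFrame({
    "model": ["QDA", "Ridge Classifier", "Ridge ClassifierCV"],
    "test_mcc": [0.45, 0.51, 0.50],
    "bts_mcc": [0.14, 0.36, 0.39],
})

# Take an explicit copy of the slice, then sort without inplace=True.
ros_CT = results[["model", "test_mcc"]].copy()
ros_CT = ros_CT.sort_values(by=["test_mcc"], ascending=False)

ros_BT = results[["model", "bts_mcc"]].copy()
ros_BT = ros_BT.sort_values(by=["bts_mcc"], ascending=False)

print(ros_CT)
print(ros_BT)
```

Reassigning instead of sorting in place also leaves the original table untouched, which makes the two sorted views independent of each other.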
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.27001762 0.23835921 0.23745036 0.23633218 0.24392223 0.31006145
0.33711934 0.26849365 0.2593081 0.27248955]
mean value: 0.2673553705215454
key: score_time
value: [0.02025056 0.02202392 0.02044964 0.02047157 0.01809716 0.01300192
0.03175521 0.0256021 0.02553105 0.02415633]
mean value: 0.022133946418762207
key: test_mcc
value: [0.12909944 0.37796447 0.61608311 0.74689528 0.48527095 0.54812195
0.6310315 0.63696156 0.42083333 0.42083333]
mean value: 0.5013094945304436
key: train_mcc
value: [0.79287737 0.67942124 0.80799639 0.65949742 0.68682877 0.82991071
0.65890803 0.65840807 0.68688251 0.72283925]
mean value: 0.7183569755574486
key: test_accuracy
value: [0.5625 0.6875 0.80645161 0.87096774 0.74193548 0.77419355
0.80645161 0.80645161 0.70967742 0.70967742]
mean value: 0.7475806451612903
key: train_accuracy
value: [0.89642857 0.83928571 0.90391459 0.82918149 0.84341637 0.91459075
0.82918149 0.82918149 0.84341637 0.86120996]
mean value: 0.8589806812404677
key: test_fscore
value: [0.61111111 0.66666667 0.82352941 0.88235294 0.76470588 0.78787879
0.76923077 0.82352941 0.70967742 0.70967742]
mean value: 0.7548359820655836
key: train_fscore
value: [0.89605735 0.84320557 0.90252708 0.83333333 0.84285714 0.91240876
0.83333333 0.83098592 0.84507042 0.8641115 ]
mean value: 0.8603890403329323
key: test_precision
value: [0.55 0.71428571 0.77777778 0.83333333 0.72222222 0.76470588
0.90909091 0.73684211 0.6875 0.6875 ]
mean value: 0.7383257944326056
key: train_precision
value: [0.89928058 0.82312925 0.91240876 0.81081081 0.84285714 0.93283582
0.81632653 0.82517483 0.83916084 0.84931507]
mean value: 0.8551299624368872
key: test_recall
value: [0.6875 0.625 0.875 0.9375 0.8125 0.8125
0.66666667 0.93333333 0.73333333 0.73333333]
mean value: 0.7816666666666666
key: train_recall
value: [0.89285714 0.86428571 0.89285714 0.85714286 0.84285714 0.89285714
0.85106383 0.83687943 0.85106383 0.87943262]
mean value: 0.86612968591692
key: test_roc_auc
value: [0.5625 0.6875 0.80416667 0.86875 0.73958333 0.77291667
0.80208333 0.81041667 0.71041667 0.71041667]
mean value: 0.746875
key: train_roc_auc
value: [0.89642857 0.83928571 0.90387538 0.82928065 0.84341439 0.91451368
0.82910334 0.829154 0.84338906 0.86114488]
mean value: 0.8589589665653495
key: test_jcc
value: [0.44 0.5 0.7 0.78947368 0.61904762 0.65
0.625 0.7 0.55 0.55 ]
mean value: 0.6123521303258146
key: train_jcc
value: [0.81168831 0.72891566 0.82236842 0.71428571 0.72839506 0.83892617
0.71428571 0.71084337 0.73170732 0.7607362 ]
mean value: 0.7562151947074178
MCC on Blind test: 0.39
Accuracy on Blind test: 0.7
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.03315091 0.03327417 0.03551388 0.02638865 0.03421259 0.02820277
0.05422091 0.03478932 0.04252434 0.03702164]
mean value: 0.03592991828918457
key: score_time
value: [0.01228666 0.01443219 0.01238513 0.01229572 0.01432252 0.01221967
0.01476145 0.01246548 0.01230025 0.01673579]
mean value: 0.013420486450195312
key: test_mcc
value: [0.5 0.3086067 0.46291005 0.46291005 0.84615385 0.54494926
0.20645591 0.28022427 0.12179487 0.44230769]
mean value: 0.4176312649662118
key: train_mcc
value: [0.72173913 0.67849178 0.68695652 0.67828651 0.69567848 0.66956522
0.66288973 0.74895052 0.72293853 0.71429643]
mean value: 0.6979792842720367
key: test_accuracy
value: [0.73076923 0.65384615 0.73076923 0.73076923 0.92307692 0.76923077
0.6 0.64 0.56 0.72 ]
mean value: 0.7058461538461538
key: train_accuracy
value: [0.86086957 0.83913043 0.84347826 0.83913043 0.84782609 0.83478261
0.83116883 0.87445887 0.86147186 0.85714286]
mean value: 0.8489459815546773
key: test_fscore
value: [0.77419355 0.64 0.72 0.74074074 0.92307692 0.78571429
0.61538462 0.57142857 0.56 0.72 ]
mean value: 0.7050538684732233
key: train_fscore
value: [0.86086957 0.83700441 0.84347826 0.83982684 0.84848485 0.83478261
0.83544304 0.87445887 0.86086957 0.8558952 ]
mean value: 0.849111320253814
key: test_precision
value: [0.66666667 0.66666667 0.75 0.71428571 0.92307692 0.73333333
0.57142857 0.66666667 0.58333333 0.75 ]
mean value: 0.7025457875457876
key: train_precision
value: [0.86086957 0.84821429 0.84347826 0.8362069 0.84482759 0.83478261
0.81818182 0.87826087 0.86086957 0.85964912]
mean value: 0.848534057902696
key: test_recall
value: [0.92307692 0.61538462 0.69230769 0.76923077 0.92307692 0.84615385
0.66666667 0.5 0.53846154 0.69230769]
mean value: 0.7166666666666667
key: train_recall
value: [0.86086957 0.82608696 0.84347826 0.84347826 0.85217391 0.83478261
0.85344828 0.87068966 0.86086957 0.85217391]
mean value: 0.8498050974512743
key: test_roc_auc
value: [0.73076923 0.65384615 0.73076923 0.73076923 0.92307692 0.76923077
0.6025641 0.63461538 0.56089744 0.72115385]
mean value: 0.7057692307692307
key: train_roc_auc
value: [0.86086957 0.83913043 0.84347826 0.83913043 0.84782609 0.83478261
0.83107196 0.87447526 0.86146927 0.85712144]
mean value: 0.8489355322338831
key: test_jcc
value: [0.63157895 0.47058824 0.5625 0.58823529 0.85714286 0.64705882
0.44444444 0.4 0.38888889 0.5625 ]
mean value: 0.5552937490785788
key: train_jcc
value: [0.75572519 0.71969697 0.72932331 0.7238806 0.73684211 0.71641791
0.7173913 0.77692308 0.75572519 0.7480916 ]
mean value: 0.7380017256697218
MCC on Blind test: 0.38
Accuracy on Blind test: 0.69
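Note on the ConvergenceWarning attached to this model: the logged pipeline already scales the numerical columns with MinMaxScaler, so of the two remedies the warning names, raising max_iter is the one left to try. The sketch below shows that change on synthetic data and counts any ConvergenceWarning raised during the fit. The dataset and the value max_iter=5000 are assumptions for illustration, not settings taken from the script.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the real feature matrix.
X, y = make_classification(n_samples=256, n_features=40, random_state=42)

# Same estimator as in the log, with a larger iteration budget for lbfgs.
model = make_pipeline(
    MinMaxScaler(),
    LogisticRegression(max_iter=5000, random_state=42),
)

# Record warnings so a failure to converge is counted rather than just printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    model.fit(X, y)

n_convergence = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
print("ConvergenceWarning count:", n_convergence)
```

If the count stays above zero on the real data, the warning text also points to alternative solvers as a further option.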
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
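For reference, a pipeline with the structure printed above can be assembled roughly as follows. This is a sketch under stated assumptions: the tiny data frame, its values, the 'helix'/'loop'/'strand' labels and cv=2 are hypothetical, and only four of the column names from the log are used. The step names ('prep', 'num', 'cat', 'model') and remainder='passthrough' follow the printed repr.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy frame; the real run has 166 numerical and 7 categorical columns.
X = pd.DataFrame({
    "ligand_distance": [3.2, 10.5, 7.1, 15.0, 4.4, 12.3, 6.6, 9.9],
    "rsa": [0.10, 0.55, 0.32, 0.80, 0.05, 0.61, 0.27, 0.49],
    "ss_class": ["helix", "loop", "strand", "loop", "helix", "strand", "helix", "loop"],
    "active_site": [1, 0, 0, 0, 1, 0, 1, 0],
})
y = [1, 0, 1, 0, 1, 0, 1, 0]

numerical_cols = ["ligand_distance", "rsa"]
categorical_cols = ["ss_class", "active_site"]

# Scale numerical columns to [0, 1] and one-hot encode categorical columns.
prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), numerical_cols),
        ("cat", OneHotEncoder(), categorical_cols),
    ],
    remainder="passthrough",
)

# cv=2 is only there because the toy frame has eight rows.
pipe = Pipeline(steps=[
    ("prep", prep),
    ("model", LogisticRegressionCV(cv=2, random_state=42)),
])
pipe.fit(X, y)

n_encoded = pipe.named_steps["prep"].transform(X).shape[1]
print("columns after preprocessing:", n_encoded)
```

Keeping the scaler and encoder inside the Pipeline means they are refit on the training portion of each outer cross-validation split, which avoids leaking test-fold statistics into preprocessing.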
key: fit_time
value: [0.91960263 0.77476716 0.93569636 0.74241829 0.72241664 0.87651324
0.72533584 0.79127908 0.79439926 0.76359153]
mean value: 0.8046020030975342
key: score_time
value: [0.01186204 0.01184845 0.01432681 0.01440716 0.01184154 0.01197195
0.01194453 0.0124135 0.01192331 0.01433039]
mean value: 0.012686967849731445
key: test_mcc
value: [0.26013299 0.3086067 0.46291005 0.46291005 0.84615385 0.5
0.20645591 0.28022427 0.19611614 0.44230769]
mean value: 0.39658176429496
key: train_mcc
value: [0.57445626 0.62611063 0.80027235 0.76524632 0.63487863 0.47942232
0.66288973 0.80157649 0.55957724 0.78383447]
mean value: 0.6688264435025673
key: test_accuracy
value: [0.61538462 0.65384615 0.73076923 0.73076923 0.92307692 0.73076923
0.6 0.64 0.6 0.72 ]
mean value: 0.6944615384615385
key: train_accuracy
value: [0.78695652 0.81304348 0.9 0.8826087 0.8173913 0.73913043
0.83116883 0.9004329 0.77922078 0.89177489]
mean value: 0.8341727837380012
key: test_fscore
value: [0.6875 0.64 0.72 0.74074074 0.92307692 0.77419355
0.61538462 0.57142857 0.64285714 0.72 ]
mean value: 0.7035181541875091
key: train_fscore
value: [0.79148936 0.81385281 0.90128755 0.88209607 0.81896552 0.74789916
0.83544304 0.90295359 0.78481013 0.89270386]
mean value: 0.8371501089693046
key: test_precision
value: [0.57894737 0.66666667 0.75 0.71428571 0.92307692 0.66666667
0.57142857 0.66666667 0.6 0.75 ]
mean value: 0.6887738577212261
key: train_precision
value: [0.775 0.81034483 0.88983051 0.88596491 0.81196581 0.72357724
0.81818182 0.88429752 0.76229508 0.88135593]
mean value: 0.8242813649093232
key: test_recall
value: [0.84615385 0.61538462 0.69230769 0.76923077 0.92307692 0.92307692
0.66666667 0.5 0.69230769 0.69230769]
mean value: 0.7320512820512821
key: train_recall
value: [0.80869565 0.8173913 0.91304348 0.87826087 0.82608696 0.77391304
0.85344828 0.92241379 0.80869565 0.90434783]
mean value: 0.8506296851574213
key: test_roc_auc
value: [0.61538462 0.65384615 0.73076923 0.73076923 0.92307692 0.73076923
0.6025641 0.63461538 0.59615385 0.72115385]
mean value: 0.6939102564102564
key: train_roc_auc
value: [0.78695652 0.81304348 0.9 0.8826087 0.8173913 0.73913043
0.83107196 0.90033733 0.77934783 0.89182909]
mean value: 0.8341716641679161
key: test_jcc
value: [0.52380952 0.47058824 0.5625 0.58823529 0.85714286 0.63157895
0.44444444 0.4 0.47368421 0.5625 ]
mean value: 0.5514483512703326
key: train_jcc
value: [0.65492958 0.68613139 0.8203125 0.7890625 0.69343066 0.59731544
0.7173913 0.82307692 0.64583333 0.80620155]
mean value: 0.7233685168647699
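The scores above can be cross-checked by hand. The sketch below recomputes the fifth fold's test metrics from a confusion matrix and the mean of test_mcc from the per-fold values. The counts TP=12, FP=1, FN=1, TN=12 are an assumption inferred from the reported accuracy, precision and recall (all 0.92307692, consistent with a 26-sample fold); the log itself does not print them.

```python
from math import sqrt
from statistics import mean

# Confusion counts inferred from the fifth fold's reported test scores.
# They are an assumption consistent with the log, not values the log prints.
tp, fp, fn, tn = 12, 1, 1, 12

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
fscore = 2 * precision * recall / (precision + recall)
# Matthews correlation coefficient (test_mcc).
mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
# Jaccard index of the positive class (test_jcc).
jcc = tp / (tp + fp + fn)

# Per-fold test_mcc values copied from the log above.
test_mcc = [0.26013299, 0.3086067, 0.46291005, 0.46291005, 0.84615385, 0.5,
            0.20645591, 0.28022427, 0.19611614, 0.44230769]
mean_mcc = mean(test_mcc)

print(f"accuracy={accuracy:.8f} fscore={fscore:.8f} mcc={mcc:.8f} jcc={jcc:.8f}")
print(f"mean test_mcc={mean_mcc:.8f}")
```

Under these assumed counts the recomputed values agree with the fifth entries of test_accuracy, test_fscore, test_mcc and test_jcc, and the mean agrees with the logged "mean value" for test_mcc up to the rounding of the printed fold values.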
MCC on Blind test: 0.43
Accuracy on Blind test: 0.72
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.02167606 0.00983524 0.01029277 0.01009822 0.00993466 0.00891805
0.00893712 0.0100255 0.01011395 0.01003551]
mean value: 0.010986709594726562
key: score_time
value: [0.01178932 0.0098989 0.0096755 0.00934529 0.00883579 0.00862885
0.0086813 0.0093863 0.00939846 0.00942039]
mean value: 0.009506011009216308
key: test_mcc
value: [0.43355498 0.23354968 0.43355498 0.23354968 0.63245553 0.56591646
0.35897436 0.27742513 0.1990977 0.5423696 ]
mean value: 0.3910448113136374
key: train_mcc
value: [0.46517657 0.48902898 0.45643546 0.46727535 0.46517657 0.45232785
0.48477646 0.51262311 0.4779845 0.42722854]
mean value: 0.46980333989342027
key: test_accuracy
value: [0.69230769 0.61538462 0.69230769 0.61538462 0.80769231 0.76923077
0.68 0.64 0.6 0.76 ]
mean value: 0.6872307692307692
key: train_accuracy
value: [0.72608696 0.73913043 0.7173913 0.72608696 0.72608696 0.72608696
0.73593074 0.74891775 0.73160173 0.68831169]
mean value: 0.7265631469979296
key: test_fscore
value: [0.75 0.64285714 0.75 0.64285714 0.82758621 0.8
0.66666667 0.60869565 0.66666667 0.8 ]
mean value: 0.7155329478118084
key: train_fscore
value: [0.75486381 0.76377953 0.75471698 0.75675676 0.75486381 0.72246696
0.76447876 0.77692308 0.75968992 0.74647887]
mean value: 0.7555018489381352
key: test_precision
value: [0.63157895 0.6 0.63157895 0.6 0.75 0.70588235
0.66666667 0.63636364 0.58823529 0.70588235]
mean value: 0.6516188197767145
key: train_precision
value: [0.68309859 0.69784173 0.66666667 0.68055556 0.68309859 0.73214286
0.69230769 0.70138889 0.68531469 0.62721893]
mean value: 0.6849634190504885
key: test_recall
value: [0.92307692 0.69230769 0.92307692 0.69230769 0.92307692 0.92307692
0.66666667 0.58333333 0.76923077 0.92307692]
mean value: 0.801923076923077
key: train_recall
value: [0.84347826 0.84347826 0.86956522 0.85217391 0.84347826 0.71304348
0.85344828 0.87068966 0.85217391 0.92173913]
mean value: 0.8463268365817092
key: test_roc_auc
value: [0.69230769 0.61538462 0.69230769 0.61538462 0.80769231 0.76923077
0.67948718 0.63782051 0.59294872 0.75320513]
mean value: 0.6855769230769231
key: train_roc_auc
value: [0.72608696 0.73913043 0.7173913 0.72608696 0.72608696 0.72608696
0.73541979 0.74838831 0.73212144 0.68931784]
mean value: 0.7266116941529235
key: test_jcc
value: [0.6 0.47368421 0.6 0.47368421 0.70588235 0.66666667
0.5 0.4375 0.5 0.66666667]
mean value: 0.5624084107327141
key: train_jcc
value: [0.60625 0.61783439 0.60606061 0.60869565 0.60625 0.56551724
0.61875 0.63522013 0.6125 0.59550562]
mean value: 0.607258363828198
MCC on Blind test: 0.33
Accuracy on Blind test: 0.67
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.00918698 0.00909734 0.00939155 0.00925922 0.00944185 0.0103898
0.00985146 0.01025653 0.01025677 0.00990176]
mean value: 0.009703326225280761
key: score_time
value: [0.00874496 0.00913382 0.00859189 0.00863647 0.00933886 0.00939512
0.00858855 0.00945616 0.00945187 0.00943851]
mean value: 0.009077620506286622
key: test_mcc
value: [0.24253563 0.23354968 0.40422604 0.6172134 0.9258201 0.60697698
0.22017621 0.19611614 0.19871795 0.28022427]
mean value: 0.3925556392796845
key: train_mcc
value: [0.49753679 0.56736651 0.51428939 0.50589946 0.46149812 0.47100984
0.53514724 0.52692012 0.53245877 0.48251499]
mean value: 0.5094641245565984
key: test_accuracy
value: [0.61538462 0.61538462 0.69230769 0.80769231 0.96153846 0.76923077
0.6 0.6 0.6 0.64 ]
mean value: 0.6901538461538461
key: train_accuracy
value: [0.74782609 0.7826087 0.75652174 0.75217391 0.73043478 0.73478261
0.76623377 0.76190476 0.76623377 0.74025974]
mean value: 0.7538979860718991
key: test_fscore
value: [0.66666667 0.58333333 0.73333333 0.81481481 0.96296296 0.8125
0.64285714 0.54545455 0.61538462 0.68965517]
mean value: 0.7066962587221208
key: train_fscore
value: [0.75833333 0.79166667 0.76470588 0.76150628 0.73728814 0.74476987
0.77868852 0.7755102 0.76521739 0.75 ]
mean value: 0.7627686288549921
key: test_precision
value: [0.58823529 0.63636364 0.64705882 0.78571429 0.92857143 0.68421053
0.5625 0.6 0.61538462 0.625 ]
mean value: 0.6673038609996814
key: train_precision
value: [0.728 0.76 0.7398374 0.73387097 0.71900826 0.71774194
0.7421875 0.73643411 0.76521739 0.72 ]
mean value: 0.736229756589408
key: test_recall
value: [0.76923077 0.53846154 0.84615385 0.84615385 1. 1.
0.75 0.5 0.61538462 0.76923077]
mean value: 0.7634615384615384
key: train_recall
value: [0.79130435 0.82608696 0.79130435 0.79130435 0.75652174 0.77391304
0.81896552 0.81896552 0.76521739 0.7826087 ]
mean value: 0.7916191904047976
key: test_roc_auc
value: [0.61538462 0.61538462 0.69230769 0.80769231 0.96153846 0.76923077
0.60576923 0.59615385 0.59935897 0.63461538]
mean value: 0.6897435897435897
key: train_roc_auc
value: [0.74782609 0.7826087 0.75652174 0.75217391 0.73043478 0.73478261
0.7660045 0.76165667 0.76622939 0.74044228]
mean value: 0.7538680659670165
key: test_jcc
value: [0.5 0.41176471 0.57894737 0.6875 0.92857143 0.68421053
0.47368421 0.375 0.44444444 0.52631579]
mean value: 0.5610438473635068
key: train_jcc
value: [0.61073826 0.65517241 0.61904762 0.61486486 0.58389262 0.59333333
0.63758389 0.63333333 0.61971831 0.6 ]
mean value: 0.616768463933208
MCC on Blind test: 0.29
Accuracy on Blind test: 0.65
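Note on reading the per-fold values: the fifth test fold of the BernoulliNB block above reports accuracy 0.96153846, precision 0.92857143 and recall 1.0. These figures are consistent with a 26-sample fold with TP=13, FP=1, TN=12, FN=0. The counts are an inference from the printed metrics, not something the log itself prints. The sketch below is a minimal standard-library illustration of how the logged metric names relate to such counts. The function name binary_metrics and the dictionary keys are hypothetical and are not taken from the script that produced this log.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion counts.

    Hypothetical helper for illustration only; not part of the logged script.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-score (F1) and Jaccard index for the positive class
    fscore = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    jcc = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    # Matthews correlation coefficient; defined as 0 when any margin is empty
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "fscore": fscore,
        "jcc": jcc,
        "mcc": mcc,
    }


# Counts inferred (assumption) from fold 5 of the BernoulliNB test metrics
fold5 = binary_metrics(tp=13, fp=1, tn=12, fn=0)
for name, value in fold5.items():
    print(f"{name}: {value:.8f}")
```

Under these assumed counts the computed values agree with the logged test_mcc, test_fscore and test_jcc entries for that fold to the printed precision.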
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00914311 0.00849319 0.00899696 0.00962973 0.00991821 0.00909781
0.00972629 0.00986457 0.00982285 0.00978184]
mean value: 0.009447455406188965
key: score_time
value: [0.01053095 0.00990725 0.01061821 0.0104363 0.01081729 0.0106883
0.01077843 0.0106926 0.01073742 0.01077318]
mean value: 0.010597991943359374
key: test_mcc
value: [-0.23354968 0.07692308 0.46291005 0.3086067 0.46291005 0.23076923
0.28205128 -0.2941742 0.11613145 0.35954625]
mean value: 0.1772124200408744
key: train_mcc
value: [0.53915082 0.54990908 0.51585963 0.56521739 0.44354534 0.49595227
0.51623761 0.55003766 0.48062074 0.53284841]
mean value: 0.5189378957673179
key: test_accuracy
value: [0.38461538 0.53846154 0.73076923 0.65384615 0.73076923 0.61538462
0.64 0.36 0.56 0.68 ]
mean value: 0.5893846153846154
key: train_accuracy
value: [0.76956522 0.77391304 0.75652174 0.7826087 0.72173913 0.74782609
0.75757576 0.77489177 0.74025974 0.76623377]
mean value: 0.7591134952004517
key: test_fscore
value: [0.42857143 0.53846154 0.72 0.64 0.74074074 0.61538462
0.64 0.27272727 0.59259259 0.71428571]
mean value: 0.5902763902763902
key: train_fscore
value: [0.77056277 0.78333333 0.76859504 0.7826087 0.71929825 0.75213675
0.76666667 0.77966102 0.74137931 0.76923077]
mean value: 0.7633472601812795
key: test_precision
value: [0.4 0.53846154 0.75 0.66666667 0.71428571 0.61538462
0.61538462 0.3 0.57142857 0.66666667]
mean value: 0.5838278388278388
key: train_precision
value: [0.76724138 0.752 0.73228346 0.7826087 0.72566372 0.7394958
0.74193548 0.76666667 0.73504274 0.75630252]
mean value: 0.7499240461251708
key: test_recall
value: [0.46153846 0.53846154 0.69230769 0.61538462 0.76923077 0.61538462
0.66666667 0.25 0.61538462 0.76923077]
mean value: 0.5993589743589743
key: train_recall
value: [0.77391304 0.8173913 0.80869565 0.7826087 0.71304348 0.76521739
0.79310345 0.79310345 0.74782609 0.7826087 ]
mean value: 0.7777511244377812
key: test_roc_auc
value: [0.38461538 0.53846154 0.73076923 0.65384615 0.73076923 0.61538462
0.64102564 0.35576923 0.55769231 0.67628205]
mean value: 0.5884615384615385
key: train_roc_auc
value: [0.76956522 0.77391304 0.75652174 0.7826087 0.72173913 0.74782609
0.75742129 0.77481259 0.74029235 0.76630435]
mean value: 0.7591004497751125
key: test_jcc
value: [0.27272727 0.36842105 0.5625 0.47058824 0.58823529 0.44444444
0.47058824 0.15789474 0.42105263 0.55555556]
mean value: 0.43120074584857865
key: train_jcc
value: [0.62676056 0.64383562 0.62416107 0.64285714 0.56164384 0.60273973
0.62162162 0.63888889 0.5890411 0.625 ]
mean value: 0.6176549564546041
MCC on Blind test: 0.18
Accuracy on Blind test: 0.59
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01466942 0.01401472 0.01425433 0.01414585 0.01411271 0.0141561
0.01212788 0.01342058 0.01408696 0.01221085]
mean value: 0.013719940185546875
key: score_time
value: [0.01055813 0.0103178 0.01053596 0.01041269 0.01040769 0.00974035
0.01015902 0.00984025 0.00954103 0.00947142]
mean value: 0.010098433494567871
key: test_mcc
value: [0.26013299 0.3086067 0.46291005 0.6172134 0.9258201 0.47434165
0.13074409 0.11342411 0.11613145 0.43871881]
mean value: 0.3848043345500712
key: train_mcc
value: [0.67849178 0.64369733 0.65857921 0.66956522 0.63516695 0.63632416
0.66423848 0.73287422 0.68182751 0.6456866 ]
mean value: 0.6646451453192566
key: test_accuracy
value: [0.61538462 0.65384615 0.73076923 0.80769231 0.96153846 0.73076923
0.56 0.56 0.56 0.72 ]
mean value: 0.6900000000000001
key: train_accuracy
value: [0.83913043 0.82173913 0.82608696 0.83478261 0.8173913 0.8173913
0.83116883 0.86580087 0.83982684 0.82251082]
mean value: 0.8315829098437795
key: test_fscore
value: [0.6875 0.64 0.74074074 0.81481481 0.96296296 0.75862069
0.59259259 0.47619048 0.59259259 0.74074074]
mean value: 0.7006755610290093
key: train_fscore
value: [0.84120172 0.82403433 0.83739837 0.83478261 0.82051282 0.82352941
0.83817427 0.87029289 0.84518828 0.82553191]
mean value: 0.8360646626759719
key: test_precision
value: [0.57894737 0.66666667 0.71428571 0.78571429 0.92857143 0.6875
0.53333333 0.55555556 0.57142857 0.71428571]
mean value: 0.6736288638262322
key: train_precision
value: [0.83050847 0.81355932 0.78625954 0.83478261 0.80672269 0.79674797
0.808 0.84552846 0.81451613 0.80833333]
mean value: 0.8144958521496004
key: test_recall
value: [0.84615385 0.61538462 0.76923077 0.84615385 1. 0.84615385
0.66666667 0.41666667 0.61538462 0.76923077]
mean value: 0.7391025641025641
key: train_recall
value: [0.85217391 0.83478261 0.89565217 0.83478261 0.83478261 0.85217391
0.87068966 0.89655172 0.87826087 0.84347826]
mean value: 0.8593328335832084
key: test_roc_auc
value: [0.61538462 0.65384615 0.73076923 0.80769231 0.96153846 0.73076923
0.56410256 0.55448718 0.55769231 0.71794872]
mean value: 0.6894230769230769
key: train_roc_auc
value: [0.83913043 0.82173913 0.82608696 0.83478261 0.8173913 0.8173913
0.830997 0.86566717 0.8399925 0.8226012 ]
mean value: 0.8315779610194903
key: test_jcc
value: [0.52380952 0.47058824 0.58823529 0.6875 0.92857143 0.61111111
0.42105263 0.3125 0.42105263 0.58823529]
mean value: 0.555265615017937
key: train_jcc
value: [0.72592593 0.70072993 0.72027972 0.71641791 0.69565217 0.7
0.72142857 0.77037037 0.73188406 0.70289855]
mean value: 0.7185587208068345
MCC on Blind test: 0.35
Accuracy on Blind test: 0.68
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [0.79133201 1.04224086 0.68274426 1.19078088 1.04643345 1.26272154
1.36647201 1.08281016 1.21882343 0.64180708]
mean value: 1.0326165676116943
key: score_time
value: [0.01203728 0.01447105 0.01207662 0.01443744 0.01444626 0.01231718
0.01682568 0.01575065 0.01480103 0.01208425]
mean value: 0.013924741744995117
key: test_mcc
value: [0.31622777 0.3086067 0.47434165 0.31622777 0.77151675 0.54494926
0.2941742 0.28022427 0.35897436 0.27742513]
mean value: 0.3942667851121904
key: train_mcc
value: [0.88725848 0.99134183 0.91428873 0.98275733 0.95684738 0.96521739
0.96550951 0.99137867 0.95703684 0.83716866]
mean value: 0.9448804798337176
key: test_accuracy
value: [0.65384615 0.65384615 0.73076923 0.65384615 0.88461538 0.76923077
0.64 0.64 0.68 0.64 ]
mean value: 0.6946153846153846
key: train_accuracy
value: [0.94347826 0.99565217 0.95652174 0.99130435 0.97826087 0.9826087
0.98268398 0.995671 0.97835498 0.91341991]
mean value: 0.9717955957086392
key: test_fscore
value: [0.68965517 0.64 0.69565217 0.68965517 0.88888889 0.78571429
0.66666667 0.57142857 0.69230769 0.66666667]
mean value: 0.6986635290413401
key: train_fscore
value: [0.94273128 0.99563319 0.95535714 0.99137931 0.97854077 0.9826087
0.98290598 0.99570815 0.97854077 0.91935484]
mean value: 0.9722760135346585
key: test_precision
value: [0.625 0.66666667 0.8 0.625 0.85714286 0.73333333
0.6 0.66666667 0.69230769 0.64285714]
mean value: 0.6908974358974359
key: train_precision
value: [0.95535714 1. 0.98165138 0.98290598 0.96610169 0.9826087
0.97457627 0.99145299 0.96610169 0.85714286]
mean value: 0.9657898707174887
key: test_recall
value: [0.76923077 0.61538462 0.61538462 0.76923077 0.92307692 0.84615385
0.75 0.5 0.69230769 0.69230769]
mean value: 0.7173076923076923
key: train_recall
value: [0.93043478 0.99130435 0.93043478 1. 0.99130435 0.9826087
0.99137931 1. 0.99130435 0.99130435]
mean value: 0.9800074962518741
key: test_roc_auc
value: [0.65384615 0.65384615 0.73076923 0.65384615 0.88461538 0.76923077
0.64423077 0.63461538 0.67948718 0.63782051]
mean value: 0.6942307692307692
key: train_roc_auc
value: [0.94347826 0.99565217 0.95652174 0.99130435 0.97826087 0.9826087
0.98264618 0.99565217 0.97841079 0.91375562]
mean value: 0.9718290854572714
key: test_jcc
value: [0.52631579 0.47058824 0.53333333 0.52631579 0.8 0.64705882
0.5 0.4 0.52941176 0.5 ]
mean value: 0.5433023735810114
key: train_jcc
value: [0.89166667 0.99130435 0.91452991 0.98290598 0.95798319 0.96581197
0.96638655 0.99145299 0.95798319 0.85074627]
mean value: 0.9470771079026795
MCC on Blind test: 0.36
Accuracy on Blind test: 0.69
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.02313399 0.01868463 0.01814771 0.01593852 0.01778412 0.01742077
0.01628041 0.01744723 0.01802492 0.01848912]
mean value: 0.01813514232635498
key: score_time
value: [0.01164889 0.00898504 0.00868201 0.00876403 0.00867915 0.0086062
0.00851274 0.00851393 0.00859714 0.00859332]
mean value: 0.008958244323730468
key: test_mcc
value: [0.77151675 0.70064905 0.31622777 0.6172134 0.69230769 0.53846154
0.11613145 0.36774959 0.6025641 0.60001249]
mean value: 0.5322833824348638
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.88461538 0.84615385 0.65384615 0.80769231 0.84615385 0.76923077
0.56 0.68 0.8 0.8 ]
mean value: 0.7647692307692308
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.88888889 0.85714286 0.60869565 0.81481481 0.84615385 0.76923077
0.52173913 0.69230769 0.8 0.81481481]
mean value: 0.7613788465962379
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.85714286 0.8 0.7 0.78571429 0.84615385 0.76923077
0.54545455 0.64285714 0.83333333 0.78571429]
mean value: 0.7565601065601065
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.92307692 0.92307692 0.53846154 0.84615385 0.84615385 0.76923077
0.5 0.75 0.76923077 0.84615385]
mean value: 0.7711538461538462
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.88461538 0.84615385 0.65384615 0.80769231 0.84615385 0.76923077
0.55769231 0.68269231 0.80128205 0.79807692]
mean value: 0.7647435897435897
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.8 0.75 0.4375 0.6875 0.73333333 0.625
0.35294118 0.52941176 0.66666667 0.6875 ]
mean value: 0.626985294117647
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.33
Accuracy on Blind test: 0.67
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.09956813 0.10019636 0.0992651 0.09971094 0.09992576 0.09951663
0.0993073 0.09910083 0.10106516 0.09976125]
mean value: 0.09974174499511719
key: score_time
value: [0.01748419 0.01745272 0.01728463 0.01737189 0.01731944 0.01762104
0.01746798 0.01737499 0.01740623 0.01749277]
mean value: 0.017427587509155275
key: test_mcc
value: [0.23354968 0.23076923 0.23354968 0.15430335 0.85634884 0.53846154
0.35954625 0.02746175 0.20645591 0.19871795]
mean value: 0.30391641821947946
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.61538462 0.61538462 0.61538462 0.57692308 0.92307692 0.76923077
0.68 0.52 0.6 0.6 ]
mean value: 0.6515384615384615
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.64285714 0.61538462 0.58333333 0.59259259 0.92857143 0.76923077
0.63636364 0.4 0.58333333 0.61538462]
mean value: 0.6367051467051468
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.6 0.61538462 0.63636364 0.57142857 0.86666667 0.76923077
0.7 0.5 0.63636364 0.61538462]
mean value: 0.651082251082251
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.69230769 0.61538462 0.53846154 0.61538462 1. 0.76923077
0.58333333 0.33333333 0.53846154 0.61538462]
mean value: 0.6301282051282051
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.61538462 0.61538462 0.61538462 0.57692308 0.92307692 0.76923077
0.67628205 0.51282051 0.6025641 0.59935897]
mean value: 0.6506410256410257
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.47368421 0.44444444 0.41176471 0.42105263 0.86666667 0.625
0.46666667 0.25 0.41176471 0.44444444]
mean value: 0.4815488476092191
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.32
Accuracy on Blind test: 0.66
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.00924373 0.00909591 0.00902104 0.00901556 0.00903559 0.00899434
0.00899005 0.01007128 0.00922346 0.00893664]
mean value: 0.00916275978088379
key: score_time
value: [0.00859094 0.00852585 0.00848031 0.00861931 0.00847697 0.00852418
0.00880098 0.00921869 0.00855017 0.00851154]
mean value: 0.008629894256591797
key: test_mcc
value: [ 0.07784989 0.38924947 0.6172134 -0.31622777 0.15430335 0.3086067
0.11342411 0.28022427 0.19871795 0.27742513]
mean value: 0.21007865056115643
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.53846154 0.69230769 0.80769231 0.34615385 0.57692308 0.65384615
0.56 0.64 0.6 0.64 ]
mean value: 0.6055384615384616
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.57142857 0.66666667 0.81481481 0.26086957 0.59259259 0.64
0.47619048 0.57142857 0.61538462 0.66666667]
mean value: 0.5876042540390367
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.53333333 0.72727273 0.78571429 0.3 0.57142857 0.66666667
0.55555556 0.66666667 0.61538462 0.64285714]
mean value: 0.6064879564879565
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.61538462 0.61538462 0.84615385 0.23076923 0.61538462 0.61538462
0.41666667 0.5 0.61538462 0.69230769]
mean value: 0.5762820512820513
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.53846154 0.69230769 0.80769231 0.34615385 0.57692308 0.65384615
0.55448718 0.63461538 0.59935897 0.63782051]
mean value: 0.6041666666666666
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.4 0.5 0.6875 0.15 0.42105263 0.47058824
0.3125 0.4 0.44444444 0.5 ]
mean value: 0.42860853113175096
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.16
Accuracy on Blind test: 0.58
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.36223483 1.38157392 1.35260463 1.36770082 1.45512009 1.4243691
1.37369204 1.38668728 1.35794759 1.35540032]
mean value: 1.3817330598831177
key: score_time
value: [0.09519005 0.09159613 0.09452748 0.0909903 0.09830999 0.09202361
0.09135485 0.09582472 0.09233117 0.09036994]
mean value: 0.09325182437896729
key: test_mcc
value: [0.47434165 0.53846154 0.40422604 0.38924947 0.84615385 0.53846154
0.51923077 0.1990977 0.35897436 0.44230769]
mean value: 0.47105046071111284
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.73076923 0.76923077 0.69230769 0.69230769 0.92307692 0.76923077
0.76 0.6 0.68 0.72 ]
mean value: 0.7336923076923076
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.75862069 0.76923077 0.63636364 0.71428571 0.92307692 0.76923077
0.75 0.5 0.69230769 0.72 ]
mean value: 0.7233116194150677
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.6875 0.76923077 0.77777778 0.66666667 0.92307692 0.76923077
0.75 0.625 0.69230769 0.75 ]
mean value: 0.7410790598290599
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.84615385 0.76923077 0.53846154 0.76923077 0.92307692 0.76923077
0.75 0.41666667 0.69230769 0.69230769]
mean value: 0.7166666666666667
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.73076923 0.76923077 0.69230769 0.69230769 0.92307692 0.76923077
0.75961538 0.59294872 0.67948718 0.72115385]
mean value: 0.7330128205128205
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.61111111 0.625 0.46666667 0.55555556 0.85714286 0.625
0.6 0.33333333 0.52941176 0.5625 ]
mean value: 0.5765721288515406
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.49
Accuracy on Blind test: 0.75
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.87057018 0.89433622 0.93123794 0.86043167 0.92215872 0.94535112
0.91315413 0.94380188 0.93137598 0.8592701 ]
mean value: 0.9071687936782837
key: score_time
value: [0.26281023 0.11465859 0.26945305 0.24121785 0.2097342 0.23504758
0.22764254 0.22707939 0.18495941 0.23176789]
mean value: 0.22043707370758056
key: test_mcc
value: [0.31622777 0.53846154 0.56591646 0.6172134 0.9258201 0.69230769
0.6025641 0.1990977 0.12179487 0.6025641 ]
mean value: 0.518196773243632
key: train_mcc
value: [0.87839372 0.88725848 0.87038828 0.86099978 0.89619446 0.88699006
0.89662441 0.87890832 0.86146927 0.88748126]
mean value: 0.880470802843822
key: test_accuracy
value: [0.65384615 0.76923077 0.76923077 0.80769231 0.96153846 0.84615385
0.8 0.6 0.56 0.8 ]
mean value: 0.7567692307692307
key: train_accuracy
value: [0.93913043 0.94347826 0.93478261 0.93043478 0.94782609 0.94347826
0.94805195 0.93939394 0.93073593 0.94372294]
mean value: 0.9401035196687371
key: test_fscore
value: [0.68965517 0.76923077 0.72727273 0.81481481 0.96296296 0.84615385
0.8 0.5 0.56 0.8 ]
mean value: 0.7470090292848914
key: train_fscore
value: [0.93965517 0.94420601 0.93617021 0.92982456 0.94871795 0.94372294
0.94915254 0.94017094 0.93043478 0.94372294]
mean value: 0.9405778056483304
key: test_precision
value: [0.625 0.76923077 0.88888889 0.78571429 0.92857143 0.84615385
0.76923077 0.625 0.58333333 0.83333333]
mean value: 0.7654456654456655
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
key: train_precision
value: [0.93162393 0.93220339 0.91666667 0.9380531 0.93277311 0.93965517
0.93333333 0.93220339 0.93043478 0.93965517]
mean value: 0.9326602045310061
key: test_recall
value: [0.76923077 0.76923077 0.61538462 0.84615385 1. 0.84615385
0.83333333 0.41666667 0.53846154 0.76923077]
mean value: 0.7403846153846154
key: train_recall
value: [0.94782609 0.95652174 0.95652174 0.92173913 0.96521739 0.94782609
0.96551724 0.94827586 0.93043478 0.94782609]
mean value: 0.9487706146926537
key: test_roc_auc
value: [0.65384615 0.76923077 0.76923077 0.80769231 0.96153846 0.84615385
0.80128205 0.59294872 0.56089744 0.80128205]
mean value: 0.7564102564102564
key: train_roc_auc
value: [0.93913043 0.94347826 0.93478261 0.93043478 0.94782609 0.94347826
0.94797601 0.93935532 0.93073463 0.94374063]
mean value: 0.9400937031484258
key: test_jcc
value: [0.52631579 0.625 0.57142857 0.6875 0.92857143 0.73333333
0.66666667 0.33333333 0.38888889 0.66666667]
mean value: 0.6127704678362573
key: train_jcc
value: [0.88617886 0.89430894 0.88 0.86885246 0.90243902 0.89344262
0.90322581 0.88709677 0.8699187 0.89344262]
mean value: 0.8878905814018478
MCC on Blind test: 0.51
Accuracy on Blind test: 0.76
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.00985193 0.00925279 0.0090847 0.00928497 0.00924087 0.00906801
0.00925469 0.00909114 0.00927258 0.00920963]
mean value: 0.009261131286621094
key: score_time
value: [0.00861406 0.00864077 0.00928879 0.00863123 0.00860333 0.0085845
0.00866389 0.00860381 0.00862241 0.00861168]
mean value: 0.008686447143554687
key: test_mcc
value: [0.24253563 0.23354968 0.40422604 0.6172134 0.9258201 0.60697698
0.22017621 0.19611614 0.19871795 0.28022427]
mean value: 0.3925556392796845
key: train_mcc
value: [0.49753679 0.56736651 0.51428939 0.50589946 0.46149812 0.47100984
0.53514724 0.52692012 0.53245877 0.48251499]
mean value: 0.5094641245565984
key: test_accuracy
value: [0.61538462 0.61538462 0.69230769 0.80769231 0.96153846 0.76923077
0.6 0.6 0.6 0.64 ]
mean value: 0.6901538461538461
key: train_accuracy
value: [0.74782609 0.7826087 0.75652174 0.75217391 0.73043478 0.73478261
0.76623377 0.76190476 0.76623377 0.74025974]
mean value: 0.7538979860718991
key: test_fscore
value: [0.66666667 0.58333333 0.73333333 0.81481481 0.96296296 0.8125
0.64285714 0.54545455 0.61538462 0.68965517]
mean value: 0.7066962587221208
key: train_fscore
value: [0.75833333 0.79166667 0.76470588 0.76150628 0.73728814 0.74476987
0.77868852 0.7755102 0.76521739 0.75 ]
mean value: 0.7627686288549921
key: test_precision
value: [0.58823529 0.63636364 0.64705882 0.78571429 0.92857143 0.68421053
0.5625 0.6 0.61538462 0.625 ]
mean value: 0.6673038609996814
key: train_precision
value: [0.728 0.76 0.7398374 0.73387097 0.71900826 0.71774194
0.7421875 0.73643411 0.76521739 0.72 ]
mean value: 0.736229756589408
key: test_recall
value: [0.76923077 0.53846154 0.84615385 0.84615385 1. 1.
0.75 0.5 0.61538462 0.76923077]
mean value: 0.7634615384615384
key: train_recall
value: [0.79130435 0.82608696 0.79130435 0.79130435 0.75652174 0.77391304
0.81896552 0.81896552 0.76521739 0.7826087 ]
mean value: 0.7916191904047976
key: test_roc_auc
value: [0.61538462 0.61538462 0.69230769 0.80769231 0.96153846 0.76923077
0.60576923 0.59615385 0.59935897 0.63461538]
mean value: 0.6897435897435897
key: train_roc_auc
value: [0.74782609 0.7826087 0.75652174 0.75217391 0.73043478 0.73478261
0.7660045 0.76165667 0.76622939 0.74044228]
mean value: 0.7538680659670165
key: test_jcc
value: [0.5 0.41176471 0.57894737 0.6875 0.92857143 0.68421053
0.47368421 0.375 0.44444444 0.52631579]
mean value: 0.5610438473635068
key: train_jcc
value: [0.61073826 0.65517241 0.61904762 0.61486486 0.58389262 0.59333333
0.63758389 0.63333333 0.61971831 0.6 ]
mean value: 0.616768463933208
MCC on Blind test: 0.29
Accuracy on Blind test: 0.65
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.08827448 0.0708189 0.07702231 0.22159338 0.06050777 0.06993937
0.06500721 0.08993006 0.05968285 0.05984974]
mean value: 0.08626260757446289
key: score_time
value: [0.011204 0.01124358 0.0114789 0.01113033 0.01171565 0.01125646
0.01110744 0.01070189 0.01019716 0.01020026]
mean value: 0.011023569107055663
key: test_mcc
value: [0.77151675 0.69230769 0.5 0.69230769 0.77151675 0.6172134
0.37073365 0.43871881 0.52904327 0.52904327]
mean value: 0.5912401278510183
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.88461538 0.84615385 0.73076923 0.84615385 0.88461538 0.80769231
0.68 0.72 0.76 0.76 ]
mean value: 0.792
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.88888889 0.84615385 0.66666667 0.84615385 0.88888889 0.81481481
0.6 0.69565217 0.75 0.75 ]
mean value: 0.7747219125479995
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.85714286 0.84615385 0.875 0.84615385 0.85714286 0.78571429
0.75 0.72727273 0.81818182 0.81818182]
mean value: 0.8180944055944056
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.92307692 0.84615385 0.53846154 0.84615385 0.92307692 0.84615385
0.5 0.66666667 0.69230769 0.69230769]
mean value: 0.7474358974358974
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.88461538 0.84615385 0.73076923 0.84615385 0.88461538 0.80769231
0.67307692 0.71794872 0.76282051 0.76282051]
mean value: 0.7916666666666666
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.8 0.73333333 0.5 0.73333333 0.8 0.6875
0.42857143 0.53333333 0.6 0.6 ]
mean value: 0.6416071428571428
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.54
Accuracy on Blind test: 0.77
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.0325067 0.03058124 0.03802133 0.05640101 0.05467892 0.02662373
0.02773499 0.04848194 0.04608655 0.05499625]
mean value: 0.04161126613616943
key: score_time
value: [0.01179028 0.01180315 0.02161908 0.0220561 0.01192045 0.01189947
0.01557565 0.01187563 0.04020929 0.02342272]
mean value: 0.018217182159423827
key: test_mcc
value: [ 0.6172134 0.15430335 0.54494926 0. 0.53846154 0.16666667
0.35897436 -0.13074409 0.35897436 0.35897436]
mean value: 0.2967773202682685
key: train_mcc
value: [0.84360585 0.88725848 0.82621191 0.85220613 0.85246403 0.89619446
0.87878561 0.95674339 0.84415292 0.85283755]
mean value: 0.8690460318996313
key: test_accuracy
value: [0.80769231 0.57692308 0.76923077 0.5 0.76923077 0.57692308
0.68 0.44 0.68 0.68 ]
mean value: 0.648
key: train_accuracy
value: [0.92173913 0.94347826 0.91304348 0.92608696 0.92608696 0.94782609
0.93939394 0.97835498 0.92207792 0.92640693]
mean value: 0.9344494635798983
key: test_fscore
value: [0.8 0.56 0.78571429 0.48 0.76923077 0.64516129
0.66666667 0.36363636 0.69230769 0.69230769]
mean value: 0.645502476018605
key: train_fscore
value: [0.92105263 0.94420601 0.9122807 0.92640693 0.92703863 0.94871795
0.93965517 0.97854077 0.92173913 0.92576419]
mean value: 0.9345402111171843
key: test_precision
value: [0.83333333 0.58333333 0.73333333 0.5 0.76923077 0.55555556
0.66666667 0.4 0.69230769 0.69230769]
mean value: 0.6426068376068376
key: train_precision
value: [0.92920354 0.93220339 0.92035398 0.92241379 0.91525424 0.93277311
0.93965517 0.97435897 0.92173913 0.92982456]
mean value: 0.9317779890200742
key: test_recall
value: [0.76923077 0.53846154 0.84615385 0.46153846 0.76923077 0.76923077
0.66666667 0.33333333 0.69230769 0.69230769]
mean value: 0.6538461538461539
key: train_recall
value: [0.91304348 0.95652174 0.90434783 0.93043478 0.93913043 0.96521739
0.93965517 0.98275862 0.92173913 0.92173913]
mean value: 0.9374587706146926
key: test_roc_auc
value: [0.80769231 0.57692308 0.76923077 0.5 0.76923077 0.57692308
0.67948718 0.43589744 0.67948718 0.67948718]
mean value: 0.6474358974358975
key: train_roc_auc
value: [0.92173913 0.94347826 0.91304348 0.92608696 0.92608696 0.94782609
0.9393928 0.97833583 0.92207646 0.92638681]
mean value: 0.9344452773613193
key: test_jcc
value: [0.66666667 0.38888889 0.64705882 0.31578947 0.625 0.47619048
0.5 0.22222222 0.52941176 0.52941176]
mean value: 0.4900640080593641
key: train_jcc
value: [0.85365854 0.89430894 0.83870968 0.86290323 0.864 0.90243902
0.88617886 0.95798319 0.85483871 0.86178862]
mean value: 0.8776808789920374
MCC on Blind test: 0.41
Accuracy on Blind test: 0.71
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.0127399 0.00920939 0.00912523 0.00896168 0.00895476 0.00874591
0.00885916 0.00897789 0.00905085 0.00910902]
mean value: 0.00937337875366211
key: score_time
value: [0.01170278 0.00882316 0.00865865 0.00838375 0.0084579 0.00852489
0.00889111 0.00839472 0.0085063 0.00878024]
mean value: 0.008912348747253418
key: test_mcc
value: [0.33333333 0.38461538 0.54494926 0.3086067 0.70064905 0.56591646
0.44702443 0.37073365 0.35954625 0.43871881]
mean value: 0.44540933216111894
key: train_mcc
value: [0.46262193 0.46262193 0.4451645 0.45301392 0.45425676 0.44455524
0.45638654 0.47461305 0.46446918 0.48335689]
mean value: 0.4601059925977649
key: test_accuracy
value: [0.65384615 0.69230769 0.76923077 0.65384615 0.84615385 0.76923077
0.72 0.68 0.68 0.72 ]
mean value: 0.7184615384615385
key: train_accuracy
value: [0.73043478 0.73043478 0.72173913 0.72608696 0.72608696 0.72173913
0.72727273 0.73593074 0.73160173 0.74025974]
mean value: 0.7291586674195369
key: test_fscore
value: [0.70967742 0.69230769 0.78571429 0.64 0.85714286 0.8
0.66666667 0.6 0.71428571 0.74074074]
mean value: 0.7206535376212796
key: train_fscore
value: [0.74166667 0.74166667 0.73333333 0.73417722 0.73858921 0.73109244
0.74074074 0.75102041 0.7394958 0.75206612]
mean value: 0.7403848593375401
key: test_precision
value: [0.61111111 0.69230769 0.73333333 0.66666667 0.8 0.70588235
0.77777778 0.75 0.66666667 0.71428571]
mean value: 0.7118031315090139
key: train_precision
value: [0.712 0.712 0.704 0.71311475 0.70634921 0.70731707
0.70866142 0.71317829 0.71544715 0.71653543]
mean value: 0.7108603333057187
key: test_recall
value: [0.84615385 0.69230769 0.84615385 0.61538462 0.92307692 0.92307692
0.58333333 0.5 0.76923077 0.76923077]
mean value: 0.7467948717948718
key: train_recall
value: [0.77391304 0.77391304 0.76521739 0.75652174 0.77391304 0.75652174
0.77586207 0.79310345 0.76521739 0.79130435]
mean value: 0.7725487256371814
key: test_roc_auc
value: [0.65384615 0.69230769 0.76923077 0.65384615 0.84615385 0.76923077
0.71474359 0.67307692 0.67628205 0.71794872]
mean value: 0.7166666666666667
key: train_roc_auc
value: [0.73043478 0.73043478 0.72173913 0.72608696 0.72608696 0.72173913
0.72706147 0.73568216 0.73174663 0.74047976]
mean value: 0.729149175412294
key: test_jcc
value: [0.55 0.52941176 0.64705882 0.47058824 0.75 0.66666667
0.5 0.42857143 0.55555556 0.58823529]
mean value: 0.568608776844071
key: train_jcc
value: [0.58940397 0.58940397 0.57894737 0.58 0.58552632 0.57615894
0.58823529 0.60130719 0.58666667 0.60264901]
mean value: 0.5878298728577058
MCC on Blind test: 0.33
Accuracy on Blind test: 0.67
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01029062 0.01638079 0.01510525 0.01574659 0.01706219 0.01559711
0.01537395 0.01762438 0.01753545 0.01649976]
mean value: 0.015721607208251952
key: score_time
value: [0.00846767 0.01144028 0.01150489 0.01210713 0.01146317 0.01151443
0.01178288 0.01176548 0.01183534 0.01185989]
mean value: 0.011374115943908691
key: test_mcc
value: [0.48795004 0.3086067 0.16666667 0.42640143 0.66666667 0.72760688
0.21245915 0.28022427 0.36774959 0.1990977 ]
mean value: 0.38434290828396633
key: train_mcc
value: [0.60151666 0.70534562 0.55386282 0.52704628 0.47265659 0.5203059
0.3545926 0.82730706 0.74633927 0.66731915]
mean value: 0.5976291936314948
key: test_accuracy
value: [0.69230769 0.65384615 0.57692308 0.65384615 0.80769231 0.84615385
0.56 0.64 0.68 0.6 ]
mean value: 0.6710769230769231
key: train_accuracy
value: [0.76956522 0.84782609 0.73913043 0.7173913 0.6826087 0.71304348
0.61038961 0.91341991 0.86580087 0.81385281]
mean value: 0.7673028420854507
key: test_fscore
value: [0.76470588 0.66666667 0.64516129 0.47058824 0.83870968 0.81818182
0.15384615 0.57142857 0.66666667 0.66666667]
mean value: 0.6262621628845538
key: train_fscore
value: [0.8113879 0.85943775 0.79166667 0.60606061 0.75907591 0.59756098
0.36619718 0.91525424 0.85024155 0.8401487 ]
mean value: 0.7397031472452881
key: test_precision
value: [0.61904762 0.64285714 0.55555556 1. 0.72222222 1.
1. 0.66666667 0.72727273 0.58823529]
mean value: 0.7521857227739581
key: train_precision
value: [0.68674699 0.79850746 0.65895954 1. 0.61170213 1.
1. 0.9 0.95652174 0.73376623]
mean value: 0.8346204088766872
key: test_recall
value: [1. 0.69230769 0.76923077 0.30769231 1. 0.69230769
0.08333333 0.5 0.61538462 0.76923077]
mean value: 0.642948717948718
key: train_recall
value: [0.99130435 0.93043478 0.99130435 0.43478261 1. 0.42608696
0.22413793 0.93103448 0.76521739 0.9826087 ]
mean value: 0.7676911544227886
key: test_roc_auc
value: [0.69230769 0.65384615 0.57692308 0.65384615 0.80769231 0.84615385
0.54166667 0.63461538 0.68269231 0.59294872]
mean value: 0.6682692307692307
key: train_roc_auc
value: [0.76956522 0.84782609 0.73913043 0.7173913 0.6826087 0.71304348
0.61206897 0.91334333 0.86536732 0.81458021]
mean value: 0.7674925037481259
key: test_jcc
value: [0.61904762 0.5 0.47619048 0.30769231 0.72222222 0.69230769
0.08333333 0.4 0.5 0.5 ]
mean value: 0.4800793650793651
key: train_jcc
value: [0.68263473 0.75352113 0.65517241 0.43478261 0.61170213 0.42608696
0.22413793 0.84375 0.7394958 0.72435897]
mean value: 0.6095642667682339
MCC on Blind test: 0.35
Accuracy on Blind test: 0.68
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01598191 0.01755881 0.01522183 0.01561093 0.01599932 0.01559615
0.01611876 0.01620126 0.01491332 0.01539016]
mean value: 0.015859246253967285
key: score_time
value: [0.01147318 0.01146603 0.01138854 0.01152349 0.01148939 0.0114677
0.01142216 0.01143479 0.01141429 0.01151896]
mean value: 0.011459851264953613
key: test_mcc
value: [0.08084521 0.24253563 0.54494926 0.48795004 0.79056942 0.33333333
0.31581015 0.1141228 0.19871795 0.52904327]
mean value: 0.36378770402293453
key: train_mcc
value: [0.61879835 0.7659626 0.71524747 0.51355259 0.60715853 0.71269665
0.74152227 0.65132718 0.73818656 0.73373869]
mean value: 0.6798190892220928
key: test_accuracy
value: [0.53846154 0.61538462 0.76923077 0.69230769 0.88461538 0.65384615
0.64 0.56 0.6 0.76 ]
mean value: 0.6713846153846154
key: train_accuracy
value: [0.78695652 0.87826087 0.85217391 0.70869565 0.7826087 0.84782609
0.86580087 0.8008658 0.85714286 0.86580087]
mean value: 0.8246132128740824
key: test_fscore
value: [0.45454545 0.54545455 0.75 0.76470588 0.86956522 0.70967742
0.68965517 0.42105263 0.61538462 0.75 ]
mean value: 0.657004093847644
key: train_fscore
value: [0.73796791 0.86792453 0.864 0.77441077 0.73404255 0.8627451
0.87649402 0.75531915 0.87258687 0.85972851]
mean value: 0.8205219420596624
key: test_precision
value: [0.55555556 0.66666667 0.81818182 0.61904762 1. 0.61111111
0.58823529 0.57142857 0.61538462 0.81818182]
mean value: 0.6863793069675422
key: train_precision
value: [0.95833333 0.94845361 0.8 0.63186813 0.94520548 0.78571429
0.81481481 0.98611111 0.78472222 0.89622642]
mean value: 0.8551449401857716
key: test_recall
value: [0.38461538 0.46153846 0.69230769 1. 0.76923077 0.84615385
0.83333333 0.33333333 0.61538462 0.69230769]
mean value: 0.6628205128205128
key: train_recall
value: [0.6 0.8 0.93913043 1. 0.6 0.95652174
0.94827586 0.61206897 0.9826087 0.82608696]
mean value: 0.8264692653673164
key: test_roc_auc
value: [0.53846154 0.61538462 0.76923077 0.69230769 0.88461538 0.65384615
0.6474359 0.55128205 0.59935897 0.76282051]
mean value: 0.671474358974359
key: train_roc_auc
value: [0.78695652 0.87826087 0.85217391 0.70869565 0.7826087 0.84782609
0.86544228 0.80168666 0.85768366 0.86562969]
mean value: 0.8246964017991005
key: test_jcc
value: [0.29411765 0.375 0.6 0.61904762 0.76923077 0.55
0.52631579 0.26666667 0.44444444 0.6 ]
mean value: 0.5044822935922008
key: train_jcc
value: [0.58474576 0.76666667 0.76056338 0.63186813 0.57983193 0.75862069
0.78014184 0.60683761 0.7739726 0.75396825]
mean value: 0.6997216871473853
MCC on Blind test: 0.42
Accuracy on Blind test: 0.71
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.13755894 0.11953592 0.11992717 0.12073612 0.12117648 0.11990905
0.12110257 0.12053275 0.11999488 0.11980319]
mean value: 0.12202770709991455
key: score_time
value: [0.01471901 0.01477623 0.01485658 0.01490092 0.01529408 0.01525807
0.01554537 0.01495647 0.0157783 0.01499677]
mean value: 0.015108180046081544
key: test_mcc
value: [0.54494926 0.23076923 0.56591646 0.47434165 0.77151675 0.6172134
0.44702443 0.28022427 0.36774959 0.67948718]
mean value: 0.4979192215844021
key: train_mcc
value: [1. 1. 0.99134183 1. 1. 1.
1. 0.99137867 0.99137931 1. ]
mean value: 0.9974099805575383
key: test_accuracy
value: [0.76923077 0.61538462 0.76923077 0.73076923 0.88461538 0.80769231
0.72 0.64 0.68 0.84 ]
mean value: 0.7456923076923077
key: train_accuracy
value: [1. 1. 0.99565217 1. 1. 1.
1. 0.995671 0.995671 1. ]
mean value: 0.9986994165255035
key: test_fscore
value: [0.78571429 0.61538462 0.72727273 0.75862069 0.88888889 0.81481481
0.66666667 0.57142857 0.66666667 0.84615385]
mean value: 0.7341611772646256
key: train_fscore
value: [1. 1. 0.99563319 1. 1. 1.
1. 0.99570815 0.995671 1. ]
mean value: 0.998701233795036
key: test_precision
value: [0.73333333 0.61538462 0.88888889 0.6875 0.85714286 0.78571429
0.77777778 0.66666667 0.72727273 0.84615385]
mean value: 0.7585834998334998
key: train_precision
value: [1. 1. 1. 1. 1. 1.
1. 0.99145299 0.99137931 1. ]
mean value: 0.9982832301797819
key: test_recall
value: [0.84615385 0.61538462 0.61538462 0.84615385 0.92307692 0.84615385
0.58333333 0.5 0.61538462 0.84615385]
mean value: 0.7237179487179487
key: train_recall
value: [1. 1. 0.99130435 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9991304347826087
key: test_roc_auc
value: [0.76923077 0.61538462 0.76923077 0.73076923 0.88461538 0.80769231
0.71474359 0.63461538 0.68269231 0.83974359]
mean value: 0.7448717948717949
key: train_roc_auc
value: [1. 1. 0.99565217 1. 1. 1.
1. 0.99565217 0.99568966 1. ]
mean value: 0.99869940029985
key: test_jcc
value: [0.64705882 0.44444444 0.57142857 0.61111111 0.8 0.6875
0.5 0.4 0.5 0.73333333]
mean value: 0.5894876283846873
key: train_jcc
value: [1. 1. 0.99130435 1. 1. 1.
1. 0.99145299 0.99137931 1. ]
mean value: 0.9974136649623906
MCC on Blind test: 0.45
Accuracy on Blind test: 0.73
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.05035281 0.05936909 0.05014372 0.0604763 0.056777 0.06902242
0.04624724 0.04836965 0.04742455 0.06249499]
mean value: 0.05506777763366699
key: score_time
value: [0.0195353 0.02203798 0.02992177 0.02294207 0.03088975 0.01927781
0.02395821 0.02438879 0.02904439 0.02309632]
mean value: 0.024509239196777343
key: test_mcc
value: [0.38924947 0.63245553 0.5 0.69230769 0.70064905 0.54494926
0.19611614 0.51923077 0.52904327 0.51923077]
mean value: 0.5223231949125111
key: train_mcc
value: [0.94839996 0.92261158 0.94911877 0.9742446 0.94001934 0.9658018
0.99137867 0.95674663 0.95674339 0.92237897]
mean value: 0.95274437058133
key: test_accuracy
value: [0.69230769 0.80769231 0.73076923 0.84615385 0.84615385 0.76923077
0.6 0.76 0.76 0.76 ]
mean value: 0.7572307692307693
key: train_accuracy
value: [0.97391304 0.96086957 0.97391304 0.98695652 0.96956522 0.9826087
0.995671 0.97835498 0.97835498 0.96103896]
mean value: 0.9761246000376436
key: test_fscore
value: [0.66666667 0.82758621 0.66666667 0.84615385 0.83333333 0.75
0.54545455 0.75 0.75 0.76923077]
mean value: 0.740509203440238
key: train_fscore
value: [0.97345133 0.96 0.97321429 0.98678414 0.96888889 0.98230088
0.99570815 0.97835498 0.97816594 0.96035242]
mean value: 0.9757221022595252
key: test_precision
value: [0.72727273 0.75 0.875 0.84615385 0.90909091 0.81818182
0.6 0.75 0.81818182 0.76923077]
mean value: 0.7863111888111888
key: train_precision
value: [0.99099099 0.98181818 1. 1. 0.99090909 1.
0.99145299 0.9826087 0.98245614 0.97321429]
mean value: 0.9893450376888592
key: test_recall
value: [0.61538462 0.92307692 0.53846154 0.84615385 0.76923077 0.69230769
0.5 0.75 0.69230769 0.76923077]
mean value: 0.7096153846153846
key: train_recall
value: [0.95652174 0.93913043 0.94782609 0.97391304 0.94782609 0.96521739
1. 0.97413793 0.97391304 0.94782609]
mean value: 0.9626311844077962
key: test_roc_auc
value: [0.69230769 0.80769231 0.73076923 0.84615385 0.84615385 0.76923077
0.59615385 0.75961538 0.76282051 0.75961538]
mean value: 0.757051282051282
key: train_roc_auc
value: [0.97391304 0.96086957 0.97391304 0.98695652 0.96956522 0.9826087
0.99565217 0.97837331 0.97833583 0.96098201]
mean value: 0.9761169415292353
key: test_jcc
value: [0.5 0.70588235 0.5 0.73333333 0.71428571 0.6
0.375 0.6 0.6 0.625 ]
mean value: 0.5953501400560224
key: train_jcc
value: [0.94827586 0.92307692 0.94782609 0.97391304 0.93965517 0.96521739
0.99145299 0.95762712 0.95726496 0.92372881]
mean value: 0.9528038360220151
MCC on Blind test: 0.44
Accuracy on Blind test: 0.71
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.04538059 0.05333805 0.03113556 0.0357511 0.07322001 0.05139589
0.03071594 0.03451061 0.10031009 0.06317687]
mean value: 0.05189347267150879
key: score_time
value: [0.02160525 0.01283216 0.01277208 0.02457476 0.02156925 0.01298785
0.01267076 0.02030373 0.02255225 0.01272869]
mean value: 0.017459678649902343
key: test_mcc
value: [ 0. 0.46291005 0.38461538 0.38461538 0.46291005 0.3086067
0.36774959 -0.05337605 0.03846154 0.11613145]
mean value: 0.2472624094332408
key: train_mcc
value: [0.99134183 0.99134183 0.99134183 1. 0.99134183 0.99134183
0.99137867 0.99137867 0.98268366 0.99137931]
mean value: 0.9913529444107553
key: test_accuracy
value: [0.5 0.73076923 0.69230769 0.69230769 0.73076923 0.65384615
0.68 0.48 0.52 0.56 ]
mean value: 0.624
key: train_accuracy
value: [0.99565217 0.99565217 0.99565217 1. 0.99565217 0.99565217
0.995671 0.995671 0.99134199 0.995671 ]
mean value: 0.9956615847920196
key: test_fscore
value: [0.55172414 0.72 0.69230769 0.69230769 0.74074074 0.66666667
0.69230769 0.38095238 0.53846154 0.59259259]
mean value: 0.626806113426803
key: train_fscore
value: [0.995671 0.995671 0.995671 1. 0.995671 0.995671
0.99570815 0.99570815 0.99130435 0.995671 ]
mean value: 0.9956746630864937
key: test_precision
value: [0.5 0.75 0.69230769 0.69230769 0.71428571 0.64285714
0.64285714 0.44444444 0.53846154 0.57142857]
mean value: 0.6188949938949939
key: train_precision
value: [0.99137931 0.99137931 0.99137931 1. 0.99137931 0.99137931
0.99145299 0.99145299 0.99130435 0.99137931]
mean value: 0.9922486192801035
key: test_recall
value: [0.61538462 0.69230769 0.69230769 0.69230769 0.76923077 0.69230769
0.75 0.33333333 0.53846154 0.61538462]
mean value: 0.639102564102564
key: train_recall
value: [1. 1. 1. 1. 1. 1.
1. 1. 0.99130435 1. ]
mean value: 0.9991304347826087
key: test_roc_auc
value: [0.5 0.73076923 0.69230769 0.69230769 0.73076923 0.65384615
0.68269231 0.47435897 0.51923077 0.55769231]
mean value: 0.6233974358974359
key: train_roc_auc
value: [0.99565217 0.99565217 0.99565217 1. 0.99565217 0.99565217
0.99565217 0.99565217 0.99134183 0.99568966]
mean value: 0.9956596701649175
key: test_jcc
value: [0.38095238 0.5625 0.52941176 0.52941176 0.58823529 0.5
0.52941176 0.23529412 0.36842105 0.42105263]
mean value: 0.464469077104526
key: train_jcc
value: [0.99137931 0.99137931 0.99137931 1. 0.99137931 0.99137931
0.99145299 0.99145299 0.98275862 0.99137931]
mean value: 0.9913940465664604
MCC on Blind test: 0.29
Accuracy on Blind test: 0.65
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.44359875 0.41606069 0.41640043 0.4198699 0.41944647 0.4192605
0.42750263 0.41986775 0.41747665 0.42336249]
mean value: 0.4222846269607544
key: score_time
value: [0.00940156 0.00916147 0.00931311 0.0093689 0.01004791 0.00933456
0.00939679 0.0092485 0.00944662 0.00916076]
mean value: 0.009388017654418945
key: test_mcc
value: [0.63245553 0.69230769 0.5 0.69230769 0.77151675 0.69230769
0.67948718 0.28022427 0.52904327 0.76282051]
mean value: 0.6232470588678629
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.80769231 0.84615385 0.73076923 0.84615385 0.88461538 0.84615385
0.84 0.64 0.76 0.88 ]
mean value: 0.8081538461538461
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.82758621 0.84615385 0.66666667 0.84615385 0.88888889 0.84615385
0.83333333 0.57142857 0.75 0.88 ]
mean value: 0.7956365205675551
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.75 0.84615385 0.875 0.84615385 0.85714286 0.84615385
0.83333333 0.66666667 0.81818182 0.91666667]
mean value: 0.825545288045288
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.92307692 0.84615385 0.53846154 0.84615385 0.92307692 0.84615385
0.83333333 0.5 0.69230769 0.84615385]
mean value: 0.7794871794871795
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.80769231 0.84615385 0.73076923 0.84615385 0.88461538 0.84615385
0.83974359 0.63461538 0.76282051 0.88141026]
mean value: 0.8080128205128205
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.70588235 0.73333333 0.5 0.73333333 0.8 0.73333333
0.71428571 0.4 0.6 0.78571429]
mean value: 0.6705882352941176
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.57
Accuracy on Blind test: 0.79
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.02251291 0.02159595 0.02179456 0.02176547 0.02233934 0.02202964
0.02235866 0.02211452 0.02222347 0.02275968]
mean value: 0.022149419784545897
key: score_time
value: [0.01358199 0.01356983 0.01375198 0.01353145 0.01821208 0.01508594
0.01521039 0.01506615 0.01855421 0.01217628]
mean value: 0.014874029159545898
key: test_mcc
value: [ 0.47434165 0.31622777 0.38461538 0.07784989 0.54494926 0.07784989
0.04516223 -0.12179487 0.03268602 -0.20645591]
mean value: 0.16254313207270998
key: train_mcc
value: [0.98275733 0.98275733 0.99134183 0.99134183 0.9742446 0.9658018
0.84693252 0.99137867 0.94113789 1. ]
mean value: 0.9667693784024392
key: test_accuracy
value: [0.73076923 0.65384615 0.69230769 0.53846154 0.76923077 0.53846154
0.52 0.44 0.52 0.4 ]
mean value: 0.5803076923076923
key: train_accuracy
value: [0.99130435 0.99130435 0.99565217 0.99565217 0.98695652 0.9826087
0.91774892 0.995671 0.96969697 1. ]
mean value: 0.9826595143986449
key: test_fscore
value: [0.75862069 0.68965517 0.69230769 0.57142857 0.75 0.57142857
0.53846154 0.41666667 0.57142857 0.44444444]
mean value: 0.6004441918235022
key: train_fscore
value: [0.99137931 0.99137931 0.995671 0.995671 0.98712446 0.98290598
0.92430279 0.99570815 0.97046414 1. ]
mean value: 0.9834606136829099
key: test_precision
value: [0.6875 0.625 0.69230769 0.53333333 0.81818182 0.53333333
0.5 0.41666667 0.53333333 0.42857143]
mean value: 0.5768227605727606
key: train_precision
value: [0.98290598 0.98290598 0.99137931 0.99137931 0.97457627 0.96638655
0.85925926 0.99145299 0.94262295 1. ]
mean value: 0.9682868613841833
key: test_recall
value: [0.84615385 0.76923077 0.69230769 0.61538462 0.69230769 0.61538462
0.58333333 0.41666667 0.61538462 0.46153846]
mean value: 0.6307692307692307
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.73076923 0.65384615 0.69230769 0.53846154 0.76923077 0.53846154
0.5224359 0.43910256 0.51602564 0.3974359 ]
mean value: 0.5798076923076924
key: train_roc_auc
value: [0.99130435 0.99130435 0.99565217 0.99565217 0.98695652 0.9826087
0.9173913 0.99565217 0.96982759 1. ]
mean value: 0.9826349325337331
key: test_jcc
value: [0.61111111 0.52631579 0.52941176 0.4 0.6 0.4
0.36842105 0.26315789 0.4 0.28571429]
mean value: 0.43841318983733846
key: train_jcc
value: [0.98290598 0.98290598 0.99137931 0.99137931 0.97457627 0.96638655
0.85925926 0.99145299 0.94262295 1. ]
mean value: 0.9682868613841833
MCC on Blind test: 0.06
Accuracy on Blind test: 0.54
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02251625 0.03479052 0.01392817 0.01403356 0.01398563 0.01390481
0.03432369 0.0349257 0.03514314 0.03442669]
mean value: 0.025197815895080567
key: score_time
value: [0.02350426 0.01194096 0.01172352 0.01173067 0.01177621 0.01163101
0.02290154 0.02014542 0.02137446 0.02326202]
mean value: 0.016999006271362305
key: test_mcc
value: [0.5 0.40422604 0.54494926 0.63245553 0.69230769 0.46291005
0.36774959 0.19611614 0.12179487 0.35897436]
mean value: 0.4281483531816259
key: train_mcc
value: [0.76524632 0.80003025 0.77403011 0.76663895 0.79130435 0.78263829
0.80089955 0.89822939 0.78356699 0.78358321]
mean value: 0.7946167390990425
key: test_accuracy
value: [0.73076923 0.69230769 0.76923077 0.80769231 0.84615385 0.73076923
0.68 0.6 0.56 0.68 ]
mean value: 0.7096923076923077
key: train_accuracy
value: [0.8826087 0.9 0.88695652 0.8826087 0.89565217 0.89130435
0.9004329 0.94805195 0.89177489 0.89177489]
mean value: 0.897116506681724
key: test_fscore
value: [0.77419355 0.63636364 0.75 0.82758621 0.84615385 0.74074074
0.69230769 0.54545455 0.56 0.69230769]
mean value: 0.7065107908611802
key: train_fscore
value: [0.88311688 0.9004329 0.88793103 0.87892377 0.89565217 0.89177489
0.9004329 0.95 0.89082969 0.89177489]
mean value: 0.8970869137067558
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_cd_7030.py:176: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_cd_7030.py:179: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: test_precision
value: [0.66666667 0.77777778 0.81818182 0.75 0.84615385 0.71428571
0.64285714 0.6 0.58333333 0.69230769]
mean value: 0.7091563991563992
key: train_precision
value: [0.87931034 0.89655172 0.88034188 0.90740741 0.89565217 0.88793103
0.90434783 0.91935484 0.89473684 0.88793103]
mean value: 0.8953565106495263
key: test_recall
value: [0.92307692 0.53846154 0.69230769 0.92307692 0.84615385 0.76923077
0.75 0.5 0.53846154 0.69230769]
mean value: 0.7173076923076923
key: train_recall
value: [0.88695652 0.90434783 0.89565217 0.85217391 0.89565217 0.89565217
0.89655172 0.98275862 0.88695652 0.89565217]
mean value: 0.8992353823088456
key: test_roc_auc
value: [0.73076923 0.69230769 0.76923077 0.80769231 0.84615385 0.73076923
0.68269231 0.59615385 0.56089744 0.67948718]
mean value: 0.7096153846153846
key: train_roc_auc
value: [0.8826087 0.9 0.88695652 0.8826087 0.89565217 0.89130435
0.90044978 0.94790105 0.89175412 0.8917916 ]
mean value: 0.8971026986506747
key: test_jcc
value: [0.63157895 0.46666667 0.6 0.70588235 0.73333333 0.58823529
0.52941176 0.375 0.38888889 0.52941176]
mean value: 0.5548409012727898
key: train_jcc
value: [0.79069767 0.81889764 0.79844961 0.784 0.81102362 0.8046875
0.81889764 0.9047619 0.80314961 0.8046875 ]
mean value: 0.8139252695520618
MCC on Blind test: 0.38
Accuracy on Blind test: 0.69
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.31885719 0.26284933 0.25009298 0.23583317 0.23779225 0.23270583
0.23397398 0.24211431 0.30185318 0.2662189 ]
mean value: 0.25822911262512205
key: score_time
value: [0.02399087 0.02392244 0.02324104 0.02295756 0.02134013 0.02078986
0.02291155 0.02077198 0.02287841 0.02306271]
mean value: 0.022586655616760255
key: test_mcc
value: [0.5 0.23354968 0.54494926 0.63245553 0.84615385 0.56591646
0.20645591 0.1990977 0.12179487 0.35897436]
mean value: 0.4209347621861535
key: train_mcc
value: [0.76524632 0.64350259 0.77403011 0.76663895 0.64369733 0.61776511
0.65447938 0.71429643 0.78356699 0.78358321]
mean value: 0.7146806406714871
key: test_accuracy
value: [0.73076923 0.61538462 0.76923077 0.80769231 0.92307692 0.76923077
0.6 0.6 0.56 0.68 ]
mean value: 0.7055384615384616
key: train_accuracy
value: [0.8826087 0.82173913 0.88695652 0.8826087 0.82173913 0.80869565
0.82683983 0.85714286 0.89177489 0.89177489]
mean value: 0.8571880293619424
key: test_fscore
value: [0.77419355 0.58333333 0.75 0.82758621 0.92307692 0.8
0.61538462 0.5 0.56 0.69230769]
mean value: 0.7025882319386213
key: train_fscore
value: [0.88311688 0.82251082 0.88793103 0.87892377 0.82403433 0.81196581
0.83193277 0.8583691 0.89082969 0.89177489]
mean value: 0.8581389111576094
key: test_precision
value: [0.66666667 0.63636364 0.81818182 0.75 0.92307692 0.70588235
0.57142857 0.625 0.58333333 0.69230769]
mean value: 0.6972240994299818
key: train_precision
value: [0.87931034 0.81896552 0.88034188 0.90740741 0.81355932 0.79831933
0.81147541 0.85470085 0.89473684 0.88793103]
mean value: 0.8546747940708186
key: test_recall
value: [0.92307692 0.53846154 0.69230769 0.92307692 0.92307692 0.92307692
0.66666667 0.41666667 0.53846154 0.69230769]
mean value: 0.7237179487179487
key: train_recall
value: [0.88695652 0.82608696 0.89565217 0.85217391 0.83478261 0.82608696
0.85344828 0.86206897 0.88695652 0.89565217]
mean value: 0.8619865067466267
key: test_roc_auc
value: [0.73076923 0.61538462 0.76923077 0.80769231 0.92307692 0.76923077
0.6025641 0.59294872 0.56089744 0.67948718]
mean value: 0.7051282051282052
key: train_roc_auc
value: [0.8826087 0.82173913 0.88695652 0.8826087 0.82173913 0.80869565
0.82672414 0.85712144 0.89175412 0.8917916 ]
mean value: 0.8571739130434783
key: test_jcc
value: [0.63157895 0.41176471 0.6 0.70588235 0.85714286 0.66666667
0.44444444 0.33333333 0.38888889 0.52941176]
mean value: 0.5569113961374024
key: train_jcc
value: [0.79069767 0.69852941 0.79844961 0.784 0.70072993 0.68345324
0.71223022 0.7518797 0.80314961 0.8046875 ]
mean value: 0.7527806884378454
MCC on Blind test: 0.38
Accuracy on Blind test: 0.69
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
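The pipeline above runs `LogisticRegression` with its default `max_iter=100`, and the ConvergenceWarning earlier in this log recommends raising that limit or scaling the data. Since the numeric columns already pass through `MinMaxScaler`, raising `max_iter` is the remaining suggestion. A minimal sketch of the same pipeline layout with a higher limit follows. The data and the column names (`num_0`, `cat_0`, ...) are hypothetical stand-ins, and `max_iter=5000` is an arbitrary illustrative value, not a tuned one.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic stand-in data; column names are hypothetical, not the log's features.
X_num, y = make_classification(n_samples=200, n_features=5, random_state=42)
X = pd.DataFrame(X_num, columns=[f"num_{i}" for i in range(5)])
X["cat_0"] = ["a" if v > 0 else "b" for v in X["num_0"]]

num_cols = [c for c in X.columns if c.startswith("num_")]
cat_cols = ["cat_0"]

# Same layout as the logged pipeline: scale numeric columns, one-hot encode categoricals.
prep = ColumnTransformer(
    remainder="passthrough",
    transformers=[
        ("num", MinMaxScaler(), num_cols),
        ("cat", OneHotEncoder(), cat_cols),
    ],
)

# max_iter raised from the default of 100 (illustrative value, an assumption).
pipe = Pipeline(steps=[
    ("prep", prep),
    ("model", LogisticRegression(max_iter=5000, random_state=42)),
])
pipe.fit(X, y)

# n_iter_ reports how many lbfgs iterations were actually used.
n_iter = pipe.named_steps["model"].n_iter_
print("iterations used:", n_iter)
```

If `n_iter_` stays below `max_iter`, the solver stopped on its tolerance rather than on the iteration cap, which is the condition the warning reports as not met.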
key: fit_time
value: [0.03337646 0.03352547 0.03370428 0.03442836 0.03449273 0.03441572
0.03380013 0.04961681 0.03415537 0.0339179 ]
mean value: 0.03554332256317139
key: score_time
value: [0.01657343 0.01375055 0.01404023 0.01396298 0.01412797 0.01399684
0.01417994 0.01409483 0.01432037 0.01417518]
mean value: 0.014322233200073243
key: test_mcc
value: [0.62994079 0.57265629 0.49612132 0.61925228 0.48954403 0.53006813
0.29960206 0.67916667 0.55 0.58316015]
mean value: 0.5449511717775465
key: train_mcc
value: [0.66430266 0.7071609 0.72953394 0.69442482 0.72959417 0.72260223
0.70820669 0.70820669 0.71529889 0.72256008]
mean value: 0.710189106315
key: test_accuracy
value: [0.8125 0.78125 0.74193548 0.80645161 0.74193548 0.74193548
0.64516129 0.83870968 0.77419355 0.77419355]
mean value: 0.7658266129032258
key: train_accuracy
value: [0.83214286 0.85357143 0.86476868 0.84697509 0.86476868 0.86120996
0.85409253 0.85409253 0.85765125 0.86120996]
mean value: 0.8550482968988307
key: test_fscore
value: [0.8 0.8 0.69230769 0.8125 0.75 0.77777778
0.62068966 0.83870968 0.77419355 0.74074074]
mean value: 0.7606919091805077
key: train_fscore
value: [0.83274021 0.85304659 0.86524823 0.84476534 0.86619718 0.86021505
0.85409253 0.85409253 0.85714286 0.85920578]
mean value: 0.8546746301974811
key: test_precision
value: [0.85714286 0.73684211 0.81818182 0.76470588 0.70588235 0.66666667
0.69230769 0.86666667 0.8 0.90909091]
mean value: 0.7817486950613886
key: train_precision
value: [0.82978723 0.85611511 0.86524823 0.86029412 0.86013986 0.86956522
0.85106383 0.85106383 0.85714286 0.86861314]
mean value: 0.8569033419488257
key: test_recall
value: [0.75 0.875 0.6 0.86666667 0.8 0.93333333
0.5625 0.8125 0.75 0.625 ]
mean value: 0.7575000000000001
key: train_recall
value: [0.83571429 0.85 0.86524823 0.82978723 0.87234043 0.85106383
0.85714286 0.85714286 0.85714286 0.85 ]
mean value: 0.8525582573454914
key: test_roc_auc
value: [0.8125 0.78125 0.7375 0.80833333 0.74375 0.74791667
0.64791667 0.83958333 0.775 0.77916667]
mean value: 0.7672916666666667
key: train_roc_auc
value: [0.83214286 0.85357143 0.86476697 0.84703647 0.86474164 0.8612462
0.85410334 0.85410334 0.85764944 0.86117021]
mean value: 0.8550531914893618
key: test_jcc
value: [0.66666667 0.66666667 0.52941176 0.68421053 0.6 0.63636364
0.45 0.72222222 0.63157895 0.58823529]
mean value: 0.6175355724426932
key: train_jcc
value: [0.71341463 0.74375 0.7625 0.73125 0.76397516 0.75471698
0.74534161 0.74534161 0.75 0.75316456]
mean value: 0.746345455733361
MCC on Blind test: 0.36
Accuracy on Blind test: 0.69
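The fold-level numbers in the Logistic Regression block above can be cross-checked by hand. The sketch below recomputes the mean of the logged `test_mcc` values, then recomputes MCC and the Jaccard score (`jcc`) for the first fold. The confusion counts (TP=12, TN=14, FP=2, FN=4) are not logged. They are inferred here from the first-fold precision (12/14), recall (12/16) and accuracy (26/32), so treat them as an assumption.

```python
import math
from statistics import fmean


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def jaccard(tp, fp, fn):
    """Jaccard score for the positive class: TP / (TP + FP + FN)."""
    total = tp + fp + fn
    return tp / total if total else 0.0


# Per-fold test_mcc values copied from the Logistic Regression block above.
test_mcc = [0.62994079, 0.57265629, 0.49612132, 0.61925228, 0.48954403,
            0.53006813, 0.29960206, 0.67916667, 0.55, 0.58316015]
mean_mcc = fmean(test_mcc)
print("mean test_mcc:", mean_mcc)

# Inferred first-fold confusion counts (assumption, see the note above).
tp, tn, fp, fn = 12, 14, 2, 4
fold1_mcc = mcc(tp, tn, fp, fn)
fold1_jcc = jaccard(tp, fp, fn)
print("fold 1 MCC:", fold1_mcc)
print("fold 1 JCC:", fold1_jcc)
```

The recomputed values agree with the logged first-fold entries (`test_mcc` 0.62994079, `test_jcc` 0.66666667) and with the logged mean to within the rounding of the printed arrays.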
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.7325387 0.89071441 0.77209663 0.88231349 0.76088715 0.76609707
0.87763047 0.75076795 0.75155282 0.94855165]
mean value: 0.8133150339126587
key: score_time
value: [0.01201606 0.01187754 0.01192522 0.01188445 0.01186037 0.01193094
0.01192117 0.01198769 0.01191378 0.01193261]
mean value: 0.011924982070922852
key: test_mcc
value: [0.5 0.57265629 0.42321607 0.55 0.50443936 0.4770843
0.29960206 0.49612132 0.61925228 0.58316015]
mean value: 0.502553182374343
key: train_mcc
value: [0.62882815 0.62914948 0.70131788 0.63736394 0.5960184 0.65835866
0.63717891 0.60897119 0.60936188 0.71538579]
mean value: 0.6421934266319843
key: test_accuracy
value: [0.75 0.78125 0.70967742 0.77419355 0.74193548 0.70967742
0.64516129 0.74193548 0.80645161 0.77419355]
mean value: 0.7434475806451613
key: train_accuracy
value: [0.81428571 0.81428571 0.85053381 0.81850534 0.79715302 0.82918149
0.81850534 0.80427046 0.80427046 0.85765125]
mean value: 0.8208642602948653
key: test_fscore
value: [0.75 0.8 0.66666667 0.77419355 0.76470588 0.75675676
0.62068966 0.77777778 0.8 0.74074074]
mean value: 0.7451531027854393
key: train_fscore
value: [0.81690141 0.81818182 0.85314685 0.82229965 0.80546075 0.82978723
0.81978799 0.80701754 0.80836237 0.85815603]
mean value: 0.8239101643675262
key: test_precision
value: [0.75 0.73684211 0.75 0.75 0.68421053 0.63636364
0.69230769 0.7 0.85714286 0.90909091]
mean value: 0.7465957726484043
key: train_precision
value: [0.80555556 0.80136986 0.84137931 0.80821918 0.77631579 0.82978723
0.81118881 0.79310345 0.78911565 0.85211268]
mean value: 0.8108147512292025
key: test_recall
value: [0.75 0.875 0.6 0.8 0.86666667 0.93333333
0.5625 0.875 0.75 0.625 ]
mean value: 0.76375
key: train_recall
value: [0.82857143 0.83571429 0.86524823 0.83687943 0.83687943 0.82978723
0.82857143 0.82142857 0.82857143 0.86428571]
mean value: 0.8375937183383992
key: test_roc_auc
value: [0.75 0.78125 0.70625 0.775 0.74583333 0.71666667
0.64791667 0.7375 0.80833333 0.77916667]
mean value: 0.7447916666666667
key: train_roc_auc
value: [0.81428571 0.81428571 0.85048126 0.81843972 0.79701114 0.82917933
0.81854103 0.80433131 0.80435664 0.85767477]
mean value: 0.8208586626139818
key: test_jcc
value: [0.6 0.66666667 0.5 0.63157895 0.61904762 0.60869565
0.45 0.63636364 0.66666667 0.58823529]
mean value: 0.596725448240457
key: train_jcc
value: [0.69047619 0.69230769 0.74390244 0.69822485 0.67428571 0.70909091
0.69461078 0.67647059 0.67836257 0.7515528 ]
mean value: 0.7009284532064781
MCC on Blind test: 0.32
Accuracy on Blind test: 0.66
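The `key:` / `value:` / `mean value:` blocks in this log have the shape of the dictionary returned by scikit-learn's `cross_validate` when it is given several named scorers and `return_train_score=True`. A minimal sketch of how such output could be produced is below. It is not the original script: the synthetic data, the scorer definitions behind each name, and the fold settings are assumptions; only the key names and the 70/30 stratified split are taken from the log.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=300, n_features=20, random_state=42)

# 70/30 stratified split, mirroring the split reported at the top of the log.
X_train, X_blind, y_train, y_blind = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Scorer names copied from the log keys; the definitions are assumptions.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": "jaccard",
}

pipe = Pipeline(steps=[("prep", MinMaxScaler()), ("model", GaussianNB())])
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

# One array of 10 per-fold values for each test_* and train_* key.
scores = cross_validate(pipe, X_train, y_train, cv=cv,
                        scoring=scoring, return_train_score=True)
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))

# Refit on the full training split and score once on the held-out 30%.
pipe.fit(X_train, y_train)
y_pred = pipe.predict(X_blind)
blind_mcc = matthews_corrcoef(y_blind, y_pred)
blind_acc = accuracy_score(y_blind, y_pred)
print("MCC on Blind test:", round(blind_mcc, 2))
print("Accuracy on Blind test:", round(blind_acc, 2))
```

Comparing the `train_*` means with the `test_*` means and the blind-test figures, as the log allows for each model, gives a quick read on overfitting.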
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01314902 0.01112795 0.00962234 0.00998735 0.00902677 0.00961423
0.00921035 0.00920057 0.00901651 0.00913954]
mean value: 0.009909462928771973
key: score_time
value: [0.01189709 0.00921988 0.00953174 0.0086832 0.00859284 0.00856686
0.00860071 0.00853753 0.00849509 0.00902152]
mean value: 0.009114646911621093
key: test_mcc
value: [0.25819889 0.53935989 0.23939495 0.74896053 0.05046084 0.23939495
0.28870546 0.29069387 0.48333333 0.74896053]
mean value: 0.38874632340433807
key: train_mcc
value: [0.49726525 0.47977484 0.50204455 0.50192607 0.51324925 0.51683781
0.47539896 0.51888435 0.45793458 0.49211483]
mean value: 0.49554304714846314
key: test_accuracy
value: [0.625 0.75 0.61290323 0.87096774 0.51612903 0.61290323
0.64516129 0.64516129 0.74193548 0.87096774]
mean value: 0.6891129032258064
key: train_accuracy
value: [0.74285714 0.73571429 0.74377224 0.75088968 0.75088968 0.7544484
0.73309609 0.7544484 0.70818505 0.7366548 ]
mean value: 0.7410955770208439
key: test_fscore
value: [0.66666667 0.78947368 0.64705882 0.875 0.59459459 0.64705882
0.66666667 0.68571429 0.75 0.86666667]
mean value: 0.718890021157823
key: train_fscore
value: [0.76774194 0.75816993 0.7721519 0.75524476 0.77564103 0.7752443
0.75570033 0.7752443 0.75739645 0.7672956 ]
mean value: 0.7659830522014204
key: test_precision
value: [0.6 0.68181818 0.57894737 0.82352941 0.5 0.57894737
0.64705882 0.63157895 0.75 0.92857143]
mean value: 0.6720451529894255
key: train_precision
value: [0.7 0.69879518 0.69714286 0.74482759 0.70760234 0.71686747
0.69461078 0.71257485 0.64646465 0.68539326]
mean value: 0.7004278966767578
key: test_recall
value: [0.75 0.9375 0.73333333 0.93333333 0.73333333 0.73333333
0.6875 0.75 0.75 0.8125 ]
mean value: 0.7820833333333334
key: train_recall
value: [0.85 0.82857143 0.86524823 0.76595745 0.85815603 0.84397163
0.82857143 0.85 0.91428571 0.87142857]
mean value: 0.8476190476190476
key: test_roc_auc
value: [0.625 0.75 0.61666667 0.87291667 0.52291667 0.61666667
0.64375 0.64166667 0.74166667 0.87291667]
mean value: 0.6904166666666667
key: train_roc_auc
value: [0.74285714 0.73571429 0.7433384 0.75083587 0.75050659 0.75412867
0.73343465 0.75478723 0.70891591 0.73713273]
mean value: 0.7411651469098278
key: test_jcc
value: [0.5 0.65217391 0.47826087 0.77777778 0.42307692 0.47826087
0.5 0.52173913 0.6 0.76470588]
mean value: 0.5695995365816338
key: train_jcc
value: [0.62303665 0.61052632 0.62886598 0.60674157 0.63350785 0.63297872
0.60732984 0.63297872 0.60952381 0.62244898]
mean value: 0.6207938449678521
MCC on Blind test: 0.28
Accuracy on Blind test: 0.65
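Each `Running model pipeline:` entry in this log prints a `Pipeline` whose `prep` step is a `ColumnTransformer` that applies `MinMaxScaler` to the numeric columns and `OneHotEncoder` to the categorical ones, with `remainder='passthrough'`. A minimal sketch of building such a pipeline is below. The toy DataFrame is hypothetical: the column names are borrowed from the log, but the values, the category labels, and the target vector are invented for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Toy data (assumption): column names from the log, values made up.
df = pd.DataFrame({
    "ligand_distance": [3.2, 10.5, 7.1, 15.0, 4.4, 12.3],
    "rsa": [0.1, 0.8, 0.4, 0.9, 0.2, 0.7],
    "ss_class": ["helix", "loop", "strand", "loop", "helix", "strand"],
    "active_site": ["yes", "no", "no", "no", "yes", "no"],
})
y = [1, 0, 1, 0, 1, 0]

num_cols = ["ligand_distance", "rsa"]   # scaled to [0, 1]
cat_cols = ["ss_class", "active_site"]  # expanded to indicator columns

prep = ColumnTransformer(
    remainder="passthrough",
    transformers=[
        ("num", MinMaxScaler(), num_cols),
        ("cat", OneHotEncoder(), cat_cols),
    ],
)

pipe = Pipeline(steps=[("prep", prep), ("model", GaussianNB())])
pipe.fit(df, y)

# 2 scaled numeric columns + 3 ss_class indicators + 2 active_site indicators.
transformed = pipe.named_steps["prep"].transform(df)
print(transformed.shape)
preds = pipe.predict(df)
```

Swapping the `model` step for another estimator reuses the same preprocessing, which matches how the logged pipelines differ only in their final step.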
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.00933599 0.00952888 0.00928903 0.00916195 0.00920892 0.00923371
0.00950384 0.00935936 0.0097599 0.00947571]
mean value: 0.00938572883605957
key: score_time
value: [0.00861382 0.00869727 0.00851321 0.00864816 0.00889111 0.00874734
0.00866508 0.00860071 0.00856805 0.00867391]
mean value: 0.008661866188049316
key: test_mcc
value: [0.56360186 0.48653363 0.15899721 0.5612264 0.4770843 0.37191715
0.28870546 0.42321607 0.09583333 0.58316015]
mean value: 0.40102755595919914
key: train_mcc
value: [0.57430732 0.53803891 0.55459753 0.49900055 0.55953199 0.56745262
0.55493536 0.58009119 0.54153517 0.53306083]
mean value: 0.5502551476643553
key: test_accuracy
value: [0.78125 0.71875 0.58064516 0.77419355 0.70967742 0.67741935
0.64516129 0.70967742 0.5483871 0.77419355]
mean value: 0.6919354838709677
key: train_accuracy
value: [0.78571429 0.76785714 0.77580071 0.74733096 0.77935943 0.78291815
0.77580071 0.79003559 0.76868327 0.76512456]
mean value: 0.7738624809354346
key: test_fscore
value: [0.77419355 0.76923077 0.55172414 0.78787879 0.75675676 0.70588235
0.66666667 0.74285714 0.5625 0.74074074]
mean value: 0.7058430903390172
key: train_fscore
value: [0.79591837 0.778157 0.78787879 0.7641196 0.7862069 0.79180887
0.78644068 0.79003559 0.78114478 0.7755102 ]
mean value: 0.7837220773794649
key: test_precision
value: [0.8 0.65217391 0.57142857 0.72222222 0.63636364 0.63157895
0.64705882 0.68421053 0.5625 0.90909091]
mean value: 0.681662754936244
key: train_precision
value: [0.75974026 0.74509804 0.75 0.71875 0.76510067 0.76315789
0.7483871 0.78723404 0.7388535 0.74025974]
mean value: 0.7516581247605566
key: test_recall
value: [0.75 0.9375 0.53333333 0.86666667 0.93333333 0.8
0.6875 0.8125 0.5625 0.625 ]
mean value: 0.7508333333333334
key: train_recall
value: [0.83571429 0.81428571 0.82978723 0.81560284 0.80851064 0.82269504
0.82857143 0.79285714 0.82857143 0.81428571]
mean value: 0.8190881458966566
key: test_roc_auc
value: [0.78125 0.71875 0.57916667 0.77708333 0.71666667 0.68125
0.64375 0.70625 0.54791667 0.77916667]
mean value: 0.693125
key: train_roc_auc
value: [0.78571429 0.76785714 0.7756079 0.74708713 0.77925532 0.78277609
0.77598784 0.79004559 0.76889564 0.76529889]
mean value: 0.7738525835866261
key: test_jcc
value: [0.63157895 0.625 0.38095238 0.65 0.60869565 0.54545455
0.5 0.59090909 0.39130435 0.58823529]
mean value: 0.5512130258802086
key: train_jcc
value: [0.66101695 0.63687151 0.65 0.61827957 0.64772727 0.65536723
0.64804469 0.65294118 0.64088398 0.63333333]
mean value: 0.6444465712232499
MCC on Blind test: 0.35
Accuracy on Blind test: 0.68
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00932002 0.01014709 0.00882673 0.00991416 0.00968051 0.00960183
0.00987482 0.00993252 0.00893021 0.00874734]
mean value: 0.009497523307800293
key: score_time
value: [0.01476312 0.01438737 0.01490664 0.0168004 0.01892352 0.01665711
0.01663828 0.011374 0.01400232 0.01244378]
mean value: 0.015089654922485351
key: test_mcc
value: [ 0.37796447 0.12598816 0.02928896 0.28870546 -0.02227177 0.29960206
0.225 0.15899721 0.44824996 0.19266866]
mean value: 0.21241931692556495
key: train_mcc
value: [0.50005103 0.56441531 0.58765691 0.57302802 0.60162197 0.5956161
0.61886765 0.6014592 0.53738602 0.53738602]
mean value: 0.5717488216987027
key: test_accuracy
value: [0.6875 0.5625 0.51612903 0.64516129 0.48387097 0.64516129
0.61290323 0.58064516 0.70967742 0.58064516]
mean value: 0.6024193548387097
key: train_accuracy
value: [0.75 0.78214286 0.79359431 0.78647687 0.80071174 0.79715302
0.80782918 0.80071174 0.76868327 0.76868327]
mean value: 0.7855986273512964
key: test_fscore
value: [0.70588235 0.58823529 0.48275862 0.62068966 0.55555556 0.66666667
0.625 0.60606061 0.66666667 0.48 ]
mean value: 0.5997515417870387
key: train_fscore
value: [0.75177305 0.7844523 0.79861111 0.78571429 0.8041958 0.79120879
0.81632653 0.79856115 0.76868327 0.76868327]
mean value: 0.7868209568429256
key: test_precision
value: [0.66666667 0.55555556 0.5 0.64285714 0.47619048 0.61111111
0.625 0.58823529 0.81818182 0.66666667]
mean value: 0.6150464731347084
key: train_precision
value: [0.74647887 0.77622378 0.78231293 0.79136691 0.79310345 0.81818182
0.77922078 0.80434783 0.76595745 0.76595745]
mean value: 0.7823151246490538
key: test_recall
value: [0.75 0.625 0.46666667 0.6 0.66666667 0.73333333
0.625 0.625 0.5625 0.375 ]
mean value: 0.6029166666666667
key: train_recall
value: [0.75714286 0.79285714 0.81560284 0.78014184 0.81560284 0.76595745
0.85714286 0.79285714 0.77142857 0.77142857]
mean value: 0.7920162107396149
key: test_roc_auc
value: [0.6875 0.5625 0.51458333 0.64375 0.48958333 0.64791667
0.6125 0.57916667 0.71458333 0.5875 ]
mean value: 0.6039583333333334
key: train_roc_auc
value: [0.75 0.78214286 0.7935157 0.78649949 0.80065856 0.79726444
0.80800405 0.80068389 0.76869301 0.76869301]
mean value: 0.7856155015197568
key: test_jcc
value: [0.54545455 0.41666667 0.31818182 0.45 0.38461538 0.5
0.45454545 0.43478261 0.5 0.31578947]
mean value: 0.4320035951843732
key: train_jcc
value: [0.60227273 0.64534884 0.66473988 0.64705882 0.67251462 0.65454545
0.68965517 0.66467066 0.62427746 0.62427746]
mean value: 0.6489361091224226
MCC on Blind test: 0.13
Accuracy on Blind test: 0.56
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01523018 0.01354218 0.01364875 0.0138216 0.0136342 0.01381826
0.01368165 0.01369882 0.01525903 0.01383352]
mean value: 0.014016819000244141
key: score_time
value: [0.00996876 0.00985384 0.00985217 0.00979042 0.00975299 0.00979042
0.0098505 0.0107069 0.00991988 0.01005197]
mean value: 0.009953784942626952
key: test_mcc
value: [0.625 0.53935989 0.42083333 0.63696156 0.50443936 0.44824996
0.35983579 0.42321607 0.63696156 0.63696156]
mean value: 0.5231819080590403
key: train_mcc
value: [0.71428571 0.73618394 0.71640396 0.71599514 0.70901046 0.73761389
0.68713898 0.73015914 0.70819191 0.72980243]
mean value: 0.7184785546282559
key: test_accuracy
value: [0.8125 0.75 0.70967742 0.80645161 0.74193548 0.70967742
0.67741935 0.70967742 0.80645161 0.80645161]
mean value: 0.7530241935483871
key: train_accuracy
value: [0.85714286 0.86785714 0.85765125 0.85765125 0.85409253 0.8683274
0.84341637 0.86476868 0.85409253 0.86476868]
mean value: 0.8589768683274022
key: test_fscore
value: [0.8125 0.78947368 0.70967742 0.82352941 0.76470588 0.74285714
0.66666667 0.74285714 0.78571429 0.78571429]
mean value: 0.7623695921492536
key: train_fscore
value: [0.85714286 0.86545455 0.86206897 0.85507246 0.85813149 0.86545455
0.84507042 0.86131387 0.85304659 0.86231884]
mean value: 0.8585074591936718
key: test_precision
value: [0.8125 0.68181818 0.6875 0.73684211 0.68421053 0.65
0.71428571 0.68421053 0.91666667 0.91666667]
mean value: 0.7484700387331966
key: train_precision
value: [0.85714286 0.88148148 0.83892617 0.87407407 0.83783784 0.8880597
0.83333333 0.88059701 0.85611511 0.875 ]
mean value: 0.8622567582697808
key: test_recall
value: [0.8125 0.9375 0.73333333 0.93333333 0.86666667 0.86666667
0.625 0.8125 0.6875 0.6875 ]
mean value: 0.79625
key: train_recall
value: [0.85714286 0.85 0.88652482 0.83687943 0.87943262 0.84397163
0.85714286 0.84285714 0.85 0.85 ]
mean value: 0.8553951367781155
key: test_roc_auc
value: [0.8125 0.75 0.71041667 0.81041667 0.74583333 0.71458333
0.67916667 0.70625 0.81041667 0.81041667]
mean value: 0.755
key: train_roc_auc
value: [0.85714286 0.86785714 0.85754813 0.85772543 0.85400203 0.86841439
0.84346505 0.86469098 0.85407801 0.86471631]
mean value: 0.8589640324214792
key: test_jcc
value: [0.68421053 0.65217391 0.55 0.7 0.61904762 0.59090909
0.5 0.59090909 0.64705882 0.64705882]
mean value: 0.6181367887283893
key: train_jcc
value: [0.75 0.76282051 0.75757576 0.74683544 0.75151515 0.76282051
0.73170732 0.75641026 0.74375 0.75796178]
mean value: 0.7521396734692827
MCC on Blind test: 0.39
Accuracy on Blind test: 0.7
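Note on the per-fold metrics: the first SVM test fold above (test_mcc 0.625, test_accuracy 0.8125, test_jcc 0.68421053) can be reproduced from a single confusion matrix. The counts below are an assumption inferred from those logged scores; the log itself never prints them. This is a minimal sketch of how the logged metrics relate to one another, not the script's own scoring code.

```python
import math

# ASSUMPTION: confusion counts inferred from the logged fold-1 SVM scores;
# they are not printed anywhere in the log.
tp, fp, fn, tn = 13, 3, 3, 13

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
fscore = 2 * precision * recall / (precision + recall)
jcc = tp / (tp + fp + fn)  # Jaccard index of the positive class
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

for name, value in [("accuracy", accuracy), ("precision", precision),
                    ("recall", recall), ("fscore", fscore),
                    ("jcc", jcc), ("mcc", mcc)]:
    print(f"{name}: {value:.8f}")
```

With these assumed counts, every value agrees with the first entry of the corresponding test_* array in the SVM block.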
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.16019416 1.35749841 1.18764305 1.23864603 1.30716753 1.15232587
1.29647064 1.16228747 1.28212929 1.14609051]
mean value: 1.229045295715332
key: score_time
value: [0.01337218 0.01464963 0.02091813 0.01439071 0.01440692 0.01583743
0.0146606 0.01481366 0.01492953 0.01485419]
mean value: 0.015283298492431641
key: test_mcc
value: [0.69991324 0.50395263 0.61608311 0.48954403 0.35983579 0.57104024
0.58316015 0.48527095 0.67916667 0.69203857]
mean value: 0.5680005373888458
key: train_mcc
value: [0.99288247 0.97859639 0.99290744 0.98586555 1. 0.98576494
0.9929078 0.98576494 0.9929078 0.98576494]
mean value: 0.9893362290587903
key: test_accuracy
value: [0.84375 0.75 0.80645161 0.74193548 0.67741935 0.74193548
0.77419355 0.74193548 0.83870968 0.83870968]
mean value: 0.7755040322580645
key: train_accuracy
value: [0.99642857 0.98928571 0.99644128 0.99288256 1. 0.99288256
0.99644128 0.99288256 0.99644128 0.99288256]
mean value: 0.9946568378240976
key: test_fscore
value: [0.82758621 0.76470588 0.78571429 0.75 0.6875 0.78947368
0.74074074 0.76470588 0.83870968 0.82758621]
mean value: 0.7776722566583893
key: train_fscore
value: [0.99644128 0.98924731 0.99646643 0.99285714 1. 0.9929078
0.99644128 0.99285714 0.99644128 0.99285714]
mean value: 0.9946516816329601
key: test_precision
value: [0.92307692 0.72222222 0.84615385 0.70588235 0.64705882 0.65217391
0.90909091 0.72222222 0.86666667 0.92307692]
mean value: 0.7917624802023779
key: train_precision
value: [0.9929078 0.99280576 0.99295775 1. 1. 0.9929078
0.9929078 0.99285714 0.9929078 0.99285714]
mean value: 0.9943108993262602
key: test_recall
value: [0.75 0.8125 0.73333333 0.8 0.73333333 1.
0.625 0.8125 0.8125 0.75 ]
mean value: 0.7829166666666667
key: train_recall
value: [1. 0.98571429 1. 0.9858156 1. 0.9929078
1. 0.99285714 1. 0.99285714]
mean value: 0.9950151975683891
key: test_roc_auc
value: [0.84375 0.75 0.80416667 0.74375 0.67916667 0.75
0.77916667 0.73958333 0.83958333 0.84166667]
mean value: 0.7770833333333333
key: train_roc_auc
value: [0.99642857 0.98928571 0.99642857 0.9929078 1. 0.99288247
0.9964539 0.99288247 0.9964539 0.99288247]
mean value: 0.9946605876393111
key: test_jcc
value: [0.70588235 0.61904762 0.64705882 0.6 0.52380952 0.65217391
0.58823529 0.61904762 0.72222222 0.70588235]
mean value: 0.6383359720699875
key: train_jcc
value: [0.9929078 0.9787234 0.99295775 0.9858156 1. 0.98591549
0.9929078 0.9858156 0.9929078 0.9858156 ]
mean value: 0.9893766856457896
MCC on Blind test: 0.36
Accuracy on Blind test: 0.69
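Note on the "mean value" lines: each one appears to be the arithmetic mean of the ten per-fold values printed just above it. The sketch below checks this for the MLP test_mcc block by parsing the logged text. The helper name parse_fold_values is hypothetical and is not part of the original scripts.

```python
import re
from statistics import fmean

# Text copied from the MLP test_mcc block of this log.
log_snippet = """key: test_mcc
value: [0.69991324 0.50395263 0.61608311 0.48954403 0.35983579 0.57104024
 0.58316015 0.48527095 0.67916667 0.69203857]
mean value: 0.5680005373888458"""


def parse_fold_values(text):
    """Return the floats found between the first '[' and ']' in a log block."""
    inner = re.search(r"\[(.*?)\]", text, flags=re.DOTALL).group(1)
    return [float(token) for token in inner.split()]


folds = parse_fold_values(log_snippet)
logged_mean = float(log_snippet.rsplit("mean value:", 1)[1])
recomputed_mean = fmean(folds)

print(len(folds), recomputed_mean, logged_mean)
```

The recomputed mean differs from the logged one only in the trailing digits, because the array is printed rounded to eight decimal places.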
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.02894592 0.0212357 0.02063727 0.02075434 0.02135015 0.01837444
0.02223134 0.02115107 0.01944661 0.02165294]
mean value: 0.021577978134155275
key: score_time
value: [0.0115757 0.00920677 0.00933361 0.00870323 0.00865054 0.00864983
0.00901675 0.00935531 0.00854015 0.00858498]
mean value: 0.009161686897277832
key: test_mcc
value: [0.62994079 0.68884672 0.67916667 0.67916667 0.48527095 0.63696156
0.53006813 0.22364661 0.5612264 0.55 ]
mean value: 0.5664294488138983
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.8125 0.84375 0.83870968 0.83870968 0.74193548 0.80645161
0.74193548 0.61290323 0.77419355 0.77419355]
mean value: 0.7785282258064516
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.82352941 0.84848485 0.83870968 0.83870968 0.71428571 0.82352941
0.69230769 0.64705882 0.75862069 0.77419355]
mean value: 0.7759429495018058
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.77777778 0.82352941 0.8125 0.8125 0.76923077 0.73684211
0.9 0.61111111 0.84615385 0.8 ]
mean value: 0.7889645021301368
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.875 0.875 0.86666667 0.86666667 0.66666667 0.93333333
0.5625 0.6875 0.6875 0.75 ]
mean value: 0.7770833333333333
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.8125 0.84375 0.83958333 0.83958333 0.73958333 0.81041667
0.74791667 0.61041667 0.77708333 0.775 ]
mean value: 0.7795833333333333
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.7 0.73684211 0.72222222 0.72222222 0.55555556 0.7
0.52941176 0.47826087 0.61111111 0.63157895]
mean value: 0.638720479801379
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.39
Accuracy on Blind test: 0.7
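Note on train versus test scores: a train_mcc of 1.0 next to a much lower test_mcc and blind-test MCC suggests the model is fitting the training folds far more closely than it generalises. The sketch below tabulates that gap for the three models logged so far in this section. The mean values are copied from the log; the dictionary layout and the function name are illustrative assumptions.

```python
# Mean MCC values copied from the SVM, MLP and Decision Tree blocks of this log.
scores = {
    "SVM": {"train_mcc": 0.7184785546282559,
            "test_mcc": 0.5231819080590403,
            "blind_mcc": 0.39},
    "MLP": {"train_mcc": 0.9893362290587903,
            "test_mcc": 0.5680005373888458,
            "blind_mcc": 0.36},
    "Decision Tree": {"train_mcc": 1.0,
                      "test_mcc": 0.5664294488138983,
                      "blind_mcc": 0.39},
}


def train_test_gap(entry):
    """Difference between mean training MCC and mean cross-validation MCC."""
    return entry["train_mcc"] - entry["test_mcc"]


for name, entry in scores.items():
    cv_to_blind = entry["test_mcc"] - entry["blind_mcc"]
    print(f"{name:15s} train-test gap: {train_test_gap(entry):.4f}  "
          f"CV-blind gap: {cv_to_blind:.4f}")

largest_gap_model = max(scores, key=lambda name: train_test_gap(scores[name]))
print("Largest train-test gap:", largest_gap_model)
```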
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.10602975 0.10733938 0.10524845 0.10616469 0.1080482 0.10535192
0.10625601 0.10542488 0.10674691 0.10701108]
mean value: 0.10636212825775146
key: score_time
value: [0.01797223 0.01859069 0.01800942 0.01755714 0.01740718 0.01746559
0.01737189 0.01737976 0.01745605 0.01783395]
mean value: 0.01770439147949219
key: test_mcc
value: [0.56360186 0.72374686 0.54812195 0.69203857 0.44824996 0.4770843
0.5612264 0.49612132 0.61925228 0.57104024]
mean value: 0.5700483750913896
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.78125 0.84375 0.77419355 0.83870968 0.70967742 0.70967742
0.77419355 0.74193548 0.80645161 0.74193548]
mean value: 0.7721774193548387
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.78787879 0.86486486 0.75862069 0.84848485 0.74285714 0.75675676
0.75862069 0.77777778 0.8 0.66666667]
mean value: 0.7762528224597189
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.76470588 0.76190476 0.78571429 0.77777778 0.65 0.63636364
0.84615385 0.7 0.85714286 1. ]
mean value: 0.7779763047410106
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.8125 1. 0.73333333 0.93333333 0.86666667 0.93333333
0.6875 0.875 0.75 0.5 ]
mean value: 0.8091666666666667
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.78125 0.84375 0.77291667 0.84166667 0.71458333 0.71666667
0.77708333 0.7375 0.80833333 0.75 ]
mean value: 0.774375
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.65 0.76190476 0.61111111 0.73684211 0.59090909 0.60869565
0.61111111 0.63636364 0.66666667 0.5 ]
mean value: 0.6373604135503449
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.3
Accuracy on Blind test: 0.66
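Note on test_jcc and test_fscore: for a binary positive class the Jaccard index J and the F1 score F are tied by J = F / (2 - F), so the two rows carry the same information. The sketch below checks this identity against the Extra Trees per-fold values logged above. The function name is hypothetical.

```python
def jaccard_from_f1(f1):
    """Convert a binary F1 score to the Jaccard index of the same class."""
    return f1 / (2.0 - f1)


# Per-fold values copied from the Extra Trees block of this log.
test_fscore = [0.78787879, 0.86486486, 0.75862069, 0.84848485, 0.74285714,
               0.75675676, 0.75862069, 0.77777778, 0.8, 0.66666667]
test_jcc = [0.65, 0.76190476, 0.61111111, 0.73684211, 0.59090909,
            0.60869565, 0.61111111, 0.63636364, 0.66666667, 0.5]

derived = [jaccard_from_f1(f) for f in test_fscore]
max_abs_diff = max(abs(d - j) for d, j in zip(derived, test_jcc))
print(f"largest |derived - logged| = {max_abs_diff:.2e}")
```

The remaining difference is on the order of the eight-decimal rounding used when the arrays were printed.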
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01085496 0.00925255 0.01008058 0.00925827 0.00935721 0.00931215
0.0092001 0.0097034 0.00947332 0.01037645]
mean value: 0.009686899185180665
key: score_time
value: [0.00925589 0.00849748 0.0090673 0.00847626 0.00856829 0.00894523
0.00853539 0.00952673 0.00882292 0.00931621]
mean value: 0.008901166915893554
key: test_mcc
value: [0.18786729 0.46056619 0.29069387 0.225 0.42083333 0.23012754
0.25389818 0.55573827 0.29166667 0.58316015]
mean value: 0.349955147894385
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.59375 0.71875 0.64516129 0.61290323 0.70967742 0.61290323
0.61290323 0.77419355 0.64516129 0.77419355]
mean value: 0.6699596774193548
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.58064516 0.66666667 0.59259259 0.6 0.70967742 0.625
0.53846154 0.8 0.64516129 0.74074074]
mean value: 0.649894540942928
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.6 0.81818182 0.66666667 0.6 0.6875 0.58823529
0.7 0.73684211 0.66666667 0.90909091]
mean value: 0.6973183459986866
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.5625 0.5625 0.53333333 0.6 0.73333333 0.66666667
0.4375 0.875 0.625 0.625 ]
mean value: 0.6220833333333333
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.59375 0.71875 0.64166667 0.6125 0.71041667 0.61458333
0.61875 0.77083333 0.64583333 0.77916667]
mean value: 0.670625
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.40909091 0.5 0.42105263 0.42857143 0.55 0.45454545
0.36842105 0.66666667 0.47619048 0.58823529]
mean value: 0.48627739133931086
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.09
Accuracy on Blind test: 0.55
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.45871091 1.44358611 1.44652629 1.49799991 1.45427561 1.44756174
1.44085765 1.46252918 1.51522851 1.45094728]
mean value: 1.4618223190307618
key: score_time
value: [0.09098411 0.09088707 0.09599185 0.09621763 0.09140348 0.0917809
0.0976913 0.09258533 0.09712076 0.0905304 ]
mean value: 0.09351928234100342
key: test_mcc
value: [0.625 0.75 0.74166667 0.6125 0.29960206 0.53006813
0.50443936 0.48527095 0.69203857 0.63696156]
mean value: 0.5877547290151389
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.8125 0.875 0.87096774 0.80645161 0.64516129 0.74193548
0.74193548 0.74193548 0.83870968 0.80645161]
mean value: 0.7881048387096774
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.8125 0.875 0.86666667 0.8 0.66666667 0.77777778
0.71428571 0.76470588 0.82758621 0.78571429]
mean value: 0.7890903200360604
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.8125 0.875 0.86666667 0.8 0.61111111 0.66666667
0.83333333 0.72222222 0.92307692 0.91666667]
mean value: 0.802724358974359
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.8125 0.875 0.86666667 0.8 0.73333333 0.93333333
0.625 0.8125 0.75 0.6875 ]
mean value: 0.7895833333333333
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.8125 0.875 0.87083333 0.80625 0.64791667 0.74791667
0.74583333 0.73958333 0.84166667 0.81041667]
mean value: 0.7897916666666667
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.68421053 0.77777778 0.76470588 0.66666667 0.5 0.63636364
0.55555556 0.61904762 0.70588235 0.64705882]
mean value: 0.6557268840550574
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.43
Accuracy on Blind test: 0.72
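Note on the FutureWarning above: it says max_features='auto' is deprecated in scikit-learn 1.1 and that 'sqrt' keeps the past behaviour for RandomForestClassifier and ExtraTreesClassifier. In the model list, only 'Random Forest2' sets max_features='auto'. The sketch below shows one way to rewrite the keyword arguments before the classifier is constructed. The helper name and the plain-dict representation are assumptions for illustration; the original script may build its models differently.

```python
# Keyword arguments as printed for 'Random Forest2' in this log.
rf2_params = {
    "max_features": "auto",
    "min_samples_leaf": 5,
    "n_estimators": 1000,
    "n_jobs": 10,
    "oob_score": True,
    "random_state": 42,
}


def replace_deprecated_max_features(params):
    """Return a copy of params with max_features='auto' replaced by 'sqrt'.

    Per the logged warning this applies to forest classifiers only;
    do not reuse this helper for other estimator types without checking.
    """
    updated = dict(params)
    if updated.get("max_features") == "auto":
        updated["max_features"] = "sqrt"
    return updated


fixed_params = replace_deprecated_max_features(rf2_params)
print(fixed_params)
# The updated dict could then be passed on, for example
# RandomForestClassifier(**fixed_params), assuming scikit-learn is installed.
```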
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.87758327 0.92826319 0.93050742 0.84618402 0.95714593 0.92273808
0.95817399 0.91892672 0.88999343 0.88455796]
mean value: 0.9114073991775513
key: score_time
value: [0.21835947 0.24447083 0.26226187 0.22112417 0.26000118 0.18407679
0.26429701 0.2288034 0.23047781 0.22172642]
mean value: 0.23355989456176757
key: test_mcc
value: [0.68884672 0.875 0.80753845 0.61925228 0.50443936 0.42352151
0.61925228 0.4184137 0.69203857 0.58316015]
mean value: 0.6231463022247349
key: train_mcc
value: [0.90009185 0.91428571 0.87919331 0.90767208 0.8934327 0.91458967
0.89344886 0.89326241 0.91458967 0.9219233 ]
mean value: 0.9032489559187864
key: test_accuracy
value: [0.84375 0.9375 0.90322581 0.80645161 0.74193548 0.67741935
0.80645161 0.70967742 0.83870968 0.77419355]
mean value: 0.8039314516129032
key: train_accuracy
value: [0.95 0.95714286 0.93950178 0.95373665 0.94661922 0.95729537
0.94661922 0.94661922 0.95729537 0.96085409]
mean value: 0.9515683782409761
key: test_fscore
value: [0.83870968 0.9375 0.89655172 0.8125 0.76470588 0.73684211
0.8 0.72727273 0.82758621 0.74074074]
mean value: 0.8082409064083405
key: train_fscore
value: [0.95035461 0.95714286 0.94035088 0.95438596 0.94736842 0.95744681
0.94699647 0.94661922 0.95714286 0.96113074]
mean value: 0.9518938821445742
key: test_precision
value: [0.86666667 0.9375 0.92857143 0.76470588 0.68421053 0.60869565
0.85714286 0.70588235 0.92307692 0.90909091]
mean value: 0.8185543198332604
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
key: train_precision
value: [0.94366197 0.95714286 0.93055556 0.94444444 0.9375 0.95744681
0.93706294 0.94326241 0.95714286 0.95104895]
mean value: 0.9459268794086745
key: test_recall
value: [0.8125 0.9375 0.86666667 0.86666667 0.86666667 0.93333333
0.75 0.75 0.75 0.625 ]
mean value: 0.8158333333333333
key: train_recall
value: [0.95714286 0.95714286 0.95035461 0.96453901 0.95744681 0.95744681
0.95714286 0.95 0.95714286 0.97142857]
mean value: 0.9579787234042554
key: test_roc_auc
value: [0.84375 0.9375 0.90208333 0.80833333 0.74583333 0.68541667
0.80833333 0.70833333 0.84166667 0.77916667]
mean value: 0.8060416666666667
key: train_roc_auc
value: [0.95 0.95714286 0.93946302 0.95369807 0.94658055 0.95729483
0.94665653 0.94663121 0.95729483 0.96089159]
mean value: 0.951565349544073
key: test_jcc
value: [0.72222222 0.88235294 0.8125 0.68421053 0.61904762 0.58333333
0.66666667 0.57142857 0.70588235 0.58823529]
mean value: 0.6835879527249497
key: train_jcc
value: [0.90540541 0.91780822 0.88741722 0.91275168 0.9 0.91836735
0.89932886 0.89864865 0.91780822 0.92517007]
mean value: 0.9082705662832002
MCC on Blind test: 0.48
Accuracy on Blind test: 0.74
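Note: the FutureWarning emitted during the Random Forest2 run above says that `max_features='auto'` is deprecated in scikit-learn 1.1 and that `max_features='sqrt'` keeps the past behaviour. The sketch below is one way the same estimator could be declared without triggering the warning, assuming scikit-learn is installed. The function name make_random_forest2 is hypothetical; all other parameter values are copied from the "Model func" line in the log.

```python
from sklearn.ensemble import RandomForestClassifier


def make_random_forest2():
    """Build the 'Random Forest2' estimator with the non-deprecated setting.

    Hypothetical helper: identical to the configuration printed in the log,
    except max_features='sqrt' replaces the deprecated 'auto'.
    """
    return RandomForestClassifier(
        max_features="sqrt",   # replaces max_features='auto'
        min_samples_leaf=5,
        n_estimators=1000,
        n_jobs=10,
        oob_score=True,
        random_state=42,
    )


rf2 = make_random_forest2()
print(rf2.get_params()["max_features"])
```

Per the warning text, removing the `max_features` argument altogether would have the same effect for RandomForestClassifier, since 'sqrt' is also its default.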
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02361965 0.00960827 0.01067877 0.00956154 0.01037812 0.01068234
0.0104816 0.00967288 0.00952721 0.0107491 ]
mean value: 0.01149594783782959
key: score_time
value: [0.01264954 0.00892997 0.0097971 0.00886059 0.00994897 0.0096302
0.00885224 0.00893092 0.00966692 0.01322079]
mean value: 0.010048723220825196
key: test_mcc
value: [0.56360186 0.48653363 0.15899721 0.5612264 0.4770843 0.37191715
0.28870546 0.42321607 0.09583333 0.58316015]
mean value: 0.40102755595919914
key: train_mcc
value: [0.57430732 0.53803891 0.55459753 0.49900055 0.55953199 0.56745262
0.55493536 0.58009119 0.54153517 0.53306083]
mean value: 0.5502551476643553
key: test_accuracy
value: [0.78125 0.71875 0.58064516 0.77419355 0.70967742 0.67741935
0.64516129 0.70967742 0.5483871 0.77419355]
mean value: 0.6919354838709677
key: train_accuracy
value: [0.78571429 0.76785714 0.77580071 0.74733096 0.77935943 0.78291815
0.77580071 0.79003559 0.76868327 0.76512456]
mean value: 0.7738624809354346
key: test_fscore
value: [0.77419355 0.76923077 0.55172414 0.78787879 0.75675676 0.70588235
0.66666667 0.74285714 0.5625 0.74074074]
mean value: 0.7058430903390172
key: train_fscore
value: [0.79591837 0.778157 0.78787879 0.7641196 0.7862069 0.79180887
0.78644068 0.79003559 0.78114478 0.7755102 ]
mean value: 0.7837220773794649
key: test_precision
value: [0.8 0.65217391 0.57142857 0.72222222 0.63636364 0.63157895
0.64705882 0.68421053 0.5625 0.90909091]
mean value: 0.681662754936244
key: train_precision
value: [0.75974026 0.74509804 0.75 0.71875 0.76510067 0.76315789
0.7483871 0.78723404 0.7388535 0.74025974]
mean value: 0.7516581247605566
key: test_recall
value: [0.75 0.9375 0.53333333 0.86666667 0.93333333 0.8
0.6875 0.8125 0.5625 0.625 ]
mean value: 0.7508333333333334
key: train_recall
value: [0.83571429 0.81428571 0.82978723 0.81560284 0.80851064 0.82269504
0.82857143 0.79285714 0.82857143 0.81428571]
mean value: 0.8190881458966566
key: test_roc_auc
value: [0.78125 0.71875 0.57916667 0.77708333 0.71666667 0.68125
0.64375 0.70625 0.54791667 0.77916667]
mean value: 0.693125
key: train_roc_auc
value: [0.78571429 0.76785714 0.7756079 0.74708713 0.77925532 0.78277609
0.77598784 0.79004559 0.76889564 0.76529889]
mean value: 0.7738525835866261
key: test_jcc
value: [0.63157895 0.625 0.38095238 0.65 0.60869565 0.54545455
0.5 0.59090909 0.39130435 0.58823529]
mean value: 0.5512130258802086
key: train_jcc
value: [0.66101695 0.63687151 0.65 0.61827957 0.64772727 0.65536723
0.64804469 0.65294118 0.64088398 0.63333333]
mean value: 0.6444465712232499
MCC on Blind test: 0.35
Accuracy on Blind test: 0.68
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.24555159 0.0699985 0.06857967 0.06968999 0.07746792 0.06782889
0.06897974 0.07231164 0.06659174 0.06663275]
mean value: 0.08736324310302734
key: score_time
value: [0.01073599 0.01065278 0.01036191 0.01028013 0.01068068 0.0102334
0.01025152 0.01063037 0.01021504 0.01019859]
mean value: 0.010424041748046875
key: test_mcc
value: [0.75 0.81409158 0.87083333 0.87083333 0.4184137 0.6681531
0.52291252 0.55 0.67916667 0.67916667]
mean value: 0.6823570904153414
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.875 0.90625 0.93548387 0.93548387 0.70967742 0.80645161
0.70967742 0.77419355 0.83870968 0.83870968]
mean value: 0.8329637096774194
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.875 0.90909091 0.93333333 0.93333333 0.68965517 0.83333333
0.60869565 0.77419355 0.83870968 0.83870968]
mean value: 0.8234054636904422
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.875 0.88235294 0.93333333 0.93333333 0.71428571 0.71428571
1. 0.8 0.86666667 0.86666667]
mean value: 0.8585924369747899
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.875 0.9375 0.93333333 0.93333333 0.66666667 1.
0.4375 0.75 0.8125 0.8125 ]
mean value: 0.8158333333333333
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.875 0.90625 0.93541667 0.93541667 0.70833333 0.8125
0.71875 0.775 0.83958333 0.83958333]
mean value: 0.8345833333333333
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.77777778 0.83333333 0.875 0.875 0.52631579 0.71428571
0.4375 0.63157895 0.72222222 0.72222222]
mean value: 0.7115236006683375
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.63
Accuracy on Blind test: 0.81
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.02469635 0.03099465 0.03220057 0.05974889 0.03210664 0.04455256
0.06403112 0.06385612 0.06735516 0.05560327]
mean value: 0.04751453399658203
key: score_time
value: [0.01184535 0.01187634 0.01188827 0.02012181 0.01192713 0.02099824
0.0206399 0.02027392 0.02046394 0.01186919]
mean value: 0.016190409660339355
key: test_mcc
value: [0.57265629 0.62994079 0.4365267 0.43041423 0.16878989 0.39198315
0.29960206 0.48333333 0.43041423 0.74896053]
mean value: 0.45926212084224455
key: train_mcc
value: [0.84294316 0.8653681 0.85764944 0.8507372 0.87902123 0.8718845
0.8718845 0.87919331 0.85071454 0.87921164]
mean value: 0.8648607631443651
key: test_accuracy
value: [0.78125 0.8125 0.70967742 0.70967742 0.58064516 0.67741935
0.64516129 0.74193548 0.70967742 0.87096774]
mean value: 0.723891129032258
key: train_accuracy
value: [0.92142857 0.93214286 0.92882562 0.9252669 0.93950178 0.93594306
0.93594306 0.93950178 0.9252669 0.93950178]
mean value: 0.9323322318251144
key: test_fscore
value: [0.75862069 0.82352941 0.64 0.72727273 0.60606061 0.72222222
0.62068966 0.75 0.68965517 0.86666667]
mean value: 0.7204717151228307
key: train_fscore
value: [0.92086331 0.93040293 0.92907801 0.92473118 0.93992933 0.93617021
0.93571429 0.93862816 0.92418773 0.93992933]
mean value: 0.9319634476936138
key: test_precision
value: [0.84615385 0.77777778 0.8 0.66666667 0.55555556 0.61904762
0.69230769 0.75 0.76923077 0.92857143]
mean value: 0.7405311355311356
key: train_precision
value: [0.92753623 0.95488722 0.92907801 0.93478261 0.93661972 0.93617021
0.93571429 0.94890511 0.93430657 0.93006993]
mean value: 0.9368069898501369
key: test_recall
value: [0.6875 0.875 0.53333333 0.8 0.66666667 0.86666667
0.5625 0.75 0.625 0.8125 ]
mean value: 0.7179166666666666
key: train_recall
value: [0.91428571 0.90714286 0.92907801 0.91489362 0.94326241 0.93617021
0.93571429 0.92857143 0.91428571 0.95 ]
mean value: 0.9273404255319149
key: test_roc_auc
value: [0.78125 0.8125 0.70416667 0.7125 0.58333333 0.68333333
0.64791667 0.74166667 0.7125 0.87291667]
mean value: 0.7252083333333333
key: train_roc_auc
value: [0.92142857 0.93214286 0.92882472 0.92530395 0.93948835 0.93594225
0.93594225 0.93946302 0.92522796 0.93953901]
mean value: 0.9323302938196555
key: test_jcc
value: [0.61111111 0.7 0.47058824 0.57142857 0.43478261 0.56521739
0.45 0.6 0.52631579 0.76470588]
mean value: 0.5694149589660426
key: train_jcc
value: [0.85333333 0.86986301 0.86754967 0.86 0.88666667 0.88
0.87919463 0.88435374 0.8590604 0.88666667]
mean value: 0.8726688124293115
MCC on Blind test: 0.36
Accuracy on Blind test: 0.68
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01310349 0.00920939 0.00916791 0.00908709 0.00897574 0.00889897
0.0089469 0.00908518 0.00897837 0.0089066 ]
mean value: 0.00943596363067627
key: score_time
value: [0.0314641 0.00882173 0.00867224 0.00837922 0.00839758 0.0084486
0.0083878 0.00838399 0.00836039 0.00852084]
mean value: 0.010783648490905762
key: test_mcc
value: [ 0.46056619 0.59215653 0.35445878 0.5612264 -0.02227177 0.16878989
0.28870546 0.48954403 0.48333333 0.82285074]
mean value: 0.419935957325999
key: train_mcc
value: [0.43760498 0.45195269 0.49730798 0.46185564 0.48917077 0.48864808
0.47585629 0.46906706 0.45260942 0.44044429]
mean value: 0.46645171945247466
key: test_accuracy
value: [0.71875 0.78125 0.67741935 0.77419355 0.48387097 0.58064516
0.64516129 0.74193548 0.74193548 0.90322581]
mean value: 0.7048387096774194
key: train_accuracy
value: [0.71785714 0.725 0.74733096 0.72953737 0.74377224 0.74377224
0.7366548 0.73309609 0.72597865 0.71886121]
mean value: 0.7321860701576004
key: test_fscore
value: [0.75675676 0.81081081 0.64285714 0.78787879 0.55555556 0.60606061
0.66666667 0.73333333 0.75 0.89655172]
mean value: 0.7206471384057591
key: train_fscore
value: [0.73037543 0.73720137 0.76094276 0.74496644 0.75510204 0.75342466
0.74829932 0.74576271 0.73170732 0.73220339]
mean value: 0.7439985432551205
key: test_precision
value: [0.66666667 0.71428571 0.69230769 0.72222222 0.47619048 0.55555556
0.64705882 0.78571429 0.75 1. ]
mean value: 0.7010001436472024
key: train_precision
value: [0.69934641 0.70588235 0.72435897 0.70700637 0.7254902 0.72847682
0.71428571 0.70967742 0.71428571 0.69677419]
mean value: 0.71255841607008
key: test_recall
value: [0.875 0.9375 0.6 0.86666667 0.66666667 0.66666667
0.6875 0.6875 0.75 0.8125 ]
mean value: 0.755
key: train_recall
value: [0.76428571 0.77142857 0.80141844 0.78723404 0.78723404 0.78014184
0.78571429 0.78571429 0.75 0.77142857]
mean value: 0.7784599797365754
key: test_roc_auc
value: [0.71875 0.78125 0.675 0.77708333 0.48958333 0.58333333
0.64375 0.74375 0.74166667 0.90625 ]
mean value: 0.7060416666666667
key: train_roc_auc
value: [0.71785714 0.725 0.74713779 0.72933131 0.74361702 0.74364235
0.73682877 0.73328267 0.72606383 0.71904762]
mean value: 0.7321808510638298
key: test_jcc
value: [0.60869565 0.68181818 0.47368421 0.65 0.38461538 0.43478261
0.5 0.57894737 0.6 0.8125 ]
mean value: 0.57250434062505
key: train_jcc
value: [0.57526882 0.58378378 0.61413043 0.59358289 0.60655738 0.6043956
0.59782609 0.59459459 0.57692308 0.57754011]
mean value: 0.5924602770342078
MCC on Blind test: 0.31
Accuracy on Blind test: 0.66
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01268196 0.01671743 0.01599097 0.01547456 0.01932931 0.01919198
0.01620913 0.01577759 0.01730418 0.01631546]
mean value: 0.016499257087707518
key: score_time
value: [0.00838947 0.01079845 0.01083779 0.0114007 0.01187229 0.01147294
0.01136541 0.011338 0.01137733 0.01140523]
mean value: 0.011025762557983399
key: test_mcc
value: [0.50395263 0.68884672 0.4184137 0.18856181 0.55 0.35416667
0.57104024 0.54812195 0.57461167 0.4770843 ]
mean value: 0.48747996920218695
key: train_mcc
value: [0.66800328 0.72514339 0.73132112 0.3380284 0.79461478 0.7291921
0.67477868 0.75091185 0.59031555 0.73396841]
mean value: 0.673627756111491
key: test_accuracy
value: [0.75 0.84375 0.70967742 0.5483871 0.77419355 0.67741935
0.74193548 0.77419355 0.77419355 0.70967742]
mean value: 0.7303427419354839
key: train_accuracy
value: [0.81785714 0.85357143 0.86476868 0.60142349 0.89679715 0.84697509
0.82562278 0.87544484 0.75800712 0.86120996]
mean value: 0.8201677681748856
key: test_fscore
value: [0.76470588 0.83870968 0.68965517 0.125 0.77419355 0.66666667
0.66666667 0.78787879 0.81081081 0.64 ]
mean value: 0.6764287212596118
key: train_fscore
value: [0.84210526 0.83534137 0.86986301 0.34117647 0.89454545 0.82008368
0.79835391 0.87544484 0.8045977 0.84705882]
mean value: 0.792857052346194
key: test_precision
value: [0.72222222 0.86666667 0.71428571 1. 0.75 0.66666667
1. 0.76470588 0.71428571 0.88888889]
mean value: 0.8087721755368814
key: train_precision
value: [0.7431694 0.95412844 0.8410596 1. 0.91791045 1.
0.94174757 0.87234043 0.67307692 0.93913043]
mean value: 0.8882563245891257
key: test_recall
value: [0.8125 0.8125 0.66666667 0.06666667 0.8 0.66666667
0.5 0.8125 0.9375 0.5 ]
mean value: 0.6575
key: train_recall
value: [0.97142857 0.74285714 0.90070922 0.20567376 0.87234043 0.69503546
0.69285714 0.87857143 1. 0.77142857]
mean value: 0.7730901722391084
key: test_roc_auc
value: [0.75 0.84375 0.70833333 0.53333333 0.775 0.67708333
0.75 0.77291667 0.76875 0.71666667]
mean value: 0.7295833333333334
key: train_roc_auc
value: [0.81785714 0.85357143 0.86464032 0.60283688 0.8968845 0.84751773
0.82515198 0.87545593 0.75886525 0.86089159]
mean value: 0.8203672745694023
key: test_jcc
value: [0.61904762 0.72222222 0.52631579 0.06666667 0.63157895 0.5
0.5 0.65 0.68181818 0.47058824]
mean value: 0.5368237661890912
key: train_jcc
value: [0.72727273 0.71724138 0.76969697 0.20567376 0.80921053 0.69503546
0.66438356 0.77848101 0.67307692 0.73469388]
mean value: 0.6774766197383995
MCC on Blind test: 0.33
Accuracy on Blind test: 0.66
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01721334 0.01558971 0.0196166 0.01636291 0.01849246 0.01584291
0.01977682 0.01872659 0.01812482 0.01565337]
mean value: 0.01753995418548584
key: score_time
value: [0.01143718 0.01138806 0.0113678 0.01132464 0.01139116 0.01137233
0.01146889 0.01145911 0.01139402 0.01143646]
mean value: 0.011403965950012206
key: test_mcc
value: [0.48038446 0.59215653 0.37191715 0.63696156 0.54812195 0.42460389
0.42352151 0.53006813 0.53006813 0.53006813]
mean value: 0.5067871441548388
key: train_mcc
value: [0.46056619 0.75017225 0.64420772 0.7197701 0.70326066 0.67363597
0.72865591 0.71629725 0.75766296 0.73396841]
mean value: 0.6888197434733603
key: test_accuracy
value: [0.6875 0.78125 0.67741935 0.80645161 0.77419355 0.64516129
0.67741935 0.74193548 0.74193548 0.74193548]
mean value: 0.7275201612903226
key: train_accuracy
value: [0.675 0.875 0.79359431 0.85053381 0.84341637 0.81494662
0.84697509 0.84697509 0.8683274 0.86120996]
mean value: 0.8275978647686832
key: test_fscore
value: [0.76190476 0.81081081 0.70588235 0.82352941 0.75862069 0.73170732
0.58333333 0.69230769 0.69230769 0.69230769]
mean value: 0.7252711754406208
key: train_fscore
value: [0.75471698 0.87364621 0.82941176 0.86624204 0.82539683 0.84337349
0.8185654 0.8244898 0.85020243 0.84705882]
mean value: 0.8333103762254988
key: test_precision
value: [0.61538462 0.71428571 0.63157895 0.73684211 0.78571429 0.57692308
0.875 0.9 0.9 0.9 ]
mean value: 0.7635728744939271
key: train_precision
value: [0.60606061 0.88321168 0.70854271 0.78612717 0.93693694 0.73298429
1. 0.96190476 0.98130841 0.93913043]
mean value: 0.8536207004123598
key: test_recall
value: [1. 0.9375 0.8 0.93333333 0.73333333 1.
0.4375 0.5625 0.5625 0.5625 ]
mean value: 0.7529166666666667
key: train_recall
value: [1. 0.86428571 1. 0.96453901 0.73758865 0.9929078
0.69285714 0.72142857 0.75 0.77142857]
mean value: 0.8495035460992908
key: test_roc_auc
value: [0.6875 0.78125 0.68125 0.81041667 0.77291667 0.65625
0.68541667 0.74791667 0.74791667 0.74791667]
mean value: 0.731875
key: train_roc_auc
value: [0.675 0.875 0.79285714 0.85012665 0.84379433 0.81431104
0.84642857 0.84652989 0.8679078 0.86089159]
mean value: 0.8272847011144884
key: test_jcc
value: [0.61538462 0.68181818 0.54545455 0.7 0.61111111 0.57692308
0.41176471 0.52941176 0.52941176 0.52941176]
mean value: 0.5730691530691531
key: train_jcc
value: [0.60606061 0.77564103 0.70854271 0.76404494 0.7027027 0.72916667
0.69285714 0.70138889 0.73943662 0.73469388]
mean value: 0.7154535187474427
MCC on Blind test: 0.41
Accuracy on Blind test: 0.68
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.14774513 0.12925386 0.12936115 0.13003302 0.13002491 0.13006401
0.13567472 0.1359446 0.13547182 0.14191341]
mean value: 0.13454866409301758
key: score_time
value: [0.01482654 0.01483226 0.01495528 0.01483512 0.01467562 0.01515245
0.02102137 0.01637673 0.01544118 0.01475692]
mean value: 0.01568734645843506
key: test_mcc
value: [0.75592895 0.625 0.55 0.6310315 0.6125 0.5612264
0.33487648 0.69203857 0.48954403 0.63696156]
mean value: 0.588910748855813
key: train_mcc
value: [0.97859639 0.97859639 0.9929078 0.97162977 0.9929078 0.9929078
0.99290744 0.97867167 0.9929078 1. ]
mean value: 0.9872032871447398
key: test_accuracy
value: [0.875 0.8125 0.77419355 0.80645161 0.80645161 0.77419355
0.64516129 0.83870968 0.74193548 0.80645161]
mean value: 0.7881048387096774
key: train_accuracy
value: [0.98928571 0.98928571 0.99644128 0.98576512 0.99644128 0.99644128
0.99644128 0.98932384 0.99644128 1. ]
mean value: 0.9935866802236909
key: test_fscore
value: [0.88235294 0.8125 0.77419355 0.76923077 0.8 0.78787879
0.56 0.82758621 0.73333333 0.78571429]
mean value: 0.7732789872617295
key: train_fscore
value: [0.98932384 0.98932384 0.99644128 0.98571429 0.99644128 0.99644128
0.99641577 0.98924731 0.99644128 1. ]
mean value: 0.9935790179539462
key: test_precision
value: [0.83333333 0.8125 0.75 0.90909091 0.8 0.72222222
0.77777778 0.92307692 0.78571429 0.91666667]
mean value: 0.8230382117882118
key: train_precision
value: [0.9858156 0.9858156 1. 0.99280576 1. 1.
1. 0.99280576 0.9929078 1. ]
mean value: 0.9950150517883566
key: test_recall
value: [0.9375 0.8125 0.8 0.66666667 0.8 0.86666667
0.4375 0.75 0.6875 0.6875 ]
mean value: 0.7445833333333334
key: train_recall
value: [0.99285714 0.99285714 0.9929078 0.9787234 0.9929078 0.9929078
0.99285714 0.98571429 1. 1. ]
mean value: 0.9921732522796353
key: test_roc_auc
value: [0.875 0.8125 0.775 0.80208333 0.80625 0.77708333
0.65208333 0.84166667 0.74375 0.81041667]
mean value: 0.7895833333333333
key: train_roc_auc
value: [0.98928571 0.98928571 0.9964539 0.98579027 0.9964539 0.9964539
0.99642857 0.98931104 0.9964539 1. ]
mean value: 0.9935916919959473
key: test_jcc
value: [0.78947368 0.68421053 0.63157895 0.625 0.66666667 0.65
0.38888889 0.70588235 0.57894737 0.64705882]
mean value: 0.6367707258341934
key: train_jcc
value: [0.97887324 0.97887324 0.9929078 0.97183099 0.9929078 0.9929078
0.99285714 0.9787234 0.9929078 1. ]
mean value: 0.9872789217574953
MCC on Blind test: 0.41
Accuracy on Blind test: 0.71
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.05547452 0.05822182 0.05548573 0.05391788 0.05418348 0.04964733
0.05522466 0.07143283 0.07606363 0.05446839]
mean value: 0.058412027359008786
key: score_time
value: [0.02311087 0.02655649 0.01827073 0.01858497 0.02204895 0.02504039
0.03309536 0.02989912 0.03761315 0.02283883]
mean value: 0.025705885887145997
key: test_mcc
value: [0.75592895 0.75592895 0.6778302 0.80753845 0.48527095 0.6681531
0.42352151 0.55 0.69203857 0.61925228]
mean value: 0.6435462955940828
key: train_mcc
value: [0.97182532 0.93688613 0.97192667 0.94460323 0.97162977 0.97867167
0.96501929 0.98586412 0.978869 0.95816272]
mean value: 0.9663457912208626
key: test_accuracy
value: [0.875 0.875 0.83870968 0.90322581 0.74193548 0.80645161
0.67741935 0.77419355 0.83870968 0.80645161]
mean value: 0.8137096774193548
key: train_accuracy
value: [0.98571429 0.96785714 0.98576512 0.97153025 0.98576512 0.98932384
0.98220641 0.99288256 0.98932384 0.97864769]
mean value: 0.982901626842908
key: test_fscore
value: [0.86666667 0.86666667 0.82758621 0.89655172 0.71428571 0.83333333
0.58333333 0.77419355 0.82758621 0.8 ]
mean value: 0.7990203400603846
key: train_fscore
value: [0.98550725 0.96703297 0.98561151 0.97080292 0.98571429 0.98939929
0.98181818 0.99280576 0.98916968 0.97810219]
mean value: 0.982596402499482
key: test_precision
value: [0.92857143 0.92857143 0.85714286 0.92857143 0.76923077 0.71428571
0.875 0.8 0.92307692 0.85714286]
mean value: 0.8581593406593406
key: train_precision
value: [1. 0.9924812 1. 1. 0.99280576 0.98591549
1. 1. 1. 1. ]
mean value: 0.9971202451360949
key: test_recall
value: [0.8125 0.8125 0.8 0.86666667 0.66666667 1.
0.4375 0.75 0.75 0.75 ]
mean value: 0.7645833333333334
key: train_recall
value: [0.97142857 0.94285714 0.97163121 0.94326241 0.9787234 0.9929078
0.96428571 0.98571429 0.97857143 0.95714286]
mean value: 0.9686524822695035
key: test_roc_auc
value: [0.875 0.875 0.8375 0.90208333 0.73958333 0.8125
0.68541667 0.775 0.84166667 0.80833333]
mean value: 0.8152083333333333
key: train_roc_auc
value: [0.98571429 0.96785714 0.9858156 0.97163121 0.98579027 0.98931104
0.98214286 0.99285714 0.98928571 0.97857143]
mean value: 0.9828976697061804
key: test_jcc
value: [0.76470588 0.76470588 0.70588235 0.8125 0.55555556 0.71428571
0.41176471 0.63157895 0.70588235 0.66666667]
mean value: 0.6733528060346946
key: train_jcc
value: [0.97142857 0.93617021 0.97163121 0.94326241 0.97183099 0.97902098
0.96428571 0.98571429 0.97857143 0.95714286]
mean value: 0.9659058651866563
MCC on Blind test: 0.49
Accuracy on Blind test: 0.74
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.07153654 0.08550692 0.08657861 0.0378139 0.0400362 0.10264516
0.09579253 0.07016683 0.03786349 0.03786325]
mean value: 0.06658034324645996
key: score_time
value: [0.02121544 0.02933049 0.01310635 0.0131979 0.01302719 0.02128839
0.02968192 0.01325464 0.01323366 0.01954889]
mean value: 0.018688488006591796
key: test_mcc
value: [0.56360186 0.31311215 0.48333333 0.5612264 0.39198315 0.43041423
0.48954403 0.29069387 0.61925228 0.57104024]
mean value: 0.4714201549672476
key: train_mcc
value: [0.98571429 0.98571429 0.99290744 0.98576494 0.98576494 0.9929078
0.98576494 0.98576494 0.98576494 0.98576494]
mean value: 0.9871833481894341
key: test_accuracy
value: [0.78125 0.65625 0.74193548 0.77419355 0.67741935 0.70967742
0.74193548 0.64516129 0.80645161 0.74193548]
mean value: 0.7276209677419355
key: train_accuracy
value: [0.99285714 0.99285714 0.99644128 0.99288256 0.99288256 0.99644128
0.99288256 0.99288256 0.99288256 0.99288256]
mean value: 0.9935892221657346
key: test_fscore
value: [0.78787879 0.66666667 0.73333333 0.78787879 0.72222222 0.72727273
0.73333333 0.68571429 0.8 0.66666667]
mean value: 0.731096681096681
key: train_fscore
value: [0.99285714 0.99285714 0.99646643 0.9929078 0.9929078 0.99644128
0.99285714 0.99285714 0.99285714 0.99285714]
mean value: 0.9935866172213933
key: test_precision
value: [0.76470588 0.64705882 0.73333333 0.72222222 0.61904762 0.66666667
0.78571429 0.63157895 0.85714286 1. ]
mean value: 0.7427470637377758
key: train_precision
value: [0.99285714 0.99285714 0.99295775 0.9929078 0.9929078 1.
0.99285714 0.99285714 0.99285714 0.99285714]
mean value: 0.993591620645861
key: test_recall
value: [0.8125 0.6875 0.73333333 0.86666667 0.86666667 0.8
0.6875 0.75 0.75 0.5 ]
mean value: 0.7454166666666666
key: train_recall
value: [0.99285714 0.99285714 1. 0.9929078 0.9929078 0.9929078
0.99285714 0.99285714 0.99285714 0.99285714]
mean value: 0.9935866261398176
key: test_roc_auc
value: [0.78125 0.65625 0.74166667 0.77708333 0.68333333 0.7125
0.74375 0.64166667 0.80833333 0.75 ]
mean value: 0.7295833333333334
key: train_roc_auc
value: [0.99285714 0.99285714 0.99642857 0.99288247 0.99288247 0.9964539
0.99288247 0.99288247 0.99288247 0.99288247]
mean value: 0.9935891590678825
key: test_jcc
value: [0.65 0.5 0.57894737 0.65 0.56521739 0.57142857
0.57894737 0.52173913 0.66666667 0.5 ]
mean value: 0.5782946496676473
key: train_jcc
value: [0.9858156 0.9858156 0.99295775 0.98591549 0.98591549 0.9929078
0.9858156 0.9858156 0.9858156 0.9858156 ]
mean value: 0.9872590150834082
MCC on Blind test: 0.27
Accuracy on Blind test: 0.64
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.48413157 0.47434545 0.47634721 0.47761345 0.47614884 0.47844601
0.47672391 0.47568178 0.47936511 0.48416018]
mean value: 0.4782963514328003
key: score_time
value: [0.00920129 0.00912952 0.0093317 0.0096333 0.00975752 0.00908947
0.00899506 0.00906134 0.00970078 0.00986767]
mean value: 0.009376764297485352
key: test_mcc
value: [0.62994079 0.875 0.74896053 0.80753845 0.4184137 0.69203857
0.4770843 0.61925228 0.69203857 0.63696156]
mean value: 0.6597228748980087
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.8125 0.9375 0.87096774 0.90322581 0.70967742 0.83870968
0.70967742 0.80645161 0.83870968 0.80645161]
mean value: 0.8233870967741935
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.8 0.9375 0.875 0.89655172 0.68965517 0.84848485
0.64 0.8 0.82758621 0.78571429]
mean value: 0.810049223764741
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.85714286 0.9375 0.82352941 0.92857143 0.71428571 0.77777778
0.88888889 0.85714286 0.92307692 0.91666667]
mean value: 0.8624582525317819
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 0.9375 0.93333333 0.86666667 0.66666667 0.93333333
0.5 0.75 0.75 0.6875 ]
mean value: 0.7775
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.8125 0.9375 0.87291667 0.90208333 0.70833333 0.84166667
0.71666667 0.80833333 0.84166667 0.81041667]
mean value: 0.8252083333333333
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.66666667 0.88235294 0.77777778 0.8125 0.52631579 0.73684211
0.47058824 0.66666667 0.70588235 0.64705882]
mean value: 0.6892651358789129
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.6
Accuracy on Blind test: 0.8
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.023242 0.02464414 0.03425264 0.02344084 0.02590322 0.02454281
0.02392101 0.02409291 0.02381301 0.02434063]
mean value: 0.025219321250915527
key: score_time
value: [0.01257801 0.01315999 0.01226115 0.01292157 0.01545835 0.01711035
0.01511407 0.01517534 0.01511908 0.01794767]
mean value: 0.014684557914733887
key: test_mcc
value: [0.53935989 0.62994079 0.22364661 0.50443936 0.05046084 0.372678
0.43041423 0.29069387 0.29166667 0.67916667]
mean value: 0.4012466911245091
key: train_mcc
value: [0.98581488 0.99288247 0.91791995 0.98576494 0.95816272 1.
0.95078573 0.99290744 0.978869 0.92456546]
mean value: 0.9687672601676953
key: test_accuracy
value: [0.75 0.8125 0.61290323 0.74193548 0.51612903 0.61290323
0.70967742 0.64516129 0.64516129 0.83870968]
mean value: 0.688508064516129
key: train_accuracy
value: [0.99285714 0.99642857 0.95729537 0.99288256 0.97864769 1.
0.97508897 0.99644128 0.98932384 0.96085409]
mean value: 0.9839819522114895
key: test_fscore
value: [0.78947368 0.82352941 0.57142857 0.76470588 0.59459459 0.71428571
0.68965517 0.68571429 0.64516129 0.83870968]
mean value: 0.7117258284507069
key: train_fscore
value: [0.9929078 0.99641577 0.95918367 0.9929078 0.97916667 1.
0.9754386 0.99641577 0.98916968 0.96219931]
mean value: 0.984380506848783
key: test_precision
value: [0.68181818 0.77777778 0.61538462 0.68421053 0.5 0.55555556
0.76923077 0.63157895 0.66666667 0.86666667]
mean value: 0.6748889706784443
key: train_precision
value: [0.98591549 1. 0.92156863 0.9929078 0.95918367 1.
0.95862069 1. 1. 0.92715232]
mean value: 0.9745348602832521
key: test_recall
value: [0.9375 0.875 0.53333333 0.86666667 0.73333333 1.
0.625 0.75 0.625 0.8125 ]
mean value: 0.7758333333333334
key: train_recall
value: [1. 0.99285714 1. 0.9929078 1. 1.
0.99285714 0.99285714 0.97857143 1. ]
mean value: 0.9950050658561297
key: test_roc_auc
value: [0.75 0.8125 0.61041667 0.74583333 0.52291667 0.625
0.7125 0.64166667 0.64583333 0.83958333]
mean value: 0.690625
key: train_roc_auc
value: [0.99285714 0.99642857 0.95714286 0.99288247 0.97857143 1.
0.97515198 0.99642857 0.98928571 0.96099291]
mean value: 0.9839741641337386
key: test_jcc
value: [0.65217391 0.7 0.4 0.61904762 0.42307692 0.55555556
0.52631579 0.52173913 0.47619048 0.72222222]
mean value: 0.5596321629044742
key: train_jcc
value: [0.98591549 0.99285714 0.92156863 0.98591549 0.95918367 1.
0.95205479 0.99285714 0.97857143 0.92715232]
mean value: 0.9696076113522918
MCC on Blind test: 0.13
Accuracy on Blind test: 0.58
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
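The pipeline printed above, together with the fit_time/score_time/test_*/train_* keys that follow it, is consistent with a scikit-learn `Pipeline` evaluated through `cross_validate` over 10 folds. The following is a minimal sketch of that pattern and not the script's actual code. The toy data, the three-column feature set, the `ss_class` category values, the scorer names in `scoring`, and the use of `StratifiedKFold` are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy data: two numerical columns and one categorical column.
rng = np.random.default_rng(42)
n = 120
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 30, n),
    "ddg_foldx": rng.normal(0, 2, n),
    "ss_class": rng.choice(["helix", "strand", "loop"], n),
})
y = (X["ligand_distance"] + rng.normal(0, 8, n) < 15).astype(int)

num_cols = ["ligand_distance", "ddg_foldx"]
cat_cols = ["ss_class"]

# Scale numerical columns to [0, 1] and one-hot encode categorical columns,
# mirroring the 'num' and 'cat' transformers shown in the log.
prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), num_cols),
        ("cat", OneHotEncoder(), cat_cols),
    ],
    remainder="passthrough",
)
pipe = Pipeline(steps=[("prep", prep),
                       ("model", RidgeClassifier(random_state=42))])

# Assumed scorer mapping; the dictionary keys become the test_*/train_* suffixes.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": "jaccard",
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

# Print each metric in the same key / value / mean value layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```

With `return_train_score=True`, `cross_validate` returns one array of length 10 per key, which is why every metric in the log appears as a test/train pair.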
key: fit_time
value: [0.02307701 0.03570557 0.0143652 0.01603484 0.01425147 0.01433063
0.03359962 0.03737807 0.0342834 0.0371778 ]
mean value: 0.026020359992980958
key: score_time
value: [0.02180982 0.01184416 0.0117712 0.01172042 0.01180339 0.01192832
0.02126312 0.02331209 0.0165627 0.02375984]
mean value: 0.016577506065368654
key: test_mcc
value: [0.62994079 0.72374686 0.4184137 0.61925228 0.16878989 0.6681531
0.37191715 0.48333333 0.6125 0.5612264 ]
mean value: 0.5257273524179626
key: train_mcc
value: [0.79303924 0.81503456 0.81494428 0.80802555 0.8363139 0.81630561
0.83730807 0.80101379 0.77982279 0.82207812]
mean value: 0.8123885891717825
key: test_accuracy
value: [0.8125 0.84375 0.70967742 0.80645161 0.58064516 0.80645161
0.67741935 0.74193548 0.80645161 0.77419355]
mean value: 0.7559475806451613
key: train_accuracy
value: [0.89642857 0.90714286 0.90747331 0.90391459 0.91814947 0.90747331
0.91814947 0.90035587 0.88967972 0.91103203]
mean value: 0.9059799186578545
key: test_fscore
value: [0.8 0.86486486 0.68965517 0.8125 0.60606061 0.83333333
0.64285714 0.75 0.8125 0.75862069]
mean value: 0.7570391809184912
key: train_fscore
value: [0.89530686 0.90510949 0.90780142 0.90322581 0.91872792 0.90510949
0.91575092 0.89855072 0.88727273 0.91039427]
mean value: 0.9047249610287941
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_cd_7030.py:196: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_cd_7030.py:199: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: test_precision
value: [0.85714286 0.76190476 0.71428571 0.76470588 0.55555556 0.71428571
0.75 0.75 0.8125 0.84615385]
mean value: 0.752653433168139
key: train_precision
value: [0.90510949 0.92537313 0.90780142 0.91304348 0.91549296 0.93233083
0.93984962 0.91176471 0.9037037 0.91366906]
mean value: 0.9168138403288595
key: test_recall
value: [0.75 1. 0.66666667 0.86666667 0.66666667 1.
0.5625 0.75 0.8125 0.6875 ]
mean value: 0.77625
key: train_recall
value: [0.88571429 0.88571429 0.90780142 0.89361702 0.92198582 0.87943262
0.89285714 0.88571429 0.87142857 0.90714286]
mean value: 0.8931408308004053
key: test_roc_auc
value: [0.8125 0.84375 0.70833333 0.80833333 0.58333333 0.8125
0.68125 0.74166667 0.80625 0.77708333]
mean value: 0.7575000000000001
key: train_roc_auc
value: [0.89642857 0.90714286 0.90747214 0.90395137 0.91813576 0.90757345
0.91805978 0.90030395 0.88961499 0.91101824]
mean value: 0.9059701114488349
key: test_jcc
value: [0.66666667 0.76190476 0.52631579 0.68421053 0.43478261 0.71428571
0.47368421 0.6 0.68421053 0.61111111]
mean value: 0.6157171915295485
key: train_jcc
value: [0.81045752 0.82666667 0.83116883 0.82352941 0.8496732 0.82666667
0.84459459 0.81578947 0.79738562 0.83552632]
mean value: 0.8261458300204431
MCC on Blind test: 0.36
Accuracy on Blind test: 0.69
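Each "mean value" line appears to be the arithmetic mean of the ten per-fold values printed just before it. As a quick check, the sketch below recomputes the mean test_mcc and train_mcc for the Ridge Classifier from the fold values copied from this log, and reports the train/test gap. The variable names are illustrative only.

```python
from statistics import fmean

# Per-fold MCC values copied from the Ridge Classifier block of this log.
test_mcc = [0.62994079, 0.72374686, 0.4184137, 0.61925228, 0.16878989,
            0.6681531, 0.37191715, 0.48333333, 0.6125, 0.5612264]
train_mcc = [0.79303924, 0.81503456, 0.81494428, 0.80802555, 0.8363139,
             0.81630561, 0.83730807, 0.80101379, 0.77982279, 0.82207812]

mean_test = fmean(test_mcc)
mean_train = fmean(train_mcc)

# A large positive gap means the model scores much better on the folds it
# was fitted on than on the held-out folds.
gap = mean_train - mean_test

print("mean test_mcc :", round(mean_test, 4))
print("mean train_mcc:", round(mean_train, 4))
print("train - test  :", round(gap, 4))
```

The recomputed means agree with the logged values to the precision of the printed fold values.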
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.25024295 0.12240052 0.26541185 0.16076803 0.26370955 0.23450518
0.24887896 0.24265242 0.2682631 0.25118542]
mean value: 0.23080179691314698
key: score_time
value: [0.01692557 0.01229692 0.02219248 0.01252437 0.02168393 0.0223794
0.02114606 0.0221355 0.02409887 0.02430439]
mean value: 0.019968748092651367
key: test_mcc
value: [0.56360186 0.64549722 0.42321607 0.55 0.55 0.4770843
0.23012754 0.48527095 0.61925228 0.58316015]
mean value: 0.5127210368867323
key: train_mcc
value: [0.65041494 0.65014929 0.67315825 0.67267845 0.66585571 0.69395613
0.66565335 0.68713898 0.66594028 0.67989057]
mean value: 0.6704835938327732
key: test_accuracy
value: [0.78125 0.8125 0.70967742 0.77419355 0.77419355 0.70967742
0.61290323 0.74193548 0.80645161 0.77419355]
mean value: 0.7496975806451613
key: train_accuracy
value: [0.825 0.825 0.83629893 0.83629893 0.83274021 0.84697509
0.83274021 0.84341637 0.83274021 0.83985765]
mean value: 0.8351067615658363
key: test_fscore
value: [0.77419355 0.83333333 0.66666667 0.77419355 0.77419355 0.75675676
0.6 0.76470588 0.8 0.74074074]
mean value: 0.7484784025011729
key: train_fscore
value: [0.82807018 0.82685512 0.84027778 0.83571429 0.83623693 0.84805654
0.83392226 0.84507042 0.83508772 0.8409894 ]
mean value: 0.8370280636116797
key: test_precision
value: [0.8 0.75 0.75 0.75 0.75 0.63636364
0.64285714 0.72222222 0.85714286 0.90909091]
mean value: 0.7567676767676768
key: train_precision
value: [0.8137931 0.81818182 0.82312925 0.84172662 0.82191781 0.84507042
0.82517483 0.83333333 0.82068966 0.83216783]
mean value: 0.8275184668638604
key: test_recall
value: [0.75 0.9375 0.6 0.8 0.8 0.93333333
0.5625 0.8125 0.75 0.625 ]
mean value: 0.7570833333333333
key: train_recall
value: [0.84285714 0.83571429 0.85815603 0.82978723 0.85106383 0.85106383
0.84285714 0.85714286 0.85 0.85 ]
mean value: 0.8468642350557244
key: test_roc_auc
value: [0.78125 0.8125 0.70625 0.775 0.775 0.71666667
0.61458333 0.73958333 0.80833333 0.77916667]
mean value: 0.7508333333333334
key: train_roc_auc
value: [0.825 0.825 0.83622087 0.83632219 0.83267477 0.84696049
0.83277609 0.84346505 0.83280142 0.83989362]
mean value: 0.8351114488348531
key: test_jcc
value: [0.63157895 0.71428571 0.5 0.63157895 0.63157895 0.60869565
0.42857143 0.61904762 0.66666667 0.58823529]
mean value: 0.6020239216968252
key: train_jcc
value: [0.70658683 0.70481928 0.7245509 0.71779141 0.71856287 0.73619632
0.71515152 0.73170732 0.71686747 0.72560976]
mean value: 0.7197843664173944
MCC on Blind test: 0.39
Accuracy on Blind test: 0.7
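The MCC and accuracy reported on the blind test are both functions of the confusion matrix of the held-out predictions. The log does not print that matrix, so the sketch below uses hypothetical counts purely to illustrate the standard definitions. The helper names `mcc_from_counts` and `accuracy_from_counts` are not taken from the script.

```python
import math

def mcc_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0

def accuracy_from_counts(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical counts, NOT taken from the log.
tp, tn, fp, fn = 40, 30, 20, 10
print("MCC     :", round(mcc_from_counts(tp, tn, fp, fn), 2))
print("Accuracy:", round(accuracy_from_counts(tp, tn, fp, fn), 2))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it can remain low when accuracy looks acceptable.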