LSHTM_analysis/scripts/ml/log_gid_cd_8020.txt
2022-06-20 21:55:47 +01:00

/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data_cd_8020.py:548: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
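The SettingWithCopyWarning above comes from sorting a sliced DataFrame in place. A minimal sketch of the usual fix, assuming mask_check was produced by slicing another DataFrame (the helper name is hypothetical):

import pandas as pd

def sort_mask_check(mask_check: pd.DataFrame) -> pd.DataFrame:
    # Work on an explicit copy so the sort never writes into a view of the parent frame,
    # and return the sorted result instead of using inplace=True.
    return mask_check.copy().sort_values(by=['ligand_distance'], ascending=True)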
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
from pandas import MultiIndex, Int64Index
1.22.4
1.4.1
aaindex_df contains non-numerical data
Total no. of non-numerical columns: 2
Selecting numerical data only
PASS: successfully selected numerical columns only for aaindex_df
Now checking for NA in the remaining aaindex_cols
Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127
Revised df ncols: 123
Checking NA in revised df...
PASS: cols with NA successfully dropped from aaindex_df
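A minimal sketch of the aaindex_df clean-up logged above (keep numeric columns, then drop any column still containing NA); the helper is hypothetical and the actual code in ml_data_cd_8020.py may differ:

import pandas as pd

def clean_aaindex(aaindex_df: pd.DataFrame) -> pd.DataFrame:
    # Keep numerical columns only (drops the 2 non-numerical ones),
    # then drop every column that still contains NA (127 -> 123 columns here).
    numeric = aaindex_df.select_dtypes(include='number')
    return numeric.dropna(axis=1, how='any')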
Proceeding with combining aa_df with other features_df
PASS: ncols match
Expected ncols: 123
Got: 123
Total no. of columns in clean aa_df: 123
Proceeding to merge, expected nrows in merged_df: 531
PASS: my_features_df and aa_df successfully combined
nrows: 531
ncols: 286
Count of NULL values BEFORE imputation
or_mychisq 263
log10_or_mychisq 263
dtype: int64
Count of NULL values AFTER imputation
mutationinformation 0
or_rawI 0
logorI 0
dtype: int64
PASS: OR values imputed, data ready for ML
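The log does not show how the OR columns were imputed, only the before/after NULL counts. Purely as an illustration, a median fill over the odds-ratio columns could look like this (column names taken from the output above; the strategy itself is an assumption):

import pandas as pd

def impute_or_columns(df: pd.DataFrame, cols=('or_mychisq', 'log10_or_mychisq')) -> pd.DataFrame:
    # Illustrative only: replace missing odds-ratio values with the column median.
    out = df.copy()
    for col in cols:
        out[col] = out[col].fillna(out[col].median())
    return out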
Total no. of features for aaindex: 123
No. of numerical features: 167
No. of categorical features: 7
PASS: x_features has no target variable
No. of columns for x_features: 174
-------------------------------------------------------------
Successfully split data with stratification [COMPLETE data]: 80/20
Original data size: (531, 174)
Train data size: (424, 174)
Test data size: (107, 174)
y_train numbers: Counter({0: 387, 1: 37})
y_train ratio: 10.45945945945946
y_test numbers: Counter({0: 98, 1: 9})
y_test ratio: 10.88888888888889
-------------------------------------------------------------
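A sketch of the stratified 80/20 split summarised in the block above; the seed is an assumption (the models later in the log all use random_state=42):

from collections import Counter
from sklearn.model_selection import train_test_split

def split_80_20(X, y, seed=42):
    # Stratify on the binary target so both splits keep the ~10:1 class ratio reported above.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    print('y_train:', Counter(y_train), 'y_test:', Counter(y_test))
    return X_train, X_test, y_train, y_test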
index: 0
ind: 1
Mask count check: True
index: 1
ind: 2
Mask count check: True
Original Data
Counter({0: 387, 1: 37}) Data dim: (424, 174)
Simple Random OverSampling
Counter({0: 387, 1: 387})
(774, 174)
Simple Random UnderSampling
Counter({0: 37, 1: 37})
(74, 174)
Simple Combined Over and UnderSampling
Counter({0: 387, 1: 387})
(774, 174)
SMOTE_NC OverSampling
Counter({0: 387, 1: 387})
(774, 174)
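The resampling variants above come from imbalanced-learn; a minimal sketch of the three samplers named in the log (the combined over+under variant and the exact parameters are not shown, so the seed and settings here are assumptions):

from imblearn.over_sampling import RandomOverSampler, SMOTENC
from imblearn.under_sampling import RandomUnderSampler

def resample_training_set(X_train, y_train, categorical_idx, seed=42):
    # Random oversampling of the minority class, random undersampling of the majority class,
    # and SMOTE-NC, which needs the indices of the categorical columns.
    X_ros, y_ros = RandomOverSampler(random_state=seed).fit_resample(X_train, y_train)
    X_rus, y_rus = RandomUnderSampler(random_state=seed).fit_resample(X_train, y_train)
    X_smnc, y_smnc = SMOTENC(categorical_features=categorical_idx,
                             random_state=seed).fit_resample(X_train, y_train)
    return (X_ros, y_ros), (X_rus, y_rus), (X_smnc, y_smnc)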
#####################################################################
Running ML analysis [COMPLETE DATA]: 80/20 split
Gene name: gid
Drug name: streptomycin
Output directory: /home/tanu/git/Data/streptomycin/output/ml/tts_cd_8020/
Sanity checks:
Total input features: 174
Training data size: (424, 174)
Test data size: (107, 174)
Target feature numbers (training data): Counter({0: 387, 1: 37})
Target features ratio (training data): 10.45945945945946
Target feature numbers (test data): Counter({0: 98, 1: 9})
Target features ratio (test data): 10.88888888888889
#####################################################################
================================================================
Structural features (n): 35
These are:
Common stability features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================
AAindex features (n): 123
These are:
['ALTS910101', 'AZAE970101', 'AZAE970102', 'BASU010101', 'BENS940101', 'BENS940102', 'BENS940103', 'BENS940104', 'BETM990101', 'BLAJ010101', 'BONM030101', 'BONM030102', 'BONM030103', 'BONM030104', 'BONM030105', 'BONM030106', 'BRYS930101', 'CROG050101', 'CSEM940101', 'DAYM780301', 'DAYM780302', 'DOSZ010101', 'DOSZ010102', 'DOSZ010103', 'DOSZ010104', 'FEND850101', 'FITW660101', 'GEOD900101', 'GIAG010101', 'GONG920101', 'GRAR740104', 'HENS920101', 'HENS920102', 'HENS920103', 'HENS920104', 'JOHM930101', 'JOND920103', 'JOND940101', 'KANM000101', 'KAPO950101', 'KESO980101', 'KESO980102', 'KOLA920101', 'KOLA930101', 'KOSJ950100_RSA_SST', 'KOSJ950100_SST', 'KOSJ950110_RSA', 'KOSJ950115', 'LEVJ860101', 'LINK010101', 'LIWA970101', 'LUTR910101', 'LUTR910102', 'LUTR910103', 'LUTR910104', 'LUTR910105', 'LUTR910106', 'LUTR910107', 'LUTR910108', 'LUTR910109', 'MCLA710101', 'MCLA720101', 'MEHP950102', 'MICC010101', 'MIRL960101', 'MIYS850102', 'MIYS850103', 'MIYS930101', 'MIYS960101', 'MIYS960102', 'MIYS960103', 'MIYS990106', 'MIYS990107', 'MIYT790101', 'MOHR870101', 'MOOG990101', 'MUET010101', 'MUET020101', 'MUET020102', 'NAOD960101', 'NGPC000101', 'NIEK910101', 'NIEK910102', 'OGAK980101', 'OVEJ920100_RSA', 'OVEJ920101', 'OVEJ920102', 'OVEJ920103', 'PRLA000101', 'PRLA000102', 'QUIB020101', 'QU_C930101', 'QU_C930102', 'QU_C930103', 'RIER950101', 'RISJ880101', 'RUSR970101', 'RUSR970102', 'RUSR970103', 'SIMK990101', 'SIMK990102', 'SIMK990103', 'SIMK990104', 'SIMK990105', 'SKOJ000101', 'SKOJ000102', 'SKOJ970101', 'TANS760101', 'TANS760102', 'THOP960101', 'TOBD000101', 'TOBD000102', 'TUDE900101', 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106']
================================================================
Evolutionary features (n): 3
These are:
['consurf_score', 'snap2_score', 'provean_score']
================================================================
Genomic features (n): 6
These are:
['maf', 'logorI']
['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================
Categorical features (n): 7
These are:
['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================
PASS: No. of features match
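(Cross-check: 35 structural + 123 AAindex + 3 evolutionary + 6 genomic = 167 numerical features, plus the 7 categorical features gives the 174 input columns reported above, which is what this check confirms.)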
#####################################################################
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
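A sketch rebuilding the pipeline printed above from its pieces: MinMax-scale the 167 numerical columns, one-hot encode the 7 categorical ones, pass anything else through, then fit the classifier (num_cols and cat_cols are the column lists shown in the repr):

from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

def make_model_pipeline(num_cols, cat_cols, model=None):
    # Mirrors the printed Pipeline: a preprocessing ColumnTransformer followed by the model step.
    prep = ColumnTransformer(
        transformers=[('num', MinMaxScaler(), num_cols),
                      ('cat', OneHotEncoder(), cat_cols)],
        remainder='passthrough')
    return Pipeline(steps=[('prep', prep),
                           ('model', model or LogisticRegression(random_state=42))])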
key: fit_time
value: [0.03192401 0.03113723 0.04093528 0.04060721 0.03610039 0.0360229
0.03432345 0.03375387 0.03668976 0.06538916]
mean value: 0.03868832588195801
key: score_time
value: [0.01240063 0.01255131 0.01229906 0.01265574 0.0123508 0.01547289
0.01250529 0.01247287 0.01213646 0.01538992]
mean value: 0.0130234956741333
key: test_mcc
value: [ 0.48181206 0. -0.04941662 0. -0.04331481 0.
0. 0. -0.07254763 0.48135986]
mean value: 0.07978928638667865
key: train_mcc
value: [0.37447578 0.37447578 0.41076425 0.37447578 0.36843806 0.43710276
0.36843806 0.41080663 0.41080663 0.37451574]
mean value: 0.3904299476091795
key: test_accuracy
value: [0.93023256 0.90697674 0.88372093 0.90697674 0.9047619 0.92857143
0.92857143 0.9047619 0.85714286 0.92857143]
mean value: 0.9080287929125138
key: train_accuracy
value: [0.92650919 0.92650919 0.92913386 0.92650919 0.92408377 0.92931937
0.92408377 0.92931937 0.92931937 0.92670157]
mean value: 0.9271488642453725
key: test_fscore
value: [0.4 0. 0. 0. 0. 0. 0. 0. 0. 0.4]
mean value: 0.08
key: train_fscore
value: [0.26315789 0.26315789 0.30769231 0.26315789 0.25641026 0.34146341
0.25641026 0.30769231 0.30769231 0.26315789]
mean value: 0.2829992429478951
key: test_precision
value: [1. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
mean value: 0.2
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.25 0. 0. 0. 0. 0. 0. 0. 0. 0.25]
mean value: 0.05
key: train_recall
value: [0.15151515 0.15151515 0.18181818 0.15151515 0.14705882 0.20588235
0.14705882 0.18181818 0.18181818 0.15151515]
mean value: 0.16515151515151516
key: test_roc_auc
value: [0.625 0.5 0.48717949 0.5 0.48717949 0.5
0.5 0.5 0.47368421 0.625 ]
mean value: 0.519804318488529
key: train_roc_auc
value: [0.57575758 0.57575758 0.59090909 0.57575758 0.57352941 0.60294118
0.57352941 0.59090909 0.59090909 0.57575758]
mean value: 0.5825757575757576
key: test_jcc
value: [0.25 0. 0. 0. 0. 0. 0. 0. 0. 0.25]
mean value: 0.05
key: train_jcc
value: [0.15151515 0.15151515 0.18181818 0.15151515 0.14705882 0.20588235
0.14705882 0.18181818 0.18181818 0.15151515]
mean value: 0.16515151515151516
MCC on Blind test: 0.0
Accuracy on Blind test: 0.92
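The key/value blocks above have the shape of sklearn.model_selection.cross_validate output with return_train_score=True, and the two "Blind test" lines are the same metrics on the held-out 20% split. A hedged sketch (the scorer names and CV setup used by the script are assumptions):

from sklearn.metrics import accuracy_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import cross_validate

def evaluate_model(pipe, X_train, y_train, X_test, y_test, cv=10):
    # 10-fold CV producing fit_time/score_time/test_*/train_* keys like those logged above.
    scoring = {'mcc': make_scorer(matthews_corrcoef),
               'accuracy': 'accuracy', 'fscore': 'f1', 'precision': 'precision',
               'recall': 'recall', 'roc_auc': 'roc_auc', 'jcc': 'jaccard'}
    cv_scores = cross_validate(pipe, X_train, y_train, cv=cv,
                               scoring=scoring, return_train_score=True)
    # "Blind test" figures: refit on the full training split, score the held-out test set.
    pipe.fit(X_train, y_train)
    y_pred = pipe.predict(X_test)
    return cv_scores, matthews_corrcoef(y_test, y_pred), accuracy_score(y_test, y_pred)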
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.96618438 1.08755589 1.09190226 0.93135571 1.07575369 1.51552224
1.89240861 0.78960204 0.79159689 0.92069793]
mean value: 1.106257963180542
key: score_time
value: [0.01266503 0.01590943 0.01265359 0.01263499 0.01272488 0.01282477
0.01364899 0.0137465 0.0129962 0.01297641]
mean value: 0.01327807903289795
key: test_mcc
value: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
mean value: 0.0
key: train_mcc
value: [0. 0.53314544 0.3344968 0. 0. 0.
0.23209905 0.166607 0. 0. ]
mean value: 0.12663482962026185
key: test_accuracy
value: [0.90697674 0.90697674 0.90697674 0.90697674 0.92857143 0.92857143
0.92857143 0.9047619 0.9047619 0.9047619 ]
mean value: 0.9127906976744186
key: train_accuracy
value: [0.91338583 0.93963255 0.92388451 0.91338583 0.91099476 0.91099476
0.91623037 0.91623037 0.91361257 0.91361257]
mean value: 0.9171964106580919
key: test_fscore
value: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
mean value: 0.0
key: train_fscore
value: [0. 0.46511628 0.21621622 0. 0. 0.
0.11111111 0.05882353 0. 0. ]
mean value: 0.08512671358088596
key: test_precision
value: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
mean value: 0.0
key: train_precision
value: [0. 1. 1. 0. 0. 0. 1. 1. 0. 0.]
mean value: 0.4
key: test_recall
value: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
mean value: 0.0
key: train_recall
value: [0. 0.3030303 0.12121212 0. 0. 0.
0.05882353 0.03030303 0. 0. ]
mean value: 0.051336898395721926
key: test_roc_auc
value: [0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
mean value: 0.5
key: train_roc_auc
value: [0.5 0.65151515 0.56060606 0.5 0.5 0.5
0.52941176 0.51515152 0.5 0.5 ]
mean value: 0.5256684491978609
key: test_jcc
value: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
mean value: 0.0
key: train_jcc
value: [0. 0.3030303 0.12121212 0. 0. 0.
0.05882353 0.03030303 0. 0. ]
mean value: 0.051336898395721926
MCC on Blind test: 0.0
Accuracy on Blind test: 0.92
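The ConvergenceWarnings raised during the LogisticRegressionCV run above carry their own remedy: raise max_iter, scale the data (already done by the MinMaxScaler step), or switch solver. A minimal sketch; the max_iter value is an arbitrary assumption:

from sklearn.linear_model import LogisticRegressionCV

# Give lbfgs a larger iteration budget than the default 100; alternatively pass a
# different solver (e.g. solver='saga' or 'liblinear') as the linked docs suggest.
clf = LogisticRegressionCV(random_state=42, max_iter=5000)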
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01445413 0.01128769 0.01004791 0.00985813 0.00981545 0.00973678
0.00977206 0.00995684 0.0101397 0.01077175]
mean value: 0.010584044456481933
key: score_time
value: [0.01259947 0.00952554 0.00934863 0.00907326 0.00903654 0.00910497
0.00904274 0.00979757 0.00924802 0.00942326]
mean value: 0.009619998931884765
key: test_mcc
value: [-0.1015749 0.10945345 -0.01322215 0.08089378 0.22870875 0.01378164
-0.06633727 -0.01546721 0.22941573 0.05735393]
mean value: 0.052300575466839826
key: train_mcc
value: [0.17611623 0.15152319 0.15274905 0.14822435 0.15051129 0.16019033
0.16120387 0.15921338 0.15038089 0.15626373]
mean value: 0.1566376314466152
key: test_accuracy
value: [0.34883721 0.46511628 0.27906977 0.41860465 0.47619048 0.38095238
0.52380952 0.47619048 0.42857143 0.38095238]
mean value: 0.4178294573643411
key: train_accuracy
value: [0.4671916 0.42257218 0.39370079 0.44619423 0.47120419 0.46073298
0.4895288 0.46596859 0.45026178 0.46073298]
mean value: 0.4528088112022646
key: test_fscore
value: [0.125 0.20689655 0.16216216 0.19354839 0.21428571 0.13333333
0.09090909 0.15384615 0.25 0.1875 ]
mean value: 0.17174813933573665
key: train_fscore
value: [0.22222222 0.20863309 0.20618557 0.20973783 0.21705426 0.21969697
0.22310757 0.21538462 0.21052632 0.21374046]
mean value: 0.21462889026464
key: test_precision
value: [0.07142857 0.12 0.09090909 0.11111111 0.12 0.07407407
0.05263158 0.09090909 0.14285714 0.10714286]
mean value: 0.09810635173793068
key: train_precision
value: [0.12719298 0.11836735 0.11627907 0.11965812 0.125 0.12608696
0.12903226 0.12334802 0.12017167 0.12227074]
mean value: 0.12274071672056991
key: test_recall
value: [0.5 0.75 0.75 0.75 1. 0.66666667
0.33333333 0.5 1. 0.75 ]
mean value: 0.7
key: train_recall
value: [0.87878788 0.87878788 0.90909091 0.84848485 0.82352941 0.85294118
0.82352941 0.84848485 0.84848485 0.84848485]
mean value: 0.8560606060606061
key: test_roc_auc
value: [0.41666667 0.59294872 0.49038462 0.56730769 0.71794872 0.51282051
0.43589744 0.48684211 0.68421053 0.54605263]
mean value: 0.5451079622132253
key: train_roc_auc
value: [0.6534744 0.62904911 0.62695925 0.62826541 0.63015551 0.63767748
0.64021298 0.63914214 0.63054615 0.63627681]
mean value: 0.6351759243457198
key: test_jcc
value: [0.06666667 0.11538462 0.08823529 0.10714286 0.12 0.07142857
0.04761905 0.08333333 0.14285714 0.10344828]
mean value: 0.09461158044119504
key: train_jcc
value: [0.125 0.11646586 0.11494253 0.11715481 0.12173913 0.12340426
0.12556054 0.12068966 0.11764706 0.11965812]
mean value: 0.12022619614295149
MCC on Blind test: 0.14
Accuracy on Blind test: 0.5
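Note: the Gaussian NB folds above pair high recall (train ~0.86) with very low precision (train ~0.12), which is what keeps MCC near 0.15 despite the model catching most minority-class samples. A small worked check with assumed confusion-matrix counts of roughly that shape (illustrative numbers, not taken from the run):

import numpy as np
from sklearn.metrics import matthews_corrcoef, precision_score, recall_score

# Assumed counts for one training fold: 33 positives, 348 negatives,
# most positives flagged but with a large number of false positives.
y_true = np.array([1] * 33 + [0] * 348)
y_pred = np.array([1] * 29 + [0] * 4 + [1] * 200 + [0] * 148)   # TP=29, FN=4, FP=200, TN=148

print(recall_score(y_true, y_pred))       # ~0.88
print(precision_score(y_true, y_pred))    # ~0.13
print(matthews_corrcoef(y_true, y_pred))  # ~0.17, the same order as the train_mcc values above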
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01061749 0.00989461 0.01053047 0.00989676 0.01055145 0.00991964
0.00978327 0.00987339 0.00988674 0.00994134]
mean value: 0.010089516639709473
key: score_time
value: [0.00888753 0.0089438 0.00893641 0.00886178 0.00894022 0.00889421
0.0088439 0.00891805 0.00903153 0.00894308]
mean value: 0.008920049667358399
key: test_mcc
value: [-0.07073279 -0.07073279 -0.07073279 -0.1025641 -0.06201737 0.3721042
0.64102564 -0.07254763 -0.11926756 -0.08998425]
mean value: 0.035455057825129276
key: train_mcc
value: [0.15703158 0.13007539 0.11785082 0.15703158 0.12003681 0.09162152
0.02642975 0.10109391 0.13024398 0.16396815]
mean value: 0.11953834882652258
key: test_accuracy
value: [0.86046512 0.86046512 0.86046512 0.81395349 0.88095238 0.92857143
0.95238095 0.85714286 0.78571429 0.83333333]
mean value: 0.863344407530454
key: train_accuracy
value: [0.88451444 0.88451444 0.87926509 0.88451444 0.87958115 0.87958115
0.87958115 0.88481675 0.88481675 0.88743455]
mean value: 0.8828619917274738
key: test_fscore
value: [0. 0. 0. 0. 0. 0.4
0.66666667 0. 0. 0. ]
mean value: 0.10666666666666666
key: train_fscore
value: [0.21428571 0.18518519 0.17857143 0.21428571 0.17857143 0.14814815
0.08 0.15384615 0.18518519 0.21818182]
mean value: 0.17562607762607763
key: test_precision
value: [0. 0. 0. 0. 0. 0.5
0.66666667 0. 0. 0. ]
mean value: 0.11666666666666667
key: train_precision
value: [0.26086957 0.23809524 0.2173913 0.26086957 0.22727273 0.2
0.125 0.21052632 0.23809524 0.27272727]
mean value: 0.22508472267625584
key: test_recall
value: [0. 0. 0. 0. 0. 0.33333333
0.66666667 0. 0. 0. ]
mean value: 0.09999999999999999
key: train_recall
value: [0.18181818 0.15151515 0.15151515 0.18181818 0.14705882 0.11764706
0.05882353 0.12121212 0.15151515 0.18181818]
mean value: 0.1444741532976827
key: test_roc_auc
value: [0.47435897 0.47435897 0.47435897 0.44871795 0.47435897 0.65384615
0.82051282 0.47368421 0.43421053 0.46052632]
mean value: 0.5188933873144399
key: train_roc_auc
value: [0.5664838 0.55276907 0.54989551 0.5664838 0.54910412 0.53583502
0.50929682 0.53911609 0.55283494 0.56798645]
mean value: 0.5489805637877934
key: test_jcc
value: [0. 0. 0. 0. 0. 0.25 0.5 0. 0. 0. ]
mean value: 0.075
key: train_jcc
value: [0.12 0.10204082 0.09803922 0.12 0.09803922 0.08
0.04166667 0.08333333 0.10204082 0.12244898]
mean value: 0.0967609043617447
MCC on Blind test: 0.02
Accuracy on Blind test: 0.84
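Note: the "MCC on Blind test" / "Accuracy on Blind test" lines that close each block are a single evaluation on the held-out 20% (107 samples). A minimal sketch of that step, assuming a plain fit/predict; the data and estimator here are illustrative stand-ins, not the script's objects:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import matthews_corrcoef, accuracy_score

X, y = make_classification(n_samples=531, weights=[0.9], random_state=42)
X_tr, X_bts, y_tr, y_bts = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

clf = BernoulliNB().fit(X_tr, y_tr)   # in the log this would be the full prep+model pipeline
y_pred = clf.predict(X_bts)
print('MCC on Blind test:', round(matthews_corrcoef(y_bts, y_pred), 2))
print('Accuracy on Blind test:', round(accuracy_score(y_bts, y_pred), 2))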
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
[UndefinedMetricWarning (x9): Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. (sklearn/metrics/_classification.py:1327)]
List of models:
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00947571 0.01104259 0.01088667 0.01103258 0.01043534 0.01024675
0.01049209 0.01032686 0.0104568 0.01033139]
mean value: 0.010472679138183593
key: score_time
value: [0.06409955 0.01794052 0.01824045 0.01263189 0.01234031 0.01246119
0.01235628 0.01237869 0.01344562 0.01230836]
mean value: 0.01882028579711914
key: test_mcc
value: [ 0. 0. -0.04941662 -0.04941662 0. 0.
0. 0. 0. 0. ]
mean value: -0.009883324222148018
key: train_mcc
value: [0.26085891 0.10676511 0.23590006 0.2924785 0.18048814 0.33020253
0.23209905 0.26092303 0.18376878 0.18376878]
mean value: 0.22672528830993705
key: test_accuracy
value: [0.90697674 0.90697674 0.88372093 0.88372093 0.92857143 0.92857143
0.92857143 0.9047619 0.9047619 0.9047619 ]
mean value: 0.9081395348837209
key: train_accuracy
value: [0.91863517 0.91338583 0.91863517 0.92125984 0.91361257 0.92146597
0.91623037 0.91884817 0.91623037 0.91623037]
mean value: 0.9174533811545809
key: test_fscore
value: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
mean value: 0.0
key: train_fscore
value: [0.20512821 0.05714286 0.11428571 0.21052632 0.10810811 0.25
0.11111111 0.20512821 0.11111111 0.11111111]
mean value: 0.14836527389158968
key: test_precision
value: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
mean value: 0.0
key: train_precision
value: [0.66666667 0.5 1. 0.8 0.66666667 0.83333333
1. 0.66666667 0.66666667 0.66666667]
mean value: 0.7466666666666667
key: test_recall
value: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
mean value: 0.0
key: train_recall
value: [0.12121212 0.03030303 0.06060606 0.12121212 0.05882353 0.14705882
0.05882353 0.12121212 0.06060606 0.06060606]
mean value: 0.08404634581105169
key: test_roc_auc
value: [0.5 0.5 0.48717949 0.48717949 0.5 0.5
0.5 0.5 0.5 0.5 ]
mean value: 0.49743589743589745
key: train_roc_auc
value: [0.5577325 0.51371473 0.53030303 0.55916928 0.52797498 0.57209263
0.52941176 0.55774073 0.52887037 0.52887037]
mean value: 0.5405880380374297
key: test_jcc
value: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
mean value: 0.0
key: train_jcc
value: [0.11428571 0.02941176 0.06060606 0.11764706 0.05714286 0.14285714
0.05882353 0.11428571 0.05882353 0.05882353]
mean value: 0.0812706900942195
MCC on Blind test: 0.0
Accuracy on Blind test: 0.92
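Note: the UndefinedMetricWarning messages summarised in this block arise because K-Nearest Neighbors (like several models here) predicts no positives at all on some CV folds, so precision has a 0/0 denominator and is forced to 0.0. A minimal illustration; passing zero_division explicitly is an option, not something the logged script does:

from sklearn.metrics import precision_score

y_true = [0, 0, 0, 0, 1]
y_pred = [0, 0, 0, 0, 0]        # no positive predictions at all

print(precision_score(y_true, y_pred))                    # 0.0, with UndefinedMetricWarning
print(precision_score(y_true, y_pred, zero_division=0))   # 0.0, warning silenced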
Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
[UndefinedMetricWarning (x22): Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. (sklearn/metrics/_classification.py:1327)]
[ConvergenceWarning (x10): Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. (sklearn/neural_network/_multilayer_perceptron.py:702)]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01490879 0.01426005 0.01445842 0.01539779 0.01563573 0.01390314
0.01616335 0.01361179 0.01410937 0.01455712]
mean value: 0.014700555801391601
key: score_time
value: [0.010849 0.01102424 0.01084709 0.01244116 0.01025558 0.01261973
0.01078677 0.01285315 0.01057076 0.01026249]
mean value: 0.011250996589660644
key: test_mcc
value: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
mean value: 0.0
key: train_mcc
value: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
mean value: 0.0
key: test_accuracy
value: [0.90697674 0.90697674 0.90697674 0.90697674 0.92857143 0.92857143
0.92857143 0.9047619 0.9047619 0.9047619 ]
mean value: 0.9127906976744186
key: train_accuracy
value: [0.91338583 0.91338583 0.91338583 0.91338583 0.91099476 0.91099476
0.91099476 0.91361257 0.91361257 0.91361257]
mean value: 0.9127365296615411
key: test_fscore
value: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
mean value: 0.0
key: train_fscore
value: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
mean value: 0.0
key: test_precision
value: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
mean value: 0.0
key: train_precision
value: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
mean value: 0.0
key: test_recall
value: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
mean value: 0.0
key: train_recall
value: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
mean value: 0.0
key: test_roc_auc
value: [0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
mean value: 0.5
key: train_roc_auc
value: [0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
mean value: 0.5
key: test_jcc
value: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
mean value: 0.0
key: train_jcc
value: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
mean value: 0.0
MCC on Blind test: 0.0
Accuracy on Blind test: 0.92
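Note: every SVC metric above matches a classifier that always predicts the majority class (label 0): accuracy equals the class prevalence, MCC is 0 and ROC-AUC is 0.5. A quick check of that baseline on the blind-test class counts reported for this split:

import numpy as np
from sklearn.metrics import accuracy_score, matthews_corrcoef, roc_auc_score

y_true = np.array([0] * 98 + [1] * 9)     # blind test counts: Counter({0: 98, 1: 9})
y_pred = np.zeros_like(y_true)            # always predict the majority class

print(accuracy_score(y_true, y_pred))     # ~0.916 -> the 0.92 logged above
print(matthews_corrcoef(y_true, y_pred))  # 0.0
print(roc_auc_score(y_true, y_pred))      # 0.5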
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.61814642 1.97156453 1.49310732 1.56736088 1.46181703 1.51083326
1.64579344 1.56489897 2.24432826 2.29716277]
mean value: 1.737501287460327
key: score_time
value: [0.01693749 0.02715826 0.01663637 0.01687574 0.01798725 0.01930904
0.01699758 0.0131259 0.01702714 0.01333833]
mean value: 0.017539310455322265
key: test_mcc
value: [ 0.68964466 0.68964466 0.22657332 -0.04941662 0.18352066 0.
-0.04331481 -0.05066946 0.22496064 0.30832741]
mean value: 0.21792704562064058
key: train_mcc
value: [0.96645019 0.94937922 0.9320953 0.96645019 0.89976179 0.96736669
0.95077434 0.9664581 0.91460407 0.91460407]
mean value: 0.9427943971130062
key: test_accuracy
value: [0.95348837 0.95348837 0.88372093 0.88372093 0.85714286 0.92857143
0.9047619 0.88095238 0.88095238 0.9047619 ]
mean value: 0.903156146179402
key: train_accuracy
value: [0.99475066 0.99212598 0.98950131 0.99475066 0.98429319 0.9947644
0.9921466 0.9947644 0.98691099 0.98691099]
mean value: 0.9910919184840115
key: test_fscore
value: [0.66666667 0.66666667 0.28571429 0. 0.25 0.
0. 0. 0.28571429 0.33333333]
mean value: 0.2488095238095238
key: train_fscore
value: [0.96875 0.95238095 0.93548387 0.96875 0.90322581 0.96969697
0.95384615 0.96875 0.91803279 0.91803279]
mean value: 0.9456949327113923
key: test_precision
value: [1. 1. 0.33333333 0. 0.2 0.
0. 0. 0.33333333 0.5 ]
mean value: 0.33666666666666667
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.5 0.5 0.25 0. 0.33333333 0.
0. 0. 0.25 0.25 ]
mean value: 0.20833333333333334
key: train_recall
value: [0.93939394 0.90909091 0.87878788 0.93939394 0.82352941 0.94117647
0.91176471 0.93939394 0.84848485 0.84848485]
mean value: 0.8979500891265597
key: test_roc_auc
value: [0.75 0.75 0.59935897 0.48717949 0.61538462 0.5
0.48717949 0.48684211 0.59868421 0.61184211]
mean value: 0.5886470985155196
key: train_roc_auc
value: [0.96969697 0.95454545 0.93939394 0.96969697 0.91176471 0.97058824
0.95588235 0.96969697 0.92424242 0.92424242]
mean value: 0.9489750445632799
key: test_jcc
value: [0.5 0.5 0.16666667 0. 0.14285714 0.
0. 0. 0.16666667 0.2 ]
mean value: 0.1676190476190476
key: train_jcc
value: [0.93939394 0.90909091 0.87878788 0.93939394 0.82352941 0.94117647
0.91176471 0.93939394 0.84848485 0.84848485]
mean value: 0.8979500891265597
MCC on Blind test: -0.03
Accuracy on Blind test: 0.91
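Note: the ConvergenceWarning messages summarised earlier come from MLPClassifier stopping at max_iter=500 before its loss converged on some folds. A minimal sketch of the usual mitigation, a larger iteration budget (2000 is an arbitrary choice for illustration, not a tuned value):

from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=424, random_state=42)
clf = MLPClassifier(max_iter=2000, random_state=42).fit(X, y)
print(clf.n_iter_)   # iterations actually used; no warning is raised if this stays below max_iter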
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.03453279 0.02307677 0.02107668 0.01948524 0.02900743 0.02436209
0.0176425 0.0181644 0.01899076 0.01896524]
mean value: 0.022530388832092286
key: score_time
value: [0.01600623 0.01071501 0.01038384 0.00954151 0.01352453 0.01127768
0.00921106 0.0090661 0.00886154 0.008816 ]
mean value: 0.010740351676940919
key: test_mcc
value: [ 0.17307692 0.54085245 0.33316004 0.22657332 0.3721042 -0.04331481
0.10090092 0.30832741 0.72368421 0.44736842]
mean value: 0.31827330806441606
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.86046512 0.93023256 0.86046512 0.88372093 0.92857143 0.9047619
0.78571429 0.9047619 0.95238095 0.9047619 ]
mean value: 0.8915836101882614
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.25 0.57142857 0.4 0.28571429 0.4 0.
0.18181818 0.33333333 0.75 0.5 ]
mean value: 0.36722943722943724
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.25 0.66666667 0.33333333 0.33333333 0.5 0.
0.125 0.5 0.75 0.5 ]
mean value: 0.3958333333333333
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.25 0.5 0.5 0.25 0.33333333 0.
0.33333333 0.25 0.75 0.5 ]
mean value: 0.36666666666666664
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.58653846 0.73717949 0.69871795 0.59935897 0.65384615 0.48717949
0.57692308 0.61184211 0.86184211 0.72368421]
mean value: 0.6537112010796221
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.14285714 0.4 0.25 0.16666667 0.25 0.
0.1 0.2 0.6 0.33333333]
mean value: 0.24428571428571427
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.46
Accuracy on Blind test: 0.93
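Note: the Decision Tree (and the tree ensembles that follow) hit exactly 1.0 on every train_* metric while test MCC averages ~0.32, i.e. the trees memorise the training folds. A minimal sketch of the usual regularisation knobs; the specific values are illustrative assumptions, not tuned settings:

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=424, weights=[0.91], random_state=42)
mcc = make_scorer(matthews_corrcoef)

unconstrained = DecisionTreeClassifier(random_state=42)
regularised = DecisionTreeClassifier(max_depth=4, min_samples_leaf=5, random_state=42)

for name, clf in [('unconstrained', unconstrained), ('regularised', regularised)]:
    print(name, cross_val_score(clf, X, y, cv=10, scoring=mcc).mean())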
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
[UndefinedMetricWarning (x4): Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. (sklearn/metrics/_classification.py:1327)]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.1052103 0.1214056 0.13734531 0.12763286 0.12560916 0.12670326
0.12970734 0.1464169 0.1126895 0.12014818]
mean value: 0.12528684139251708
key: score_time
value: [0.02082181 0.01877928 0.01861334 0.01909399 0.01777148 0.02133393
0.02462673 0.02779722 0.02096295 0.02004457]
mean value: 0.020984530448913574
key: test_mcc
value: [-0.04941662 0. -0.04941662 0. -0.04331481 0.
-0.04331481 -0.05066946 0.53990552 0.30832741]
mean value: 0.06121006133156359
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.88372093 0.90697674 0.88372093 0.90697674 0.9047619 0.92857143
0.9047619 0.88095238 0.92857143 0.9047619 ]
mean value: 0.9033776301218162
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0. 0. 0. 0. 0. 0.
0. 0. 0.57142857 0.33333333]
mean value: 0.09047619047619049
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0. 0. 0. 0. 0. 0.
0. 0. 0.66666667 0.5 ]
mean value: 0.11666666666666667
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0. 0. 0. 0. 0. 0. 0. 0. 0.5 0.25]
mean value: 0.075
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.48717949 0.5 0.48717949 0.5 0.48717949 0.5
0.48717949 0.48684211 0.73684211 0.61184211]
mean value: 0.5284244264507423
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0. 0. 0. 0. 0. 0. 0. 0. 0.4 0.2]
mean value: 0.060000000000000005
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.0
Accuracy on Blind test: 0.92
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01258564 0.01007271 0.00997448 0.00981355 0.00982547 0.01669765
0.01397634 0.01023459 0.01069164 0.01094604]
mean value: 0.011481809616088866
key: score_time
value: [0.00925803 0.00899315 0.0088253 0.00868106 0.00893188 0.01443577
0.01076531 0.00975442 0.00923181 0.01473856]
mean value: 0.010361528396606446
key: test_mcc
value: [ 0.05263218 0.13359455 0.17307692 -0.1025641 0.28205128 -0.06201737
-0.10195592 0.22496064 -0.10526316 0.17105263]
mean value: 0.06655676503559743
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.76744186 0.8372093 0.86046512 0.81395349 0.9047619 0.88095238
0.80952381 0.88095238 0.80952381 0.85714286]
mean value: 0.8421926910299004
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.16666667 0.22222222 0.25 0. 0.33333333 0.
0. 0.28571429 0. 0.25 ]
mean value: 0.15079365079365079
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.125 0.2 0.25 0. 0.33333333 0.
0. 0.33333333 0. 0.25 ]
mean value: 0.14916666666666667
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.25 0.25 0.25 0. 0.33333333 0.
0. 0.25 0. 0.25 ]
mean value: 0.15833333333333333
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.53525641 0.57371795 0.58653846 0.44871795 0.64102564 0.47435897
0.43589744 0.59868421 0.44736842 0.58552632]
mean value: 0.5327091767881241
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.09090909 0.125 0.14285714 0. 0.2 0.
0. 0.16666667 0. 0.14285714]
mean value: 0.08682900432900433
key: train_jcc
[UndefinedMetricWarning (x8): Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. (sklearn/metrics/_classification.py:1327)]
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: -0.0
Accuracy on Blind test: 0.82
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.55919266 1.72587776 1.53629327 1.52326417 1.58059716 1.51584148
1.66157722 1.55928731 1.65326643 1.54791665]
mean value: 1.586311411857605
key: score_time
value: [0.09133768 0.0917778 0.09419084 0.09663606 0.09888506 0.09498286
0.09827924 0.09764957 0.10028219 0.09653711]
mean value: 0.09605584144592286
key: test_mcc
value: [-0.04941662 0. 0. 0. -0.06201737 0.
0. 0. 0.53990552 0. ]
mean value: 0.04284715363936728
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.88372093 0.90697674 0.90697674 0.90697674 0.88095238 0.92857143
0.92857143 0.9047619 0.92857143 0.9047619 ]
mean value: 0.9080841638981174
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0. 0. 0. 0. 0. 0.
0. 0. 0.57142857 0. ]
mean value: 0.05714285714285715
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0. 0. 0. 0. 0. 0.
0. 0. 0.66666667 0. ]
mean value: 0.06666666666666667
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0. 0. 0. 0. 0. 0. 0. 0. 0.5 0. ]
mean value: 0.05
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.48717949 0.5 0.5 0.5 0.47435897 0.5
0.5 0.5 0.73684211 0.5 ]
mean value: 0.5198380566801619
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0. 0. 0. 0. 0. 0. 0. 0. 0.4 0. ]
mean value: 0.04
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.0
Accuracy on Blind test: 0.92
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Running model pipeline:
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000...05', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
key: fit_time
value: [1.8430841 1.00914884 1.01401234 1.09800839 0.98056817 0.90954971
0.9472239 0.9193871 0.9473846 0.93818474]
mean value: 1.0606551885604858
key: score_time
value: [0.14443254 0.15618467 0.14113212 0.20618296 0.17023635 0.23195982
0.24180889 0.11920571 0.19670653 0.20713711]
mean value: 0.18149867057800292
key: test_mcc
value: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
mean value: 0.0
key: train_mcc
value: [0.23590006 0. 0.23590006 0.23590006 0.23209905 0.23209905
0.16390329 0. 0. 0.166607 ]
mean value: 0.1502408565624055
key: test_accuracy
value: [0.90697674 0.90697674 0.90697674 0.90697674 0.92857143 0.92857143
0.92857143 0.9047619 0.9047619 0.9047619 ]
mean value: 0.9127906976744186
key: train_accuracy
value: [0.91863517 0.91338583 0.91863517 0.91863517 0.91623037 0.91623037
0.91361257 0.91361257 0.91361257 0.91623037]
mean value: 0.9158820134394196
key: test_fscore
value: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
mean value: 0.0
key: train_fscore
value: [0.11428571 0. 0.11428571 0.11428571 0.11111111 0.11111111
0.05714286 0. 0. 0.05882353]
mean value: 0.0681045751633987
key: test_precision
value: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
mean value: 0.0
key: train_precision
value: [1. 0. 1. 1. 1. 1. 1. 0. 0. 1.]
mean value: 0.7
key: test_recall
value: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
mean value: 0.0
key: train_recall
value: [0.06060606 0. 0.06060606 0.06060606 0.05882353 0.05882353
0.02941176 0. 0. 0.03030303]
mean value: 0.03591800356506239
key: test_roc_auc
value: [0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
mean value: 0.5
key: train_roc_auc
value: [0.53030303 0.5 0.53030303 0.53030303 0.52941176 0.52941176
0.51470588 0.5 0.5 0.51515152]
mean value: 0.5179590017825312
key: test_jcc
value: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
mean value: 0.0
key: train_jcc
value: [0.06060606 0. 0.06060606 0.06060606 0.05882353 0.05882353
0.02941176 0. 0. 0.03030303]
mean value: 0.03591800356506239
MCC on Blind test: 0.0
Accuracy on Blind test: 0.92
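The FutureWarning emitted around this model's pipeline comes from max_features='auto', which scikit-learn 1.1 deprecates. The short sketch below shows the replacement the warning itself suggests, with the remaining hyperparameters copied from the logged 'Random Forest2' definition; whether to adopt it is left to the pipeline author.

# Only the deprecated max_features='auto' is replaced; for classifiers 'auto' == 'sqrt'.
from sklearn.ensemble import RandomForestClassifier

rf2 = RandomForestClassifier(max_features='sqrt',
                             min_samples_leaf=5,
                             n_estimators=1000,
                             n_jobs=10,
                             oob_score=True,
                             random_state=42)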
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02565026 0.01137781 0.01134181 0.01137114 0.01124692 0.01111341
0.01204944 0.01166487 0.01144695 0.01121879]
mean value: 0.012848138809204102
key: score_time
value: [0.01050162 0.00984645 0.01008201 0.009938 0.00940609 0.00991988
0.01095891 0.01004028 0.00953197 0.00999093]
mean value: 0.010021615028381347
key: test_mcc
value: [-0.07073279 -0.07073279 -0.07073279 -0.1025641 -0.06201737 0.3721042
0.64102564 -0.07254763 -0.11926756 -0.08998425]
mean value: 0.035455057825129276
key: train_mcc
value: [0.15703158 0.13007539 0.11785082 0.15703158 0.12003681 0.09162152
0.02642975 0.10109391 0.13024398 0.16396815]
mean value: 0.11953834882652258
key: test_accuracy
value: [0.86046512 0.86046512 0.86046512 0.81395349 0.88095238 0.92857143
0.95238095 0.85714286 0.78571429 0.83333333]
mean value: 0.863344407530454
key: train_accuracy
value: [0.88451444 0.88451444 0.87926509 0.88451444 0.87958115 0.87958115
0.87958115 0.88481675 0.88481675 0.88743455]
mean value: 0.8828619917274738
key: test_fscore
value: [0. 0. 0. 0. 0. 0.4
0.66666667 0. 0. 0. ]
mean value: 0.10666666666666666
key: train_fscore
value: [0.21428571 0.18518519 0.17857143 0.21428571 0.17857143 0.14814815
0.08 0.15384615 0.18518519 0.21818182]
mean value: 0.17562607762607763
key: test_precision
value: [0. 0. 0. 0. 0. 0.5
0.66666667 0. 0. 0. ]
mean value: 0.11666666666666667
key: train_precision
value: [0.26086957 0.23809524 0.2173913 0.26086957 0.22727273 0.2
0.125 0.21052632 0.23809524 0.27272727]
mean value: 0.22508472267625584
key: test_recall
value: [0. 0. 0. 0. 0. 0.33333333
0.66666667 0. 0. 0. ]
mean value: 0.09999999999999999
key: train_recall
value: [0.18181818 0.15151515 0.15151515 0.18181818 0.14705882 0.11764706
0.05882353 0.12121212 0.15151515 0.18181818]
mean value: 0.1444741532976827
key: test_roc_auc
value: [0.47435897 0.47435897 0.47435897 0.44871795 0.47435897 0.65384615
0.82051282 0.47368421 0.43421053 0.46052632]
mean value: 0.5188933873144399
key: train_roc_auc
value: [0.5664838 0.55276907 0.54989551 0.5664838 0.54910412 0.53583502
0.50929682 0.53911609 0.55283494 0.56798645]
mean value: 0.5489805637877934
key: test_jcc
value: [0. 0. 0. 0. 0. 0.25 0.5 0. 0. 0. ]
mean value: 0.075
key: train_jcc
value: [0.12 0.10204082 0.09803922 0.12 0.09803922 0.08
0.04166667 0.08333333 0.10204082 0.12244898]
mean value: 0.0967609043617447
MCC on Blind test: 0.02
Accuracy on Blind test: 0.84
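Several blocks above pair a test accuracy near 0.9 with a test MCC near zero. With a roughly 10:1 class ratio, always predicting the majority class already reaches that accuracy while MCC and F1 collapse to zero, which is why MCC is the more informative headline number here. A toy illustration with made-up labels:

# Toy numbers only: roughly one CV test fold of 43 samples with 4 minority-class samples.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef

y_true = np.array([0] * 39 + [1] * 4)
y_pred = np.zeros_like(y_true)            # a classifier that never predicts the minority class

print(accuracy_score(y_true, y_pred))     # ~0.907
print(f1_score(y_true, y_pred))           # 0.0
print(matthews_corrcoef(y_true, y_pred))  # 0.0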
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.12375498 0.07309937 0.06955266 0.06853867 0.06676149 0.067137
0.07728696 0.07254744 0.2646172 0.06412482]
mean value: 0.0947420597076416
key: score_time
value: [0.01223707 0.01213765 0.01118922 0.01085472 0.01080298 0.01105309
0.01106906 0.01112413 0.01132703 0.01226878]
mean value: 0.011406373977661134
key: test_mcc
value: [-0.07073279 0.68964466 0.30945594 -0.07073279 0.28205128 0.56309251
-0.08998425 0.44736842 0.56292625 0.30832741]
mean value: 0.2931416638415426
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.86046512 0.95348837 0.90697674 0.86046512 0.9047619 0.95238095
0.83333333 0.9047619 0.9047619 0.9047619 ]
mean value: 0.8986157253599114
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0. 0.66666667 0.33333333 0. 0.33333333 0.5
0. 0.5 0.6 0.33333333]
mean value: 0.32666666666666666
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0. 1. 0.5 0. 0.33333333 1.
0. 0.5 0.5 0.5 ]
mean value: 0.43333333333333335
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0. 0.5 0.25 0. 0.33333333 0.33333333
0. 0.5 0.75 0.25 ]
mean value: 0.2916666666666667
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.47435897 0.75 0.61217949 0.47435897 0.64102564 0.66666667
0.44871795 0.72368421 0.83552632 0.61184211]
mean value: 0.6238360323886639
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0. 0.5 0.2 0. 0.2 0.33333333
0. 0.33333333 0.42857143 0.2 ]
mean value: 0.2195238095238095
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.57
Accuracy on Blind test: 0.94
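The 'MCC on Blind test' and 'Accuracy on Blind test' lines reported after each model are scores on the held-out 20% split rather than on cross-validation folds. A self-contained sketch of how such numbers could be produced is below; the synthetic data and the plain RandomForestClassifier are stand-ins, not the project's actual split or model.

# Synthetic data and a generic classifier as stand-ins for the project's split and model.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42, stratify=y)

clf = RandomForestClassifier(n_estimators=1000, random_state=42).fit(X_train, y_train)
y_pred = clf.predict(X_test)
print('MCC on Blind test:', round(matthews_corrcoef(y_test, y_pred), 2))
print('Accuracy on Blind test:', round(accuracy_score(y_test, y_pred), 2))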
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
List of models:
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.04794931 0.09288263 0.06239581 0.03627229 0.07611275 0.08134341
0.07803798 0.03760433 0.03793168 0.05047679]
mean value: 0.06010069847106934
key: score_time
value: [0.02165794 0.02084374 0.01258087 0.01275325 0.02337122 0.02328706
0.01270866 0.01264191 0.01441884 0.01300359]
mean value: 0.01672670841217041
key: test_mcc
value: [ 0.22657332 0.48181206 0.30945594 -0.07073279 0.18352066 0.
-0.07692308 0.22496064 0.225913 0.33113309]
mean value: 0.18357128371597806
key: train_mcc
value: [0.65926093 0.64227412 0.61325056 0.74149791 0.67714672 0.66871308
0.67471413 0.72759047 0.65686546 0.7000858 ]
mean value: 0.6761399178233465
key: test_accuracy
value: [0.88372093 0.93023256 0.90697674 0.86046512 0.85714286 0.92857143
0.85714286 0.88095238 0.78571429 0.85714286]
mean value: 0.8748062015503876
key: train_accuracy
value: [0.94750656 0.94750656 0.93963255 0.96062992 0.94764398 0.94764398
0.94502618 0.95811518 0.95026178 0.95549738]
mean value: 0.9499464072226573
key: test_fscore
value: [0.28571429 0.4 0.33333333 0. 0.25 0.
0. 0.28571429 0.30769231 0.4 ]
mean value: 0.22624542124542124
key: train_fscore
value: [0.6875 0.66666667 0.64615385 0.76190476 0.70588235 0.6969697
0.70422535 0.75 0.6779661 0.72131148]
mean value: 0.7018580253853576
key: test_precision
value: [0.33333333 1. 0.5 0. 0.2 0.
0. 0.33333333 0.22222222 0.33333333]
mean value: 0.2922222222222222
key: train_precision
value: [0.70967742 0.74074074 0.65625 0.8 0.70588235 0.71875
0.67567568 0.77419355 0.76923077 0.78571429]
mean value: 0.7336114792044583
key: test_recall
value: [0.25 0.25 0.25 0. 0.33333333 0.
0. 0.25 0.5 0.5 ]
mean value: 0.23333333333333334
key: train_recall
value: [0.66666667 0.60606061 0.63636364 0.72727273 0.70588235 0.67647059
0.73529412 0.72727273 0.60606061 0.66666667]
mean value: 0.6754010695187166
key: test_roc_auc
value: [0.59935897 0.625 0.61217949 0.47435897 0.61538462 0.5
0.46153846 0.59868421 0.65789474 0.69736842]
mean value: 0.5841767881241565
key: train_roc_auc
value: [0.8204023 0.79297283 0.80237722 0.85501567 0.83857336 0.82530426
0.85040568 0.85360771 0.79443431 0.82473734]
mean value: 0.8257830694232584
key: test_jcc
value: [0.16666667 0.25 0.2 0. 0.14285714 0.
0. 0.16666667 0.18181818 0.25 ]
mean value: 0.1358008658008658
key: train_jcc
value: [0.52380952 0.5 0.47727273 0.61538462 0.54545455 0.53488372
0.54347826 0.6 0.51282051 0.56410256]
mean value: 0.5417206470644287
MCC on Blind test: -0.04
Accuracy on Blind test: 0.9
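Each 'List of models' dump above is a list of (name, estimator) tuples that the runner swaps, one at a time, into the same preprocessing pipeline. A minimal sketch of that pattern follows, with a toy scaler and random data standing in for the real ColumnTransformer and feature matrix; it is an illustration, not the project's runner.

# Toy scaler and random data stand in for the real ColumnTransformer and feature matrix.
import numpy as np
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

X = np.random.default_rng(42).normal(size=(100, 5))
y = np.random.default_rng(0).integers(0, 2, 100)

models = [('Logistic Regression', LogisticRegression(random_state=42)),
          ('Naive Bayes', BernoulliNB()),
          ('Ridge Classifier', RidgeClassifier(random_state=42))]

for name, clf in models:
    pipe = Pipeline(steps=[('prep', MinMaxScaler()), ('model', clf)])
    pipe.fit(X, y)                        # the real runner cross-validates and blind-tests here
    print(name, pipe.score(X, y))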
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
Running model pipeline:
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01510549 0.01138926 0.01075006 0.00951076 0.00951099 0.0097208
0.00946641 0.00978231 0.00945735 0.00954652]
mean value: 0.010423994064331055
key: score_time
value: [0.0113039 0.01032138 0.01299763 0.00895143 0.00893474 0.00877857
0.00886536 0.00882006 0.00871015 0.00865507]
mean value: 0.009633827209472656
key: test_mcc
value: [ 0.30945594 0. 0. 0.48181206 0. -0.04331481
0. -0.07254763 -0.05066946 0.48135986]
mean value: 0.11060959626479741
key: train_mcc
value: [0.17757685 0.21048121 0.21048121 0.15138864 0.22104405 0.23664596
0.25626504 0.21172138 0.29974907 0.19797607]
mean value: 0.21733294901551917
key: test_accuracy
value: [0.90697674 0.90697674 0.90697674 0.93023256 0.92857143 0.9047619
0.92857143 0.85714286 0.88095238 0.92857143]
mean value: 0.9079734219269103
key: train_accuracy
value: [0.90026247 0.9160105 0.9160105 0.91338583 0.90837696 0.91099476
0.91623037 0.90837696 0.91884817 0.91099476]
mean value: 0.9119491280867379
key: test_fscore
value: [0.33333333 0. 0. 0.4 0. 0.
0. 0. 0. 0.4 ]
mean value: 0.11333333333333334
key: train_fscore
value: [0.20833333 0.15789474 0.15789474 0.10810811 0.22222222 0.22727273
0.2 0.22222222 0.27906977 0.19047619]
mean value: 0.19734940447608745
key: test_precision
value: [0.5 0. 0. 1. 0. 0. 0. 0. 0. 1. ]
mean value: 0.25
key: train_precision
value: [0.33333333 0.6 0.6 0.5 0.45454545 0.5
0.66666667 0.41666667 0.6 0.44444444]
mean value: 0.5115656565656566
key: test_recall
value: [0.25 0. 0. 0.25 0. 0. 0. 0. 0. 0.25]
mean value: 0.075
key: train_recall
value: [0.15151515 0.09090909 0.09090909 0.06060606 0.14705882 0.14705882
0.11764706 0.15151515 0.18181818 0.12121212]
mean value: 0.12602495543672015
key: test_roc_auc
value: [0.61217949 0.5 0.5 0.625 0.5 0.48717949
0.5 0.47368421 0.48684211 0.625 ]
mean value: 0.5309885290148448
key: train_roc_auc
value: [0.56138976 0.54258098 0.54258098 0.52742947 0.56490872 0.5663455
0.55594997 0.56572892 0.58517843 0.55344274]
mean value: 0.5565535474413782
key: test_jcc
value: [0.2 0. 0. 0.25 0. 0. 0. 0. 0. 0.25]
mean value: 0.07
key: train_jcc
value: [0.11627907 0.08571429 0.08571429 0.05714286 0.125 0.12820513
0.11111111 0.125 0.16216216 0.10526316]
mean value: 0.11015920577120088
MCC on Blind test: 0.0
Accuracy on Blind test: 0.92
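The UndefinedMetricWarning that recurs through these runs fires whenever a fold contains no predicted positives, so precision becomes 0/0 and is reported as 0.0. The warning's own suggestion is the zero_division parameter; a minimal illustration with toy labels (not project data):

import numpy as np
from sklearn.metrics import precision_score

y_true = np.array([0, 0, 0, 1])
y_pred = np.array([0, 0, 0, 0])                           # no positive predictions at all

print(precision_score(y_true, y_pred))                    # warns, returns 0.0
print(precision_score(y_true, y_pred, zero_division=0))   # same value, warning silenced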
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01084042 0.01553941 0.01838779 0.01581478 0.0162158 0.01483154
0.01682901 0.01865387 0.01745009 0.03337455]
mean value: 0.017793726921081544
key: score_time
value: [0.0087297 0.01125956 0.01132178 0.01195478 0.01180315 0.01202106
0.01192927 0.01196718 0.01190042 0.02841592]
mean value: 0.013130283355712891
key: test_mcc
value: [ 0.48181206 0. -0.04941662 0. -0.04331481 0.
-0.04331481 0. 0.225913 0. ]
mean value: 0.05716788162440957
key: train_mcc
value: [0.48545474 0.37447578 0.63337838 0.59698615 0.49591003 0.47074797
0.55088323 0.166607 0.44537558 0.2359277 ]
mean value: 0.445574656840995
key: test_accuracy
value: [0.93023256 0.90697674 0.88372093 0.90697674 0.9047619 0.92857143
0.9047619 0.9047619 0.78571429 0.9047619 ]
mean value: 0.896124031007752
key: train_accuracy
value: [0.9343832 0.92650919 0.95013123 0.94488189 0.93455497 0.93193717
0.93979058 0.91623037 0.79057592 0.91884817]
mean value: 0.9187842684585893
key: test_fscore
value: [0.4 0. 0. 0. 0. 0.
0. 0. 0.30769231 0. ]
mean value: 0.07076923076923076
key: train_fscore
value: [0.46808511 0.26315789 0.6122449 0.60377358 0.44444444 0.43478261
0.48888889 0.05882353 0.43661972 0.11428571]
mean value: 0.39251063880209885
key: test_precision
value: [1. 0. 0. 0. 0. 0.
 0. 0. 0.22222222 0. ]
mean value: 0.12222222222222222
key: train_precision
value: [0.78571429 1. 0.9375 0.8 0.90909091 0.83333333
1. 1. 0.28440367 1. ]
mean value: 0.8550042197863299
key: test_recall
value: [0.25 0. 0. 0. 0. 0. 0. 0. 0.5 0. ]
mean value: 0.075
key: train_recall
value: [0.33333333 0.15151515 0.45454545 0.48484848 0.29411765 0.29411765
0.32352941 0.03030303 0.93939394 0.06060606]
mean value: 0.3366310160427808
key: test_roc_auc
value: [0.625 0.5 0.48717949 0.5 0.48717949 0.5
0.48717949 0.5 0.65789474 0.5 ]
mean value: 0.5244433198380567
key: train_roc_auc
value: [0.66235632 0.57575758 0.72583595 0.73667712 0.64562204 0.64418526
0.66176471 0.51515152 0.85794912 0.53030303]
mean value: 0.6555602631509889
key: test_jcc
value: [0.25 0. 0. 0. 0. 0.
0. 0. 0.18181818 0. ]
mean value: 0.04318181818181818
key: train_jcc
value: [0.30555556 0.15151515 0.44117647 0.43243243 0.28571429 0.27777778
0.32352941 0.03030303 0.27927928 0.06060606]
mean value: 0.25878894555365145
MCC on Blind test: -0.05
Accuracy on Blind test: 0.89
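A test_roc_auc stuck at 0.5 across folds, as in several blocks above, is the chance level: the model's scores do not rank positives above negatives at all. A toy check with a constant score:

import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 0, 0, 1])
scores = np.full(5, 0.1)                  # identical score for every sample
print(roc_auc_score(y_true, scores))      # 0.5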
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01827741 0.01960063 0.01943398 0.01684809 0.01676345 0.01916051
0.01696277 0.0201962 0.01844573 0.01990056]
mean value: 0.01855893135070801
key: score_time
value: [0.00966454 0.01185894 0.01233459 0.01199102 0.01228094 0.01182985
0.01288772 0.01254129 0.01219296 0.01226282]
mean value: 0.011984467506408691
key: test_mcc
value: [ 0.54085245 0.68964466 0. 0.17307692 -0.06201737 -0.04331481
0. 0.13119432 -0.07254763 0.42358687]
mean value: 0.17804754150847987
key: train_mcc
value: [0.69379826 0.59230718 0.33581158 0.6312403 0.54978914 0.70872731
0.16390329 0.70148476 0.74347157 0.61900242]
mean value: 0.5739535820824407
key: test_accuracy
value: [0.93023256 0.95348837 0.90697674 0.86046512 0.88095238 0.9047619
0.92857143 0.83333333 0.85714286 0.83333333]
mean value: 0.8889258028792912
key: train_accuracy
value: [0.95538058 0.93175853 0.92388451 0.92125984 0.93979058 0.95549738
0.91361257 0.95287958 0.96335079 0.92146597]
mean value: 0.9378880323205673
key: test_fscore
value: [0.57142857 0.66666667 0. 0.25 0. 0.
0. 0.22222222 0. 0.46153846]
mean value: 0.2171855921855922
key: train_fscore
value: [0.71186441 0.62857143 0.25641026 0.65116279 0.5106383 0.73015873
0.05714286 0.72727273 0.74074074 0.64285714]
mean value: 0.5656819378503559
key: test_precision
value: [0.66666667 1. 0. 0.25 0. 0.
0. 0.2 0. 0.33333333]
mean value: 0.245
key: train_precision
value: [0.80769231 0.59459459 0.83333333 0.52830189 0.92307692 0.79310345
1. 0.72727273 0.95238095 0.52941176]
mean value: 0.7689167938125036
key: test_recall
value: [0.5 0.5 0. 0.25 0. 0. 0. 0.25 0. 0.75]
mean value: 0.225
key: train_recall
value: [0.63636364 0.66666667 0.15151515 0.84848485 0.35294118 0.67647059
0.02941176 0.72727273 0.60606061 0.81818182]
mean value: 0.551336898395722
key: test_roc_auc
value: [0.73717949 0.75 0.5 0.58653846 0.47435897 0.48717949
0.5 0.57236842 0.47368421 0.79605263]
mean value: 0.5877361673414305
key: train_roc_auc
value: [0.81099791 0.81178161 0.57432079 0.88832288 0.67503381 0.8296146
0.51470588 0.85074238 0.80159764 0.87470695]
mean value: 0.7631824464972056
key: test_jcc
value: [0.4 0.5 0. 0.14285714 0. 0.
0. 0.125 0. 0.3 ]
mean value: 0.1467857142857143
key: train_jcc
value: [0.55263158 0.45833333 0.14705882 0.48275862 0.34285714 0.575
0.02941176 0.57142857 0.58823529 0.47368421]
mean value: 0.42213993401353284
MCC on Blind test: 0.12
Accuracy on Blind test: 0.9
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.16836762 0.14756989 0.15327668 0.1564548 0.16191888 0.15412354
0.15349913 0.16245484 0.15289831 0.15437984]
mean value: 0.1564943552017212
key: score_time
value: [0.01530409 0.01531816 0.01582527 0.01699758 0.01619935 0.01636124
0.01568985 0.01542091 0.01689005 0.01532292]
mean value: 0.01593294143676758
key: test_mcc
value: [ 0.22657332 0.48181206 0.22657332 0.22657332 0.28205128 -0.06201737
-0.06201737 0.53990552 0.38165619 0.53990552]
mean value: 0.2781015812515728
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.88372093 0.93023256 0.88372093 0.88372093 0.9047619 0.88095238
0.88095238 0.92857143 0.88095238 0.92857143]
mean value: 0.8986157253599114
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.28571429 0.4 0.28571429 0.28571429 0.33333333 0.
0. 0.57142857 0.44444444 0.57142857]
mean value: 0.3177777777777778
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.33333333 1. 0.33333333 0.33333333 0.33333333 0.
0. 0.66666667 0.4 0.66666667]
mean value: 0.4066666666666667
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.25 0.25 0.25 0.25 0.33333333 0.
0. 0.5 0.5 0.5 ]
mean value: 0.2833333333333333
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.59935897 0.625 0.59935897 0.59935897 0.64102564 0.47435897
0.47435897 0.73684211 0.71052632 0.73684211]
mean value: 0.6197031039136303
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.16666667 0.25 0.16666667 0.16666667 0.2 0.
0. 0.4 0.28571429 0.4 ]
mean value: 0.20357142857142857
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.56
Accuracy on Blind test: 0.94
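The jcc rows are the Jaccard index, TP / (TP + FP + FN) for the positive class, which explains pairs such as fscore 0.57142857 alongside jcc 0.4 in the block above: for a binary fold, jcc = F1 / (2 - F1). A toy check with made-up labels:

import numpy as np
from sklearn.metrics import f1_score, jaccard_score

y_true = np.array([1, 1, 0, 0, 0, 0, 0])
y_pred = np.array([1, 0, 1, 0, 0, 0, 0])  # TP=1, FP=1, FN=1

f1 = f1_score(y_true, y_pred)             # 0.5
print(jaccard_score(y_true, y_pred))      # 0.333...
print(f1 / (2 - f1))                      # same value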
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
  oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
Running model pipeline:
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.05110979 0.06111956 0.06785369 0.05261636 0.05201411 0.05518365
0.06564116 0.05653381 0.04967999 0.06266403]
mean value: 0.05744161605834961
key: score_time
value: [0.01756144 0.02771926 0.02968049 0.0247016 0.01941895 0.03197765
0.02192426 0.02888417 0.02351975 0.02344275]
mean value: 0.024883031845092773
key: test_mcc
value: [ 0.22657332 0.68964466 0.22657332 -0.07073279 0.18352066 0.56309251
0.15097027 0.30832741 0.44736842 0.22496064]
mean value: 0.29502984201246096
key: train_mcc
value: [0.9320953 0.94937922 0.94937922 1. 0.93418296 0.93418296
0.96771467 0.9664581 0.94939084 0.93211048]
mean value: 0.951489375452085
key: test_accuracy
value: [0.88372093 0.95348837 0.88372093 0.86046512 0.85714286 0.95238095
0.83333333 0.9047619 0.9047619 0.88095238]
mean value: 0.8914728682170543
key: train_accuracy
value: [0.98950131 0.99212598 0.99212598 1. 0.9895288 0.9895288
0.9947644 0.9947644 0.9921466 0.9895288 ]
mean value: 0.9924015060944607
key: test_fscore
value: [0.28571429 0.66666667 0.28571429 0. 0.25 0.5
0.22222222 0.33333333 0.5 0.28571429]
mean value: 0.3329365079365079
key: train_fscore
value: [0.93548387 0.95238095 0.95238095 1. 0.93939394 0.93939394
0.97058824 0.96875 0.95238095 0.93548387]
mean value: 0.9546236713160338
key: test_precision
value: [0.33333333 1. 0.33333333 0. 0.2 1.
0.16666667 0.5 0.5 0.33333333]
mean value: 0.43666666666666665
key: train_precision
value: [1. 1. 1. 1. 0.96875 0.96875
0.97058824 1. 1. 1. ]
mean value: 0.9908088235294118
key: test_recall
value: [0.25 0.5 0.25 0. 0.33333333 0.33333333
0.33333333 0.25 0.5 0.25 ]
mean value: 0.3
key: train_recall
value: [0.87878788 0.90909091 0.90909091 1. 0.91176471 0.91176471
0.97058824 0.93939394 0.90909091 0.87878788]
mean value: 0.9218360071301248
key: test_roc_auc
value: [0.59935897 0.75 0.59935897 0.47435897 0.61538462 0.66666667
0.6025641 0.61184211 0.72368421 0.59868421]
mean value: 0.6241902834008097
key: train_roc_auc
value: [0.93939394 0.95454545 0.95454545 1. 0.95444557 0.95444557
0.98385734 0.96969697 0.95454545 0.93939394]
mean value: 0.9604869690823038
key: test_jcc
value: [0.16666667 0.5 0.16666667 0. 0.14285714 0.33333333
0.125 0.2 0.33333333 0.16666667]
mean value: 0.21345238095238095
key: train_jcc
value: [0.87878788 0.90909091 0.90909091 1. 0.88571429 0.88571429
0.94285714 0.93939394 0.90909091 0.87878788]
mean value: 0.9138528138528138
MCC on Blind test: 0.57
Accuracy on Blind test: 0.94
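Editor's note: each key/value block above (fit_time, score_time, test_*/train_* for mcc, accuracy, fscore, precision, recall, roc_auc, jcc) matches the dictionary layout returned by sklearn's cross_validate when given a dict of scorers and return_train_score=True. A sketch of how such a block could be produced; the scorer names and the pipe/X_train/y_train variables are placeholders, not taken from the script:

from sklearn.model_selection import cross_validate
from sklearn.metrics import (make_scorer, matthews_corrcoef, accuracy_score,
                             f1_score, precision_score, recall_score,
                             roc_auc_score, jaccard_score)

scoring = {'mcc': make_scorer(matthews_corrcoef),
           'accuracy': make_scorer(accuracy_score),
           'fscore': make_scorer(f1_score),
           'precision': make_scorer(precision_score, zero_division=0),
           'recall': make_scorer(recall_score),
           'roc_auc': make_scorer(roc_auc_score),
           'jcc': make_scorer(jaccard_score)}

# scores = cross_validate(pipe, X_train, y_train, cv=10, scoring=scoring,
#                         return_train_score=True)
# for k, v in scores.items():
#     print('key:', k, '\nvalue:', v, '\nmean value:', v.mean())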
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.13151288 0.1273613 0.15711355 0.12300849 0.15051484 0.16000438
0.1297493 0.0987041 0.09513092 0.1125598 ]
mean value: 0.12856595516204833
key: score_time
value: [0.02892828 0.0251348 0.03072 0.02347708 0.03113389 0.02723908
0.03679729 0.01404428 0.02238226 0.02640605]
mean value: 0.02662630081176758
key: test_mcc
value: [-0.07073279 0. 0. -0.04941662 -0.04331481 0.
0. -0.07254763 0.48135986 0. ]
mean value: 0.024534802246809664
key: train_mcc
value: [0.80389026 0.76435313 0.76435313 0.80389026 0.75198794 0.75198794
0.75198794 0.78435341 0.76439257 0.74401079]
mean value: 0.7685207375108924
key: test_accuracy
value: [0.86046512 0.90697674 0.90697674 0.88372093 0.9047619 0.92857143
0.92857143 0.85714286 0.92857143 0.9047619 ]
mean value: 0.9010520487264673
key: train_accuracy
value: [0.97112861 0.96587927 0.96587927 0.97112861 0.96335079 0.96335079
0.96335079 0.96858639 0.96596859 0.96335079]
mean value: 0.9661973863214742
key: test_fscore
value: [0. 0. 0. 0. 0. 0. 0. 0. 0.4 0. ]
mean value: 0.04
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
key: train_fscore
value: [0.8        0.75471698 0.75471698 0.8        0.74074074 0.74074074
 0.74074074 0.77777778 0.75471698 0.73076923]
mean value: 0.7594920174165457
key: test_precision
value: [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
mean value: 0.1
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0. 0. 0. 0. 0. 0. 0. 0. 0.25 0. ]
mean value: 0.025
key: train_recall
value: [0.66666667 0.60606061 0.60606061 0.66666667 0.58823529 0.58823529
0.58823529 0.63636364 0.60606061 0.57575758]
mean value: 0.6128342245989304
key: test_roc_auc
value: [0.47435897 0.5 0.5 0.48717949 0.48717949 0.5
0.5 0.47368421 0.625 0.5 ]
mean value: 0.5047402159244264
key: train_roc_auc
value: [0.83333333 0.8030303 0.8030303 0.83333333 0.79411765 0.79411765
0.79411765 0.81818182 0.8030303 0.78787879]
mean value: 0.8064171122994652
key: test_jcc
value: [0. 0. 0. 0. 0. 0. 0. 0. 0.25 0. ]
mean value: 0.025
key: train_jcc
value: [0.66666667 0.60606061 0.60606061 0.66666667 0.58823529 0.58823529
0.58823529 0.63636364 0.60606061 0.57575758]
mean value: 0.6128342245989304
MCC on Blind test: 0.0
Accuracy on Blind test: 0.92
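Editor's note: a blind-test MCC of 0.0 next to 0.92 accuracy is essentially what a classifier that (almost) always predicts the majority class produces on this imbalanced hold-out (Counter({0: 98, 1: 9})): 98/107 ≈ 0.916 correct, but no better than chance by MCC. A quick check with made-up, all-majority predictions:

import numpy as np
from sklearn.metrics import matthews_corrcoef, accuracy_score

y_blind = np.array([0] * 98 + [1] * 9)   # class counts from the 80/20 split
y_pred = np.zeros_like(y_blind)          # hypothetical all-majority predictor

print(accuracy_score(y_blind, y_pred))     # 0.9158...
print(matthews_corrcoef(y_blind, y_pred))  # 0.0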
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.57992864 0.56298947 0.56018972 0.5629847 0.56662011 0.56279898
0.56053519 0.55971837 0.56311035 0.56742144]
mean value: 0.5646296977996826
key: score_time
value: [0.00969124 0.00931811 0.00929976 0.00983834 0.00941968 0.00948024
0.00948286 0.00946546 0.01033235 0.00928783]
mean value: 0.009561586380004882
key: test_mcc
value: [-0.07073279 0.68964466 0.17307692 0. 0.22496064 0.
-0.08998425 0.63211807 0.68920244 0.30832741]
mean value: 0.2556613094316989
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.86046512 0.95348837 0.86046512 0.90697674 0.88095238 0.92857143
0.83333333 0.92857143 0.95238095 0.9047619 ]
mean value: 0.9009966777408638
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0. 0.66666667 0.25 0. 0.28571429 0.
0. 0.66666667 0.66666667 0.33333333]
mean value: 0.2869047619047619
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0. 1. 0.25 0. 0.25 0. 0. 0.6 1. 0.5 ]
mean value: 0.36
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0. 0.5 0.25 0. 0.33333333 0.
0. 0.75 0.5 0.25 ]
mean value: 0.2583333333333333
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.47435897 0.75 0.58653846 0.5 0.62820513 0.5
0.44871795 0.84868421 0.75 0.61184211]
mean value: 0.6098346828609986
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0. 0.5 0.14285714 0. 0.16666667 0.
0. 0.5 0.5 0.2 ]
mean value: 0.20095238095238094
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.57
Accuracy on Blind test: 0.94
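Editor's note: train MCC/accuracy of exactly 1.0 on every fold against a mean test MCC of ~0.26 shows the boosted ensemble fitting the training folds perfectly. A hedged sketch of one way to rein that in, using GradientBoostingClassifier's built-in early stopping (parameter values are illustrative, not taken from this analysis):

from sklearn.ensemble import GradientBoostingClassifier

gbc = GradientBoostingClassifier(
    n_estimators=500,
    learning_rate=0.05,       # smaller steps than the 0.1 default
    max_depth=2,              # shallower trees
    validation_fraction=0.1,  # hold out 10% of each training fold
    n_iter_no_change=10,      # stop when that hold-out stops improving
    random_state=42)
# gbc would replace GradientBoostingClassifier(random_state=42) in the pipeline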
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
List of models:
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.02562785 0.0249536 0.02552819 0.0282321 0.0321064 0.02481318
0.0250473 0.0426383 0.03565145 0.04259801]
mean value: 0.030719637870788574
key: score_time
value: [0.01248002 0.01663256 0.01270652 0.01256371 0.02706981 0.01266313
0.01275277 0.01269436 0.01257467 0.01300025]
mean value: 0.014513778686523437
key: test_mcc
value: [ 0.13359455 -0.0877058 0. -0.04941662 -0.04331481 -0.04331481
-0.06201737 -0.07254763 0.22496064 -0.08998425]
mean value: -0.008974610206864929
key: train_mcc
value: [0.23590006 0. 0. 0. 0.16390329 0.
0. 0. 0.166607 0.2359277 ]
mean value: 0.08023380575009306
key: test_accuracy
value: [0.8372093 0.8372093 0.90697674 0.88372093 0.9047619 0.9047619
0.88095238 0.85714286 0.88095238 0.83333333]
mean value: 0.872702104097453
key: train_accuracy
value: [0.91863517 0.91338583 0.91338583 0.91338583 0.91361257 0.91099476
0.91099476 0.91361257 0.91623037 0.91884817]
mean value: 0.9143085844635913
key: test_fscore
value: [0.22222222 0. 0. 0. 0. 0.
0. 0. 0.28571429 0. ]
mean value: 0.0507936507936508
key: train_fscore
value: [0.11428571 0. 0. 0. 0.05714286 0.
0. 0. 0.05882353 0.11428571]
mean value: 0.034453781512605045
key: test_precision
value: [0.2 0. 0. 0. 0. 0.
0. 0. 0.33333333 0. ]
mean value: 0.05333333333333333
key: train_precision
value: [1. 0. 0. 0. 1. 0. 0. 0. 1. 1.]
mean value: 0.4
key: test_recall
value: [0.25 0. 0. 0. 0. 0. 0. 0. 0.25 0. ]
mean value: 0.05
key: train_recall
value: [0.06060606 0. 0. 0. 0.02941176 0.
0. 0. 0.03030303 0.06060606]
mean value: 0.018092691622103387
key: test_roc_auc
value: [0.57371795 0.46153846 0.5 0.48717949 0.48717949 0.48717949
0.47435897 0.47368421 0.59868421 0.46052632]
mean value: 0.5004048582995951
key: train_roc_auc
value: [0.53030303 0.5 0.5 0.5 0.51470588 0.5
0.5 0.5 0.51515152 0.53030303]
mean value: 0.5090463458110517
key: test_jcc
value: [0.125 0. 0. 0. 0. 0.
0. 0. 0.16666667 0. ]
mean value: 0.029166666666666667
key: train_jcc
value: [0.06060606 0. 0. 0. 0.02941176 0.
0. 0. 0.03030303 0.06060606]
mean value: 0.018092691622103387
MCC on Blind test: 0.25
Accuracy on Blind test: 0.91
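Editor's note: the "Variables are collinear" warnings before this QDA block arise because QDA inverts a per-class covariance matrix over roughly 180 transformed columns with only ~30 minority-class rows per training fold, so the estimate is rank-deficient. A hedged sketch of the usual mitigation, shrinking the covariance estimate via reg_param (value illustrative, not from this script):

from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

# reg_param blends each class covariance with the identity matrix, keeping it
# invertible even when features are collinear.
qda = QuadraticDiscriminantAnalysis(reg_param=0.1)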
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
Running model pipeline:
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.03579068 0.03202891 0.06134105 0.04679179 0.03424931 0.03661156
0.036376 0.04108047 0.03971958 0.03681016]
mean value: 0.04007995128631592
key: score_time
value: [0.03434253 0.02598596 0.02994585 0.02226901 0.0206666 0.0219357
0.02279806 0.02205276 0.02154446 0.02946186]
mean value: 0.025100278854370116
key: test_mcc
value: [ 0.48181206 0. 0. 0. -0.04331481 0.
0. 0. -0.07254763 0.30832741]
mean value: 0.06742770289330487
key: train_mcc
value: [0.37447578 0.3344968 0.41076425 0.41076425 0.40414003 0.43710276
0.43710276 0.33586483 0.47562529 0.37451574]
mean value: 0.39948524928067985
key: test_accuracy
value: [0.93023256 0.90697674 0.90697674 0.90697674 0.9047619 0.92857143
0.92857143 0.9047619 0.85714286 0.9047619 ]
mean value: 0.9079734219269103
key: train_accuracy
value: [0.92650919 0.92388451 0.92913386 0.92913386 0.92670157 0.92931937
0.92931937 0.92408377 0.93455497 0.92670157]
mean value: 0.9279342045595086
key: test_fscore
value: [0.4 0. 0. 0. 0. 0.
0. 0. 0. 0.33333333]
mean value: 0.07333333333333333
key: train_fscore
value: [0.26315789 0.21621622 0.30769231 0.30769231 0.3 0.34146341
0.34146341 0.25641026 0.3902439 0.26315789]
mean value: 0.29874976091920896
key: test_precision
value: [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.5]
mean value: 0.15
key: train_precision
value: [1. 1. 1. 1. 1. 1.
1. 0.83333333 1. 1. ]
mean value: 0.9833333333333334
key: test_recall
value: [0.25 0. 0. 0. 0. 0. 0. 0. 0. 0.25]
mean value: 0.05
key: train_recall
value: [0.15151515 0.12121212 0.18181818 0.18181818 0.17647059 0.20588235
0.20588235 0.15151515 0.24242424 0.15151515]
mean value: 0.17700534759358288
key: test_roc_auc
value: [0.625 0.5 0.5 0.5 0.48717949 0.5
0.5 0.5 0.47368421 0.61184211]
mean value: 0.519770580296896
key: train_roc_auc
value: [0.57575758 0.56060606 0.59090909 0.59090909 0.58823529 0.60294118
0.60294118 0.57432491 0.62121212 0.57575758]
mean value: 0.5883594073211468
key: test_jcc
value: [0.25 0. 0. 0. 0. 0. 0. 0. 0. 0.2 ]
mean value: 0.045
key: train_jcc
value: [0.15151515 0.12121212 0.18181818 0.18181818 0.17647059 0.20588235
0.20588235 0.14705882 0.24242424 0.15151515]
mean value: 0.17655971479500893
MCC on Blind test: 0.0
Accuracy on Blind test: 0.92
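Editor's note: throughout these blocks the jcc rows track the fscore rows through the identity J = F1 / (2 - F1); e.g. the first train fold above has F1 = 0.26315789 and jcc = 0.15151515. A two-line check:

f1 = 0.26315789
print(f1 / (2 - f1))   # 0.151515..., matching train_jcc fold 1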
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.23823667 0.26242995 0.30091119 0.32965922 0.27202058 0.26451087
0.26610541 0.25792789 0.26833463 0.26426744]
mean value: 0.27244038581848146
key: score_time
value: [0.02227688 0.02225351 0.02281356 0.02597904 0.0257442 0.0224793
0.02255249 0.02845621 0.0224514 0.02763939]
mean value: 0.024264597892761232
key: test_mcc
value: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
mean value: 0.0
key: train_mcc
value: [0.23590006 0.1665869 0.2892993 0.1665869 0.23209905 0.23209905
0.43710276 0.33586483 0.166607 0.166607 ]
mean value: 0.2428752859522287
key: test_accuracy
value: [0.90697674 0.90697674 0.90697674 0.90697674 0.92857143 0.92857143
0.92857143 0.9047619 0.9047619 0.9047619 ]
mean value: 0.9127906976744186
key: train_accuracy
value: [0.91863517 0.9160105 0.92125984 0.9160105 0.91623037 0.91623037
0.92931937 0.92408377 0.91623037 0.91623037]
mean value: 0.9190240617828531
key: test_fscore
value: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
mean value: 0.0
key: train_fscore
value: [0.11428571 0.05882353 0.16666667 0.05882353 0.11111111 0.11111111
0.34146341 0.25641026 0.05882353 0.05882353]
mean value: 0.1336342391866065
key: test_precision
value: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
mean value: 0.0
key: train_precision
value: [1. 1. 1. 1. 1. 1.
1. 0.83333333 1. 1. ]
mean value: 0.9833333333333334
key: test_recall
value: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
mean value: 0.0
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_cd_8020.py:115: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_cd_8020.py:118: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
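Editor's note: the two SettingWithCopyWarning messages point at gid_cd_8020.py sorting baseline_CT and baseline_BT in place while those frames are slices of a larger results table. A hedged sketch of the standard fix on toy data (the frame and column names are taken from the warning text; the parent table is invented for illustration):

import pandas as pd

# Toy stand-in for the real results table.
results = pd.DataFrame({'model': ['A', 'B'], 'test_mcc': [0.1, 0.3],
                        'bts_mcc': [0.0, 0.2]})

baseline_CT = results[['model', 'test_mcc']].copy()   # .copy() avoids the warning
baseline_CT.sort_values(by=['test_mcc'], ascending=False, inplace=True)

# or skip inplace altogether:
baseline_BT = results[['model', 'bts_mcc']].sort_values(by=['bts_mcc'],
                                                        ascending=False)
print(baseline_CT, baseline_BT, sep='\n')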
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
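Editor's note: this lbfgs ConvergenceWarning, which recurs over the next stretch of the log, appears to come from the LogisticRegression/LogisticRegressionCV fits hitting the default max_iter=100 before converging. A hedged sketch of the two remedies the warning itself suggests (values illustrative, not from this script):

from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# 1) give the solver more iterations
logreg = LogisticRegression(max_iter=3000, random_state=42)

# 2) and/or standardise the features so lbfgs converges faster
logreg_scaled = make_pipeline(StandardScaler(),
                              LogisticRegression(max_iter=1000, random_state=42))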
key: train_recall
value: [0.06060606 0.03030303 0.09090909 0.03030303 0.05882353 0.05882353
0.20588235 0.15151515 0.03030303 0.03030303]
mean value: 0.07477718360071302
key: test_roc_auc
value: [0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
mean value: 0.5
key: train_roc_auc
value: [0.53030303 0.51515152 0.54545455 0.51515152 0.52941176 0.52941176
0.60294118 0.57432491 0.51515152 0.51515152]
mean value: 0.5372453253247118
key: test_jcc
value: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
mean value: 0.0
key: train_jcc
value: [0.06060606 0.03030303 0.09090909 0.03030303 0.05882353 0.05882353
0.20588235 0.14705882 0.03030303 0.03030303]
mean value: 0.07433155080213903
MCC on Blind test: 0.0
Accuracy on Blind test: 0.92
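Editor's note: Ridge ClassifierCV never predicts the resistant class on any test fold (recall, fscore and MCC are all 0) yet still scores ~0.91 accuracy because of the ~10:1 class ratio. A hedged sketch of one standard knob for that imbalance, class re-weighting (illustrative only, not necessarily what this pipeline does):

from sklearn.linear_model import RidgeClassifier, RidgeClassifierCV

# class_weight='balanced' scales each sample by n_samples / (n_classes * n_class_samples),
# so misclassifying the rare resistant class costs roughly 10x more.
ridge = RidgeClassifier(class_weight='balanced', random_state=42)
ridge_cv = RidgeClassifierCV(cv=10, class_weight='balanced')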
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.03871989 0.03991413 0.04656005 0.03942466 0.0405128 0.04123592
0.07976103 0.06928873 0.0490365 0.05997825]
mean value: 0.050443196296691896
key: score_time
value: [0.01262212 0.01503849 0.01512909 0.01524734 0.01514578 0.01532006
0.01521468 0.01670432 0.0185647 0.01554704]
mean value: 0.015453362464904785
key: test_mcc
value: [0.8273403 0.84615385 0.69321982 0.8720816 0.76876426 0.82082657
0.77311567 0.79310508 0.82046748 0.87035806]
mean value: 0.8085432697325706
key: train_mcc
value: [0.84533031 0.86537495 0.8685192 0.8568314 0.86835763 0.86556123
0.87678715 0.85704295 0.86557337 0.86297105]
mean value: 0.8632349244357814
key: test_accuracy
value: [0.91025641 0.92307692 0.84615385 0.93589744 0.88311688 0.90909091
0.88311688 0.8961039 0.90909091 0.93506494]
mean value: 0.903096903096903
key: train_accuracy
value: [0.92241379 0.93247126 0.93390805 0.92816092 0.93400287 0.93256815
0.93830703 0.92826399 0.93256815 0.93113343]
mean value: 0.9313797638483484
key: test_fscore
value: [0.91566265 0.92307692 0.84210526 0.93670886 0.88607595 0.91139241
0.88888889 0.9 0.91358025 0.93670886]
mean value: 0.9054200048589064
key: train_fscore
value: [0.92372881 0.93352192 0.93521127 0.92937853 0.93502825 0.93370945
0.93900709 0.92937853 0.93352192 0.93239437]
mean value: 0.9324880147466529
key: test_precision
value: [0.86363636 0.92307692 0.86486486 0.925 0.85365854 0.87804878
0.8372093 0.87804878 0.88095238 0.925 ]
mean value: 0.882949593241709
key: train_precision
value: [0.90833333 0.91922006 0.91712707 0.91388889 0.92200557 0.91944444
0.92977528 0.91388889 0.91922006 0.91436464]
mean value: 0.9177268231612867
key: test_recall
value: [0.97435897 0.92307692 0.82051282 0.94871795 0.92105263 0.94736842
0.94736842 0.92307692 0.94871795 0.94871795]
mean value: 0.9302968960863698
key: train_recall
value: [0.93965517 0.94827586 0.95402299 0.9454023 0.94842407 0.94842407
0.94842407 0.9454023 0.94827586 0.95114943]
mean value: 0.9477456114349702
key: test_roc_auc
value: [0.91025641 0.92307692 0.84615385 0.93589744 0.88360324 0.90958165
0.88394062 0.89574899 0.9085695 0.93488529]
mean value: 0.9031713900134952
key: train_roc_auc
value: [0.92241379 0.93247126 0.93390805 0.92816092 0.93398215 0.93254537
0.93829249 0.92828854 0.93259065 0.93116211]
mean value: 0.9313815334453117
key: test_jcc
value: [0.84444444 0.85714286 0.72727273 0.88095238 0.79545455 0.8372093
0.8 0.81818182 0.84090909 0.88095238]
mean value: 0.8282519547635827
key: train_jcc
value: [0.85826772 0.87533156 0.87830688 0.86807388 0.87798408 0.87566138
0.88502674 0.86807388 0.87533156 0.87335092]
mean value: 0.87354086040645
MCC on Blind test: 0.06
Accuracy on Blind test: 0.87
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
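The block above is a list of (name, estimator) pairs; the per-model output that follows shows each one being run inside the same preprocessing pipeline and scored over ten folds (ten values per key). A hedged sketch of that kind of loop; models, preprocessor, X_train and y_train are assumed names, not objects taken from the script:

# Sketch only: run a (name, estimator) list like the one above through a shared
# preprocessing step and 10-fold cross-validation. All names here are assumed.
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline

for model_name, model_fn in models:
    pipe = Pipeline([('prep', preprocessor),   # e.g. the ColumnTransformer printed below
                     ('model', model_fn)])
    scores = cross_validate(pipe, X_train, y_train, cv=10,
                            scoring=['accuracy', 'f1', 'roc_auc'],
                            return_train_score=True)
    print(model_name, scores['test_roc_auc'].mean())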
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [1.15020847 1.00842071 0.94618225 1.08118463 0.94184732 1.0765748
0.97293162 1.11780429 0.96080112 1.09501767]
mean value: 1.0350972890853882
key: score_time
value: [0.01559567 0.01555538 0.01564121 0.01552844 0.01265383 0.01571798
0.01607513 0.0160296 0.01561451 0.01931334]
mean value: 0.015772509574890136
key: test_mcc
value: [0.8974359 0.8720816 0.74358974 0.8720816 0.82542666 0.92234997
0.87288888 0.79310508 0.94929201 0.82046748]
mean value: 0.8568718928423917
key: train_mcc
value: [0.95402299 0.95402299 0.9167613 0.95115335 0.96270101 0.92835792
0.94299704 0.96563082 0.95122353 0.9430017 ]
mean value: 0.9469872644108985
key: test_accuracy
value: [0.94871795 0.93589744 0.87179487 0.93589744 0.90909091 0.96103896
0.93506494 0.8961039 0.97402597 0.90909091]
mean value: 0.9276723276723277
key: train_accuracy
value: [0.97701149 0.97701149 0.95833333 0.97557471 0.98134864 0.96413199
0.9713056 0.98278336 0.97560976 0.9713056 ]
mean value: 0.9734415969920348
key: test_fscore
value: [0.94871795 0.93670886 0.87179487 0.93670886 0.91358025 0.96
0.93670886 0.9 0.975 0.91358025]
mean value: 0.9292799896618462
key: train_fscore
value: [0.97701149 0.97701149 0.95863053 0.97560976 0.981402 0.96443812
0.97175141 0.98285714 0.97560976 0.97167139]
mean value: 0.9735993097100867
key: test_precision
value: [0.94871795 0.925 0.87179487 0.925 0.86046512 0.97297297
0.90243902 0.87804878 0.95121951 0.88095238]
mean value: 0.9116610607790415
key: train_precision
value: [0.97701149 0.97701149 0.95184136 0.97421203 0.98 0.95762712
0.95821727 0.97727273 0.97421203 0.95810056]
mean value: 0.9685506091818026
key: test_recall
value: [0.94871795 0.94871795 0.87179487 0.94871795 0.97368421 0.94736842
0.97368421 0.92307692 1. 0.94871795]
mean value: 0.9484480431848853
key: train_recall
value: [0.97701149 0.97701149 0.96551724 0.97701149 0.98280802 0.9713467
0.98567335 0.98850575 0.97701149 0.98563218]
mean value: 0.9787529229654514
key: test_roc_auc
value: [0.94871795 0.93589744 0.87179487 0.93589744 0.90991903 0.9608637
0.93556005 0.89574899 0.97368421 0.9085695 ]
mean value: 0.9276653171390014
key: train_roc_auc
value: [0.97701149 0.97701149 0.95833333 0.97557471 0.98134654 0.96412163
0.97128495 0.98279156 0.97561176 0.97132612]
mean value: 0.9734413595494517
key: test_jcc
value: [0.90243902 0.88095238 0.77272727 0.88095238 0.84090909 0.92307692
0.88095238 0.81818182 0.95121951 0.84090909]
mean value: 0.8692319875246705
key: train_jcc
value: [0.95505618 0.95505618 0.92054795 0.95238095 0.96348315 0.93131868
0.94505495 0.96629213 0.95238095 0.94490358]
mean value: 0.9486474698057666
MCC on Blind test: 0.03
Accuracy on Blind test: 0.85
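The two 'Blind test' lines above are the same pipeline re-scored on the held-out 20% split rather than on the cross-validation folds. A minimal sketch of that check, with pipe, X_train, y_train, X_test and y_test as assumed names for the fitted pipeline and the 80/20 split:

# Sketch of a blind-test check like the one reported above; all names assumed.
from sklearn.metrics import accuracy_score, matthews_corrcoef

y_pred = pipe.fit(X_train, y_train).predict(X_test)
print('MCC on Blind test:', round(matthews_corrcoef(y_test, y_pred), 2))
print('Accuracy on Blind test:', round(accuracy_score(y_test, y_pred), 2))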
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01678801 0.01143074 0.01134968 0.01101875 0.01231956 0.01211452
0.01168275 0.01237202 0.01120925 0.01205993]
mean value: 0.01223452091217041
key: score_time
value: [0.01274705 0.00976276 0.00948977 0.01013064 0.01014805 0.01022387
0.00927925 0.01024818 0.00937724 0.00930762]
mean value: 0.010071444511413574
key: test_mcc
value: [0.35233213 0.63185859 0.29180696 0.5284982 0.47394296 0.39429125
0.56205154 0.4573956 0.55855494 0.37543482]
mean value: 0.46261669901632363
key: train_mcc
value: [0.46832427 0.44952017 0.47594774 0.46704729 0.49737181 0.47746987
0.46249932 0.4805902 0.45065019 0.46684356]
mean value: 0.4696264425127042
key: test_accuracy
value: [0.65384615 0.79487179 0.64102564 0.73076923 0.71428571 0.67532468
0.76623377 0.71428571 0.76623377 0.66233766]
mean value: 0.7119214119214119
key: train_accuracy
value: [0.71408046 0.70402299 0.71695402 0.71551724 0.73170732 0.72022956
0.71162123 0.723099 0.70444763 0.71449067]
mean value: 0.7156170121538944
key: test_fscore
value: [0.72164948 0.82608696 0.68181818 0.78350515 0.76086957 0.7311828
0.79545455 0.76086957 0.8 0.73469388]
mean value: 0.7596130126654452
key: train_fscore
value: [0.76224612 0.75534442 0.76519666 0.76144578 0.77333333 0.76590636
0.76042908 0.76606061 0.75534442 0.76110444]
mean value: 0.7626411224971984
key: test_precision
value: [0.60344828 0.71698113 0.6122449 0.65517241 0.64814815 0.61818182
0.7 0.66037736 0.70588235 0.61016949]
mean value: 0.653060588897696
key: train_precision
value: [0.65235174 0.6437247 0.65376782 0.65560166 0.67016807 0.65909091
0.65102041 0.66247379 0.6437247 0.65360825]
mean value: 0.6545532037931839
key: test_recall
value: [0.8974359 0.97435897 0.76923077 0.97435897 0.92105263 0.89473684
0.92105263 0.8974359 0.92307692 0.92307692]
mean value: 0.9095816464237517
key: train_recall
value: [0.91666667 0.9137931 0.92241379 0.90804598 0.91404011 0.91404011
0.91404011 0.90804598 0.9137931 0.91091954]
mean value: 0.9135798504759082
key: test_roc_auc
value: [0.65384615 0.79487179 0.64102564 0.73076923 0.71693657 0.67813765
0.76821862 0.71187584 0.76417004 0.65890688]
mean value: 0.7118758434547908
key: train_roc_auc
value: [0.71408046 0.70402299 0.71695402 0.71551724 0.73144534 0.71995109
0.7113304 0.72336396 0.70474755 0.71477209]
mean value: 0.7156185159569213
key: test_jcc
value: [0.56451613 0.7037037 0.51724138 0.6440678 0.61403509 0.57627119
0.66037736 0.61403509 0.66666667 0.58064516]
mean value: 0.6141559556983306
key: train_jcc
value: [0.61583012 0.60687023 0.61969112 0.61478599 0.63043478 0.62062257
0.61346154 0.62082515 0.60687023 0.61434109]
mean value: 0.6163732807537079
MCC on Blind test: -0.01
Accuracy on Blind test: 0.53
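The per-fold keys above (test_mcc, train_mcc, test_jcc and so on) have the shape scikit-learn's cross_validate gives when it is passed a multi-metric scorer dictionary together with return_train_score=True. The exact scorers used by this script are not shown in the log, so the dictionary below is an assumption; pipe, X_train and y_train are the same assumed names as in the earlier sketches:

# Hedged sketch: a multi-metric scorer dict that would yield keys such as
# test_mcc, train_mcc and test_jcc under cross_validate(return_train_score=True).
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import cross_validate

scoring = {'mcc': make_scorer(matthews_corrcoef),
           'accuracy': 'accuracy',
           'fscore': 'f1',
           'precision': 'precision',
           'recall': 'recall',
           'roc_auc': 'roc_auc',
           'jcc': make_scorer(jaccard_score)}

cv_out = cross_validate(pipe, X_train, y_train, cv=10,
                        scoring=scoring, return_train_score=True)
# cv_out['test_mcc'], cv_out['train_mcc'], ... mirror the arrays logged above;
# cv_out also carries 'fit_time' and 'score_time' arrays like those printed here.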
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01147652 0.01274586 0.01147652 0.01143885 0.01173377 0.01158214
0.0113976 0.01128554 0.01259947 0.0118711 ]
mean value: 0.011760735511779785
key: score_time
value: [0.00949717 0.01031971 0.00950551 0.00981569 0.00951743 0.00946426
0.01002526 0.00954485 0.01025271 0.01019597]
mean value: 0.009813857078552247
key: test_mcc
value: [0.58797473 0.64142698 0.59052059 0.65495315 0.68209247 0.56884727
0.53591229 0.64957894 0.69617048 0.66841121]
mean value: 0.6275888108821366
key: train_mcc
value: [0.65157066 0.64058368 0.65998413 0.63596054 0.64099643 0.65012309
0.68627114 0.64958683 0.63589348 0.64888805]
mean value: 0.6499858036443988
key: test_accuracy
value: [0.78205128 0.80769231 0.79487179 0.82051282 0.83116883 0.77922078
0.76623377 0.81818182 0.84415584 0.81818182]
mean value: 0.8062271062271062
key: train_accuracy
value: [0.8204023 0.81465517 0.82471264 0.81321839 0.81492109 0.81922525
0.83787661 0.81922525 0.80918221 0.81922525]
mean value: 0.8192644172892033
key: test_fscore
value: [0.80898876 0.83146067 0.8 0.8372093 0.84705882 0.79518072
0.775 0.8372093 0.85714286 0.84444444]
mean value: 0.8233694890861689
key: train_fscore
value: [0.83530962 0.8304862 0.83905013 0.82804233 0.83093054 0.83507853
0.85111989 0.83421053 0.82882883 0.83377309]
mean value: 0.8346829688450055
key: test_precision
value: [0.72 0.74 0.7804878 0.76595745 0.76595745 0.73333333
0.73809524 0.76595745 0.8 0.74509804]
mean value: 0.7554886755947838
key: train_precision
value: [0.77128954 0.76513317 0.77560976 0.76715686 0.76570048 0.7686747
0.78780488 0.76941748 0.75058275 0.77073171]
mean value: 0.7692101322032115
key: test_recall
value: [0.92307692 0.94871795 0.82051282 0.92307692 0.94736842 0.86842105
0.81578947 0.92307692 0.92307692 0.97435897]
mean value: 0.9067476383265857
key: train_recall
value: [0.91091954 0.90804598 0.9137931 0.89942529 0.90830946 0.91404011
0.92550143 0.91091954 0.92528736 0.90804598]
mean value: 0.9124287784474525
key: test_roc_auc
value: [0.78205128 0.80769231 0.79487179 0.82051282 0.83265857 0.78036437
0.7668691 0.81680162 0.84311741 0.81612686]
mean value: 0.80610661268556
key: train_roc_auc
value: [0.8204023 0.81465517 0.82471264 0.81321839 0.81478691 0.81908902
0.83775072 0.81935662 0.80934855 0.8193525 ]
mean value: 0.8192672825478379
key: test_jcc
value: [0.67924528 0.71153846 0.66666667 0.72 0.73469388 0.66
0.63265306 0.72 0.75 0.73076923]
mean value: 0.7005566580768737
key: train_jcc
value: [0.71719457 0.71011236 0.72272727 0.70654628 0.71076233 0.71685393
0.74082569 0.71557562 0.70769231 0.71493213]
mean value: 0.7163222485461479
MCC on Blind test: -0.15
Accuracy on Blind test: 0.71
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.01447248 0.01196742 0.01213455 0.01347899 0.01171374 0.0110836
0.01132965 0.01156044 0.0113492 0.01024914]
mean value: 0.01193392276763916
key: score_time
value: [0.04076123 0.01978087 0.02064252 0.02100086 0.01851249 0.01303983
0.01356053 0.02004933 0.01407886 0.01326823]
mean value: 0.019469475746154784
key: test_mcc
value: [0.8273403 0.67371007 0.8229512 0.7073494 0.67092693 0.77968356
0.74986878 0.65781183 0.82485566 0.80937951]
mean value: 0.7523877248751364
key: train_mcc
value: [0.82010384 0.84990979 0.84938468 0.84729159 0.85220738 0.82869595
0.85797627 0.83839189 0.82556298 0.83914343]
mean value: 0.8408667792618629
key: test_accuracy
value: [0.91025641 0.82051282 0.91025641 0.84615385 0.81818182 0.88311688
0.87012987 0.81818182 0.90909091 0.8961039 ]
mean value: 0.8681984681984682
key: train_accuracy
value: [0.9066092 0.92241379 0.92241379 0.92097701 0.92395983 0.91104735
0.92682927 0.91535151 0.90961263 0.91678623]
mean value: 0.9176000593677337
key: test_fscore
value: [0.91566265 0.84444444 0.91358025 0.86046512 0.84090909 0.89156627
0.87804878 0.84090909 0.91566265 0.90697674]
mean value: 0.8808225080394188
key: train_fscore
value: [0.9122807 0.92643052 0.92622951 0.92517007 0.92769441 0.91644205
0.93042292 0.92059219 0.91474966 0.92119565]
mean value: 0.9221207677952111
key: test_precision
value: [0.86363636 0.74509804 0.88095238 0.78723404 0.74 0.82222222
0.81818182 0.75510204 0.86363636 0.82978723]
mean value: 0.8105850505256906
key: train_precision
value: [0.86005089 0.88082902 0.8828125 0.87855297 0.88541667 0.86513995
0.88802083 0.86582278 0.86445013 0.87371134]
mean value: 0.8744807079708475
key: test_recall
value: [0.97435897 0.97435897 0.94871795 0.94871795 0.97368421 0.97368421
0.94736842 0.94871795 0.97435897 1. ]
mean value: 0.9663967611336032
key: train_recall
value: [0.97126437 0.97701149 0.97413793 0.97701149 0.97421203 0.97421203
0.97707736 0.98275862 0.97126437 0.97413793]
mean value: 0.9753087639561309
key: test_roc_auc
value: [0.91025641 0.82051282 0.91025641 0.84615385 0.82017544 0.884278
0.87112011 0.81646424 0.90823212 0.89473684]
mean value: 0.8682186234817815
key: train_roc_auc
value: [0.9066092 0.92241379 0.92241379 0.92097701 0.92388763 0.91095659
0.92675707 0.91544808 0.90970095 0.91686839]
mean value: 0.9176032506669302
key: test_jcc
value: [0.84444444 0.73076923 0.84090909 0.75510204 0.7254902 0.80434783
0.7826087 0.7254902 0.84444444 0.82978723]
mean value: 0.7883393399322083
key: train_jcc
value: [0.83870968 0.86294416 0.86259542 0.86075949 0.86513995 0.84577114
0.86989796 0.85286783 0.84289277 0.85390428]
mean value: 0.8555482686565422
MCC on Blind test: -0.12
Accuracy on Blind test: 0.78
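Every pipeline printed in this log starts from the same ColumnTransformer: MinMaxScaler over the numeric columns (length=167 above) and OneHotEncoder over the seven categorical columns, with everything else passed through. A shortened sketch of that preprocessor; the column lists here are truncated placeholders, not the full sets:

# Shortened sketch of the preprocessing step shown in each pipeline above.
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

numeric_cols = ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change']  # ...167 columns in the log
categorical_cols = ['ss_class', 'aa_prop_change', 'active_site']                       # ...7 columns in the log

preprocessor = ColumnTransformer(
    transformers=[('num', MinMaxScaler(), numeric_cols),
                  ('cat', OneHotEncoder(), categorical_cols)],
    remainder='passthrough')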
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.0282619 0.02842212 0.02797508 0.02814007 0.02827001 0.02821541
0.02812767 0.03055072 0.02872539 0.03221798]
mean value: 0.02889063358306885
key: score_time
value: [0.01391196 0.01416802 0.01403022 0.01399994 0.01393795 0.01388454
0.01421952 0.01358271 0.0143075 0.01583433]
mean value: 0.01418766975402832
key: test_mcc
value: [0.97467943 0.92338052 0.79487179 0.84726867 0.79338303 0.87288888
0.81848853 0.79217274 0.84516739 0.97434188]
mean value: 0.863664287478498
key: train_mcc
value: [0.91967693 0.91110388 0.92542491 0.91967693 0.91971476 0.91122845
0.91989528 0.92271332 0.91971741 0.91168638]
mean value: 0.9180838262644812
key: test_accuracy
value: [0.98717949 0.96153846 0.8974359 0.92307692 0.8961039 0.93506494
0.90909091 0.8961039 0.92207792 0.98701299]
mean value: 0.9314685314685315
key: train_accuracy
value: [0.95977011 0.95545977 0.96264368 0.95977011 0.95982783 0.95552367
0.95982783 0.96126255 0.95982783 0.95552367]
mean value: 0.9589437078447863
key: test_fscore
value: [0.98734177 0.96202532 0.8974359 0.925 0.8974359 0.93670886
0.90909091 0.8974359 0.925 0.98734177]
mean value: 0.9324816322917588
key: train_fscore
value: [0.96011396 0.95590327 0.96296296 0.96011396 0.96011396 0.95602837
0.96033994 0.96159317 0.96 0.95627645]
mean value: 0.9593446044952345
key: test_precision
value: [0.975 0.95 0.8974359 0.90243902 0.875 0.90243902
0.8974359 0.8974359 0.90243902 0.975 ]
mean value: 0.9174624765478424
key: train_precision
value: [0.9519774 0.94647887 0.95480226 0.9519774 0.95467422 0.94662921
0.94957983 0.95211268 0.95454545 0.93905817]
mean value: 0.9501835504112366
key: test_recall
value: [1. 0.97435897 0.8974359 0.94871795 0.92105263 0.97368421
0.92105263 0.8974359 0.94871795 1. ]
mean value: 0.9482456140350877
key: train_recall
value: [0.9683908 0.96551724 0.97126437 0.9683908 0.96561605 0.96561605
0.9713467 0.97126437 0.96551724 0.97413793]
mean value: 0.9687061555182294
key: test_roc_auc
value: [0.98717949 0.96153846 0.8974359 0.92307692 0.89642375 0.93556005
0.90924426 0.89608637 0.9217274 0.98684211]
mean value: 0.9315114709851552
key: train_roc_auc
value: [0.95977011 0.95545977 0.96264368 0.95977011 0.95981952 0.95550917
0.95981128 0.96127688 0.95983598 0.95555034]
mean value: 0.9589446859664724
key: test_jcc
value: [0.975 0.92682927 0.81395349 0.86046512 0.81395349 0.88095238
0.83333333 0.81395349 0.86046512 0.975 ]
mean value: 0.8753905680252816
key: train_jcc
value: [0.92328767 0.91553134 0.92857143 0.92328767 0.92328767 0.91576087
0.92370572 0.9260274 0.92307692 0.91621622]
mean value: 0.9218752905609398
MCC on Blind test: -0.07
Accuracy on Blind test: 0.87
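Each 'mean value' line in these blocks is the mean of the ten per-fold numbers printed directly above it. A small sketch of that reporting loop over an assumed cross_validate result cv_out (see the scoring sketch earlier):

# Sketch: dump a cross_validate result as key / value / mean value triplets,
# matching the layout of the blocks above. `cv_out` is an assumed name.
import numpy as np

for key, values in cv_out.items():
    print('key:', key)
    print('value:', values)
    print('mean value:', np.mean(values))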
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
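For reference, a minimal sketch of how a preprocessing-plus-model pipeline of this shape can be assembled with scikit-learn; the two column lists are shortened placeholders rather than the full 167 numeric and 7 categorical columns printed above, and X_train is an assumed DataFrame name:

# Minimal sketch (placeholder column lists, not the project's full feature set):
# MinMax-scale the numeric columns, one-hot encode the categorical ones, pass any
# remaining columns through unchanged, then hand the result to the classifier.
from sklearn.compose import ColumnTransformer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

numeric_cols = ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change']
categorical_cols = ['ss_class', 'aa_prop_change', 'active_site']

prep = ColumnTransformer(remainder='passthrough',
                         transformers=[('num', MinMaxScaler(), numeric_cols),
                                       ('cat', OneHotEncoder(), categorical_cols)])

pipe = Pipeline(steps=[('prep', prep),
                       ('model', MLPClassifier(max_iter=500, random_state=42))])
# pipe.fit(X_train, y_train) would then reproduce the 'Running model pipeline' step
# for a DataFrame X_train containing these columns (names assumed here).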
key: fit_time
value: [2.67909145 2.95052457 2.74347472 2.90364742 2.81919289 3.5941174
3.02195024 3.05154705 3.3588202 3.07907033]
mean value: 3.0201436281204224
key: score_time
value: [0.01559877 0.01588106 0.01902914 0.02346945 0.01293278 0.01353502
0.01314259 0.01360226 0.01294136 0.01363468]
mean value: 0.015376710891723632
key: test_mcc
value: [0.9258201 0.97467943 0.84726867 0.8229512 0.71670195 0.89608637
0.87044534 0.81848853 0.92234997 0.8972297 ]
mean value: 0.869202127487168
key: train_mcc
value: [1. 1. 1. 1. 0.99713467 1.
0.99713467 0.99713465 0.99713465 0.99713465]
mean value: 0.9985673281759511
key: test_accuracy
value: [0.96153846 0.98717949 0.92307692 0.91025641 0.85714286 0.94805195
0.93506494 0.90909091 0.96103896 0.94805195]
mean value: 0.934049284049284
key: train_accuracy
value: [1. 1. 1. 1. 0.99856528 1.
0.99856528 0.99856528 0.99856528 0.99856528]
mean value: 0.9992826398852224
key: test_fscore
value: [0.96296296 0.98701299 0.925 0.91358025 0.86075949 0.94736842
0.93506494 0.90909091 0.96202532 0.95 ]
mean value: 0.9352865272224588
key: train_fscore
value: [1. 1. 1. 1. 0.99856528 1.
0.99856528 0.99856115 0.99856115 0.99856115]
mean value: 0.9992814012778299
key: test_precision
value: [0.92857143 1. 0.90243902 0.88095238 0.82926829 0.94736842
0.92307692 0.92105263 0.95 0.92682927]
mean value: 0.9209558370598165
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.97435897 0.94871795 0.94871795 0.89473684 0.94736842
0.94736842 0.8974359 0.97435897 0.97435897]
mean value: 0.9507422402159245
key: train_recall
value: [1. 1. 1. 1. 0.99713467 1.
0.99713467 0.99712644 0.99712644 0.99712644]
mean value: 0.9985648651319039
key: test_roc_auc
value: [0.96153846 0.98717949 0.92307692 0.91025641 0.85762483 0.94804318
0.93522267 0.90924426 0.9608637 0.9477058 ]
mean value: 0.9340755735492577
key: train_roc_auc
value: [1. 1. 1. 1. 0.99856734 1.
0.99856734 0.99856322 0.99856322 0.99856322]
mean value: 0.999282432565952
key: test_jcc
value: [0.92857143 0.97435897 0.86046512 0.84090909 0.75555556 0.9
0.87804878 0.83333333 0.92682927 0.9047619 ]
mean value: 0.8802833452549845
key: train_jcc
value: [1. 1. 1. 1. 0.99713467 1.
0.99713467 0.99712644 0.99712644 0.99712644]
mean value: 0.9985648651319039
MCC on Blind test: -0.05
Accuracy on Blind test: 0.89
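The blocks of 'key:' / 'value:' / 'mean value:' lines above have the shape of sklearn.model_selection.cross_validate output: per-fold arrays for fit_time, score_time and every test_/train_ scorer, with the mean taken per key. A minimal, illustrative sketch on toy data, with an assumed scorer set matching the metric names in the log:

# Illustrative sketch only (toy data, simple classifier, assumed scorer names):
# cross_validate with a scorer dictionary and return_train_score=True yields
# per-fold arrays keyed exactly like the log: fit_time, score_time, test_mcc,
# train_mcc, and so on.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef, jaccard_score
from sklearn.model_selection import cross_validate

X_toy, y_toy = make_classification(n_samples=200, weights=[0.8, 0.2], random_state=42)

scoring = {'mcc'      : make_scorer(matthews_corrcoef),
           'accuracy' : 'accuracy',
           'fscore'   : 'f1',
           'precision': 'precision',
           'recall'   : 'recall',
           'roc_auc'  : 'roc_auc',
           'jcc'      : make_scorer(jaccard_score)}

cv_out = cross_validate(LogisticRegression(max_iter=1000, random_state=42),
                        X_toy, y_toy, cv=10, scoring=scoring,
                        return_train_score=True)

for key, value in cv_out.items():
    print('key:', key)
    print('value:', value)
    print('mean value:', np.mean(value))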
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.08217692 0.05834174 0.07320738 0.06734514 0.07927251 0.0670433
0.07197666 0.06617594 0.06804967 0.06569481]
mean value: 0.06992840766906738
key: score_time
value: [0.00984645 0.00930905 0.00995088 0.00985503 0.01845741 0.0099628
0.01521873 0.01029706 0.01649356 0.01028228]
mean value: 0.011967325210571289
key: test_mcc
value: [0.87904907 0.9258201 0.8974359 0.87438565 0.87288888 0.92240216
0.79338303 0.87263594 0.81836616 0.84516739]
mean value: 0.8701534274011227
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.93589744 0.96153846 0.94871795 0.93589744 0.93506494 0.96103896
0.8961039 0.93506494 0.90909091 0.92207792]
mean value: 0.934049284049284
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.93975904 0.96 0.94871795 0.93333333 0.93670886 0.96103896
0.8974359 0.9382716 0.91139241 0.925 ]
mean value: 0.9351658047431776
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.88636364 1. 0.94871795 0.97222222 0.90243902 0.94871795
0.875 0.9047619 0.9 0.90243902]
mean value: 0.9240661709564149
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.92307692 0.94871795 0.8974359 0.97368421 0.97368421
0.92105263 0.97435897 0.92307692 0.94871795]
mean value: 0.9483805668016194
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.93589744 0.96153846 0.94871795 0.93589744 0.93556005 0.96120108
0.89642375 0.93454791 0.90890688 0.9217274 ]
mean value: 0.9340418353576249
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.88636364 0.92307692 0.90243902 0.875 0.88095238 0.925
0.81395349 0.88372093 0.8372093 0.86046512]
mean value: 0.8788180801992487
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.57
Accuracy on Blind test: 0.94
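Each model block ends with an 'MCC on Blind test' / 'Accuracy on Blind test' pair: two single numbers from a held-out blind split rather than from cross-validation. A toy sketch of how such a pair can be computed; the classifier and the split here are illustrative stand-ins, not the pipeline or data used above:

# Illustrative sketch on toy data (the logged numbers come from the fitted
# pipeline above and the study's own blind split; names here are stand-ins).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split

X_toy, y_toy = make_classification(n_samples=300, weights=[0.9, 0.1], random_state=42)
X_tr, X_blind, y_tr, y_blind = train_test_split(X_toy, y_toy, test_size=0.2,
                                                stratify=y_toy, random_state=42)

clf = LogisticRegression(max_iter=1000, random_state=42).fit(X_tr, y_tr)
y_pred = clf.predict(X_blind)

print('MCC on Blind test:', round(matthews_corrcoef(y_blind, y_pred), 2))
print('Accuracy on Blind test:', round(accuracy_score(y_blind, y_pred), 2))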
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.1527071 0.14913678 0.14711428 0.15268421 0.14098549 0.14562392
0.15315819 0.15042734 0.15176177 0.15369225]
mean value: 0.14972913265228271
key: score_time
value: [0.02040267 0.02036953 0.02008748 0.02046919 0.01954436 0.01913452
0.02040911 0.02066565 0.02030325 0.02046251]
mean value: 0.020184826850891114
key: test_mcc
value: [0.92338052 0.94996791 0.87438565 0.94996791 0.89736685 0.97434188
0.89608637 0.87288888 0.92240216 0.94935876]
mean value: 0.9210146881134006
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96153846 0.97435897 0.93589744 0.97435897 0.94805195 0.98701299
0.94805195 0.93506494 0.96103896 0.97402597]
mean value: 0.9599400599400599
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96202532 0.97368421 0.93333333 0.97368421 0.94871795 0.98666667
0.94736842 0.93333333 0.96103896 0.97368421]
mean value: 0.9593536612177519
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.95 1. 0.97222222 1. 0.925 1.
0.94736842 0.97222222 0.97368421 1. ]
mean value: 0.9740497076023392
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.97435897 0.94871795 0.8974359 0.94871795 0.97368421 0.97368421
0.94736842 0.8974359 0.94871795 0.94871795]
mean value: 0.9458839406207827
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96153846 0.97435897 0.93589744 0.97435897 0.94838057 0.98684211
0.94804318 0.93556005 0.96120108 0.97435897]
mean value: 0.9600539811066127
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.92682927 0.94871795 0.875 0.94871795 0.90243902 0.97368421
0.9 0.875 0.925 0.94871795]
mean value: 0.9224106349363089
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: -0.03
Accuracy on Blind test: 0.91
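Several blind-test results pair a high accuracy with an MCC at or below zero (0.91 and -0.03 just above). With a heavily imbalanced blind set this is expected: predicting the majority class almost everywhere keeps accuracy high while MCC collapses. A small worked example with an assumed ~10:1 class ratio:

# Illustrative only (not the project's blind set): a model that predicts the
# majority class almost everywhere, and gets its one positive call wrong,
# still reaches ~0.90 accuracy while MCC is slightly negative.
from sklearn.metrics import accuracy_score, matthews_corrcoef

y_true = [0] * 100 + [1] * 10      # assumed 10:1 class ratio
y_pred = [1] + [0] * 109           # one false positive, everything else majority class

print(round(accuracy_score(y_true, y_pred), 2))      # 0.9
print(round(matthews_corrcoef(y_true, y_pred), 2))   # -0.03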
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01263523 0.01260734 0.01209092 0.01136661 0.01154256 0.01153016
0.01169491 0.0120523 0.01149178 0.01149225]
mean value: 0.011850404739379882
key: score_time
value: [0.01004624 0.0099802 0.00906777 0.00966549 0.00907993 0.0091114
0.00958633 0.00911474 0.00912404 0.00935435]
mean value: 0.009413051605224609
key: test_mcc
value: [0.74753193 0.84726867 0.672214 0.8273403 0.7445861 0.76876426
0.63691815 0.81848853 0.79217274 0.74390304]
mean value: 0.7599187712192945
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.87179487 0.92307692 0.83333333 0.91025641 0.87012987 0.88311688
0.81818182 0.90909091 0.8961039 0.87012987]
mean value: 0.8785214785214785
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.87804878 0.92105263 0.82191781 0.91566265 0.875 0.88607595
0.81081081 0.90909091 0.8974359 0.87804878]
mean value: 0.8793144218080852
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.8372093 0.94594595 0.88235294 0.86363636 0.83333333 0.85365854
0.83333333 0.92105263 0.8974359 0.8372093 ]
mean value: 0.8705167587676821
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.92307692 0.8974359 0.76923077 0.97435897 0.92105263 0.92105263
0.78947368 0.8974359 0.8974359 0.92307692]
mean value: 0.8913630229419703
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.87179487 0.92307692 0.83333333 0.91025641 0.87078273 0.88360324
0.81781377 0.90924426 0.89608637 0.8694332 ]
mean value: 0.8785425101214575
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.7826087 0.85365854 0.69767442 0.84444444 0.77777778 0.79545455
0.68181818 0.83333333 0.81395349 0.7826087 ]
mean value: 0.7863332117694741
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: -0.09
Accuracy on Blind test: 0.84
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Running model pipeline:
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [2.93777823 2.89282584 2.86601663 2.90190744 2.88386822 2.88923788
2.87295818 2.85391903 2.88457561 2.83187699]
mean value: 2.8814964056015016
key: score_time
value: [0.09594989 0.0984354 0.09527922 0.09927225 0.09543991 0.09534168
0.09555697 0.10307527 0.09866238 0.09446812]
mean value: 0.09714810848236084
key: test_mcc
value: [0.92338052 0.97467943 0.84726867 0.94996791 0.92240216 0.97434188
0.89608637 0.94935876 0.92240216 0.89736685]
mean value: 0.9257254710305122
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96153846 0.98717949 0.92307692 0.97435897 0.96103896 0.98701299
0.94805195 0.97402597 0.96103896 0.94805195]
mean value: 0.9625374625374625
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96202532 0.98701299 0.92105263 0.97368421 0.96103896 0.98666667
0.94736842 0.97368421 0.96103896 0.94736842]
mean value: 0.9620940786950114
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.95 1. 0.94594595 1. 0.94871795 1.
0.94736842 1. 0.97368421 0.97297297]
mean value: 0.9738689499215815
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.97435897 0.97435897 0.8974359 0.94871795 0.97368421 0.97368421
0.94736842 0.94871795 0.94871795 0.92307692]
mean value: 0.9510121457489878
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96153846 0.98717949 0.92307692 0.97435897 0.96120108 0.98684211
0.94804318 0.97435897 0.96120108 0.94838057]
mean value: 0.9626180836707152
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.92682927 0.97435897 0.85365854 0.94871795 0.925 0.97368421
0.9 0.94871795 0.925 0.9 ]
mean value: 0.9275966887199236
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.32
Accuracy on Blind test: 0.93
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000...05', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [1.09418011 1.13239908 1.11712027 1.10592294 1.11697221 1.10063672
1.1310041 1.14077425 1.11971807 1.14388514]
mean value: 1.120261287689209
key: score_time
value: [0.23861861 0.27041721 0.26881766 0.24175239 0.25036669 0.22907615
0.23409081 0.26399255 0.13960528 0.27557015]
mean value: 0.24123075008392333
key: test_mcc
value: [0.92338052 0.97467943 0.82078268 0.94996791 0.89608637 0.90083601
0.87044534 0.94935876 0.87263594 0.92240216]
mean value: 0.9080575112450399
key: train_mcc
value: [0.97140876 0.97152111 0.97707604 0.97152111 0.97718983 0.96863241
0.96847236 0.97144882 0.97156063 0.97144882]
mean value: 0.9720279892375501
key: test_accuracy
value: [0.96153846 0.98717949 0.91025641 0.97435897 0.94805195 0.94805195
0.93506494 0.97402597 0.93506494 0.96103896]
mean value: 0.9534632034632035
key: train_accuracy
value: [0.98563218 0.98563218 0.98850575 0.98563218 0.98852224 0.98421808
0.98421808 0.9856528 0.9856528 0.9856528 ]
mean value: 0.985931908507726
key: test_fscore
value: [0.96202532 0.98701299 0.90909091 0.97368421 0.94736842 0.94444444
0.93506494 0.97368421 0.9382716 0.96103896]
mean value: 0.9531686000151467
key: train_fscore
value: [0.98550725 0.98546512 0.98843931 0.98546512 0.98843931 0.98408104
0.98417266 0.98550725 0.98546512 0.98550725]
mean value: 0.9858049404523073
key: test_precision
value: [0.95 1. 0.92105263 1. 0.94736842 1.
0.92307692 1. 0.9047619 0.97368421]
mean value: 0.9619944090996723
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
key: train_precision
value: [0.99415205 0.99705882 0.99418605 0.99705882 0.99708455 0.99415205
0.98843931 0.99415205 0.99705882 0.99415205]
mean value: 0.9947494558697704
key: test_recall
value: [0.97435897 0.97435897 0.8974359 0.94871795 0.94736842 0.89473684
0.94736842 0.94871795 0.97435897 0.94871795]
mean value: 0.9456140350877192
key: train_recall
value: [0.97701149 0.97413793 0.98275862 0.97413793 0.97994269 0.97421203
0.97994269 0.97701149 0.97413793 0.97701149]
mean value: 0.9770304317755163
key: test_roc_auc
value: [0.96153846 0.98717949 0.91025641 0.97435897 0.94804318 0.94736842
0.93522267 0.97435897 0.93454791 0.96120108]
mean value: 0.9534075573549258
key: train_roc_auc
value: [0.98563218 0.98563218 0.98850575 0.98563218 0.98853457 0.98423245
0.98422422 0.98564042 0.9856363 0.98564042]
mean value: 0.9859310674175806
key: test_jcc
value: [0.92682927 0.97435897 0.83333333 0.94871795 0.9 0.89473684
0.87804878 0.94871795 0.88372093 0.925 ]
mean value: 0.9113464026246514
key: train_jcc
value: [0.97142857 0.9713467 0.97714286 0.9713467 0.97714286 0.96866097
0.96883853 0.97142857 0.9713467 0.97142857]
mean value: 0.9720111038757758
MCC on Blind test: 0.32
Accuracy on Blind test: 0.93
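The FutureWarning lines interleaved with the random-forest output come from max_features='auto' in the 'Random Forest2' settings. The warning itself names the replacement; a minimal sketch of the same estimator with the deprecated value swapped out:

# Minimal sketch of the warning's suggested fix: for classifiers, 'sqrt' is the
# behaviour that max_features='auto' used to give, so the swap is drop-in.
from sklearn.ensemble import RandomForestClassifier

rf2 = RandomForestClassifier(max_features='sqrt', min_samples_leaf=5,
                             n_estimators=1000, n_jobs=10, oob_score=True,
                             random_state=42)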
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02740169 0.01271296 0.01271749 0.01260138 0.01279283 0.01355386
0.01323056 0.01276207 0.01328802 0.01146173]
mean value: 0.014252257347106934
key: score_time
value: [0.01019192 0.01009679 0.01034951 0.01021194 0.01021576 0.01021719
0.01018167 0.01012897 0.01046205 0.00936437]
mean value: 0.010142016410827636
key: test_mcc
value: [0.58797473 0.64142698 0.59052059 0.65495315 0.68209247 0.56884727
0.53591229 0.64957894 0.69617048 0.66841121]
mean value: 0.6275888108821366
key: train_mcc
value: [0.65157066 0.64058368 0.65998413 0.63596054 0.64099643 0.65012309
0.68627114 0.64958683 0.63589348 0.64888805]
mean value: 0.6499858036443988
key: test_accuracy
value: [0.78205128 0.80769231 0.79487179 0.82051282 0.83116883 0.77922078
0.76623377 0.81818182 0.84415584 0.81818182]
mean value: 0.8062271062271062
key: train_accuracy
value: [0.8204023 0.81465517 0.82471264 0.81321839 0.81492109 0.81922525
0.83787661 0.81922525 0.80918221 0.81922525]
mean value: 0.8192644172892033
key: test_fscore
value: [0.80898876 0.83146067 0.8 0.8372093 0.84705882 0.79518072
0.775 0.8372093 0.85714286 0.84444444]
mean value: 0.8233694890861689
key: train_fscore
value: [0.83530962 0.8304862 0.83905013 0.82804233 0.83093054 0.83507853
0.85111989 0.83421053 0.82882883 0.83377309]
mean value: 0.8346829688450055
key: test_precision
value: [0.72 0.74 0.7804878 0.76595745 0.76595745 0.73333333
0.73809524 0.76595745 0.8 0.74509804]
mean value: 0.7554886755947838
key: train_precision
value: [0.77128954 0.76513317 0.77560976 0.76715686 0.76570048 0.7686747
0.78780488 0.76941748 0.75058275 0.77073171]
mean value: 0.7692101322032115
key: test_recall
value: [0.92307692 0.94871795 0.82051282 0.92307692 0.94736842 0.86842105
0.81578947 0.92307692 0.92307692 0.97435897]
mean value: 0.9067476383265857
key: train_recall
value: [0.91091954 0.90804598 0.9137931 0.89942529 0.90830946 0.91404011
0.92550143 0.91091954 0.92528736 0.90804598]
mean value: 0.9124287784474525
key: test_roc_auc
value: [0.78205128 0.80769231 0.79487179 0.82051282 0.83265857 0.78036437
0.7668691 0.81680162 0.84311741 0.81612686]
mean value: 0.80610661268556
key: train_roc_auc
value: [0.8204023 0.81465517 0.82471264 0.81321839 0.81478691 0.81908902
0.83775072 0.81935662 0.80934855 0.8193525 ]
mean value: 0.8192672825478379
key: test_jcc
value: [0.67924528 0.71153846 0.66666667 0.72 0.73469388 0.66
0.63265306 0.72 0.75 0.73076923]
mean value: 0.7005566580768737
key: train_jcc
value: [0.71719457 0.71011236 0.72272727 0.70654628 0.71076233 0.71685393
0.74082569 0.71557562 0.70769231 0.71493213]
mean value: 0.7163222485461479
MCC on Blind test: -0.15
Accuracy on Blind test: 0.71
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.14078021 0.1394918 0.1099894 0.11952615 0.11421227 0.10934782
0.27335286 0.1076355 0.10954022 0.13076091]
mean value: 0.13546371459960938
key: score_time
value: [0.01130152 0.01149511 0.01126742 0.0115304 0.01148415 0.01134205
0.01231551 0.01155114 0.01133633 0.01195455]
mean value: 0.011557817459106445
key: test_mcc
value: [0.89861829 1. 0.8720816 0.89861829 0.848923 1.
0.77311567 0.92480439 0.92234997 0.92234997]
mean value: 0.9060861186682375
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94871795 1. 0.93589744 0.94871795 0.92207792 1.
0.88311688 0.96103896 0.96103896 0.96103896]
mean value: 0.9521645021645021
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.95 1. 0.93670886 0.95 0.925 1.
0.88888889 0.96296296 0.96202532 0.96202532]
mean value: 0.9537611345522738
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.92682927 1. 0.925 0.92682927 0.88095238 1.
0.8372093 0.92857143 0.95 0.95 ]
mean value: 0.9325391648434757
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.97435897 1. 0.94871795 0.97435897 0.97368421 1.
0.94736842 1. 0.97435897 0.97435897]
mean value: 0.9767206477732794
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.94871795 1. 0.93589744 0.94871795 0.92273954 1.
0.88394062 0.96052632 0.9608637 0.9608637 ]
mean value: 0.9522267206477733
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.9047619 1. 0.88095238 0.9047619 0.86046512 1.
0.8 0.92857143 0.92682927 0.92682927]
mean value: 0.9133171271912055
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.57
Accuracy on Blind test: 0.94
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.04523849 0.05120158 0.07097363 0.0718708 0.06695127 0.08841419
0.07230806 0.04964948 0.08339453 0.09027886]
mean value: 0.06902809143066406
key: score_time
value: [0.01256251 0.01249528 0.01961923 0.01253366 0.01976752 0.01962662
0.01245499 0.01242352 0.01572442 0.01973152]
mean value: 0.015693926811218263
key: test_mcc
value: [0.87438565 0.8720816 0.74456944 0.84726867 0.85485041 0.87044534
0.8023596 0.89608637 0.84852502 0.74390304]
mean value: 0.8354475142184004
key: train_mcc
value: [0.92008741 0.91110388 0.91101358 0.91137495 0.92298137 0.91167699
0.9285709 0.92614622 0.91709499 0.9202051 ]
mean value: 0.9180255393422112
key: test_accuracy
value: [0.93589744 0.93589744 0.87179487 0.92307692 0.92207792 0.93506494
0.8961039 0.94805195 0.92207792 0.87012987]
mean value: 0.916017316017316
key: train_accuracy
value: [0.95977011 0.95545977 0.95545977 0.95545977 0.96126255 0.95552367
0.96413199 0.96269727 0.95839311 0.95982783]
mean value: 0.9587985867181187
key: test_fscore
value: [0.9382716 0.93670886 0.86842105 0.925 0.92682927 0.93506494
0.90243902 0.94871795 0.92682927 0.87804878]
mean value: 0.9186330743575642
key: train_fscore
value: [0.96045198 0.95590327 0.95577746 0.95615276 0.96191819 0.95639944
0.96463932 0.96338028 0.95886525 0.96045198]
mean value: 0.959393992844278
key: test_precision
value: [0.9047619 0.925 0.89189189 0.90243902 0.86363636 0.92307692
0.84090909 0.94871795 0.88372093 0.8372093 ]
mean value: 0.8921363379942506
key: train_precision
value: [0.94444444 0.94647887 0.9490085 0.94150418 0.94722222 0.93922652
0.95251397 0.94475138 0.94677871 0.94444444]
mean value: 0.9456373239724625
key: test_recall
value: [0.97435897 0.94871795 0.84615385 0.94871795 1. 0.94736842
0.97368421 0.94871795 0.97435897 0.92307692]
mean value: 0.9485155195681512
key: train_recall
value: [0.97701149 0.96551724 0.96264368 0.97126437 0.97707736 0.97421203
0.97707736 0.98275862 0.97126437 0.97701149]
mean value: 0.9735838026545467
key: test_roc_auc
value: [0.93589744 0.93589744 0.87179487 0.92307692 0.92307692 0.93522267
0.89709852 0.94804318 0.92139001 0.8694332 ]
mean value: 0.9160931174089069
key: train_roc_auc
value: [0.95977011 0.95545977 0.95545977 0.95545977 0.96123983 0.95549682
0.96411339 0.96272602 0.95841155 0.95985245]
mean value: 0.9587989493791786
key: test_jcc
value: [0.88372093 0.88095238 0.76744186 0.86046512 0.86363636 0.87804878
0.82222222 0.90243902 0.86363636 0.7826087 ]
mean value: 0.8505171737954298
key: train_jcc
value: [0.92391304 0.91553134 0.91530055 0.91598916 0.92663043 0.91644205
0.93169399 0.92934783 0.92098093 0.92391304]
mean value: 0.9219742353334713
MCC on Blind test: -0.01
Accuracy on Blind test: 0.81
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01499748 0.01322985 0.01080799 0.01045179 0.01089287 0.01186538
0.010602 0.01103258 0.01051807 0.01101613]
mean value: 0.011541414260864257
key: score_time
value: [0.01211143 0.00941396 0.00903964 0.00900626 0.00911808 0.00900555
0.0089345 0.00900006 0.00890112 0.00944853]
mean value: 0.00939791202545166
key: test_mcc
value: [0.47797332 0.45097439 0.44536814 0.46915743 0.56205154 0.39198069
0.6161827 0.44181392 0.41782699 0.52670239]
mean value: 0.48000315159263324
key: train_mcc
value: [0.506527 0.52631404 0.51166009 0.49897566 0.49693005 0.49764787
0.50623464 0.53562064 0.49893986 0.49634451]
mean value: 0.5075194354563486
key: test_accuracy
value: [0.71794872 0.71794872 0.71794872 0.73076923 0.76623377 0.68831169
0.80519481 0.71428571 0.7012987 0.75324675]
mean value: 0.7313186813186814
key: train_accuracy
value: [0.74425287 0.75574713 0.74712644 0.74137931 0.74031564 0.73888092
0.7446198 0.76040172 0.73888092 0.73888092]
mean value: 0.7450485661043224
key: test_fscore
value: [0.76595745 0.75 0.74418605 0.75294118 0.79545455 0.72093023
0.81481481 0.75 0.74157303 0.78651685]
mean value: 0.7622374150258675
key: train_fscore
value: [0.77411168 0.781491 0.77608142 0.77040816 0.77001271 0.77135678
0.77411168 0.78507079 0.77135678 0.76962025]
mean value: 0.7743621253581037
key: test_precision
value: [0.65454545 0.67346939 0.68085106 0.69565217 0.7 0.64583333
0.76744186 0.67346939 0.66 0.7 ]
mean value: 0.6851262661596939
key: train_precision
value: [0.69318182 0.70697674 0.69634703 0.69266055 0.69178082 0.68680089
0.69476082 0.71095571 0.68526786 0.68778281]
mean value: 0.6946515055136435
key: test_recall
value: [0.92307692 0.84615385 0.82051282 0.82051282 0.92105263 0.81578947
0.86842105 0.84615385 0.84615385 0.8974359 ]
mean value: 0.8605263157894737
key: train_recall
value: [0.87643678 0.87356322 0.87643678 0.86781609 0.86819484 0.87965616
0.8739255 0.87643678 0.88218391 0.87356322]
mean value: 0.8748213285907189
key: test_roc_auc
value: [0.71794872 0.71794872 0.71794872 0.73076923 0.76821862 0.68994602
0.8060054 0.71255061 0.69939271 0.75134953]
mean value: 0.7312078272604589
key: train_roc_auc
value: [0.74425287 0.75574713 0.74712644 0.74137931 0.7401319 0.73867865
0.74443402 0.76056796 0.73908622 0.73907387]
mean value: 0.7450478378289365
key: test_jcc
value: [0.62068966 0.6 0.59259259 0.60377358 0.66037736 0.56363636
0.6875 0.6 0.58928571 0.64814815]
mean value: 0.6166003417231458
key: train_jcc
value: [0.63146998 0.64135021 0.63409563 0.62655602 0.62603306 0.62781186
0.63146998 0.64618644 0.62781186 0.6255144 ]
mean value: 0.6318299443958518
MCC on Blind test: -0.04
Accuracy on Blind test: 0.67
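MultinomialNB can only be dropped into this pipeline because everything leaving the 'prep' step is non-negative: the numeric columns are MinMax-scaled into [0, 1] and the categorical ones become 0/1 indicators (assuming nothing negative-valued falls through the passthrough remainder). A toy sketch of that constraint; the arrays are illustrative, not project data:

# Toy sketch: MultinomialNB raises a ValueError on negative feature values, so
# the MinMax-scaled / one-hot-encoded output of 'prep' is what makes it usable here.
import numpy as np
from sklearn.naive_bayes import MultinomialNB

X_scaled = np.array([[0.0, 0.2, 1.0],
                     [0.5, 1.0, 0.0],
                     [1.0, 0.4, 1.0]])   # rows in [0, 1], as after MinMaxScaler/OneHotEncoder
y_toy = [0, 1, 0]

MultinomialNB().fit(X_scaled, y_toy)          # fits: all inputs are >= 0
# MultinomialNB().fit(X_scaled - 1.0, y_toy)  # would raise ValueError (negative values)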
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
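For reference, a minimal sketch of how a pipeline of the shape printed above can be assembled; numerical_cols and categorical_cols stand in for the two column Index objects (the numeric list is truncated with '...' in the repr), and X_train/y_train for the 80% training split:

from sklearn.compose import ColumnTransformer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# numerical_cols / categorical_cols: placeholder lists of column names,
# matching the (truncated) Index objects shown in the repr above.
prep = ColumnTransformer(
    transformers=[('num', MinMaxScaler(), numerical_cols),
                  ('cat', OneHotEncoder(), categorical_cols)],
    remainder='passthrough')
pipe = Pipeline(steps=[('prep', prep),
                       ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
pipe.fit(X_train, y_train)  # X_train, y_train: placeholder training split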
key: fit_time
value: [0.02047563 0.02787757 0.02960706 0.02839184 0.02712417 0.0304358
0.02571654 0.02566147 0.02777052 0.02505445]
mean value: 0.026811504364013673
key: score_time
value: [0.01174617 0.01240253 0.01236677 0.01221895 0.01217604 0.01217866
0.01223135 0.01222467 0.01236272 0.01219702]
mean value: 0.012210488319396973
key: test_mcc
value: [0.8121419 0.75974565 0.80444891 0.79056942 0.76829903 0.89608637
0.81032908 0.79338303 0.80937951 0.82542666]
mean value: 0.8069809550761771
key: train_mcc
value: [0.85307516 0.79431719 0.83600311 0.73335797 0.78898822 0.90614227
0.83320885 0.86942774 0.78686153 0.84170414]
mean value: 0.8243086156900167
key: test_accuracy
value: [0.8974359 0.87179487 0.8974359 0.88461538 0.88311688 0.94805195
0.8961039 0.8961039 0.8961039 0.90909091]
mean value: 0.897985347985348
key: train_accuracy
value: [0.92385057 0.8908046 0.91235632 0.85057471 0.8866571 0.95265423
0.91248207 0.93400287 0.88235294 0.91822095]
mean value: 0.9063956364715777
key: test_fscore
value: [0.90697674 0.85714286 0.9047619 0.89655172 0.87671233 0.94736842
0.9047619 0.89473684 0.90697674 0.90410959]
mean value: 0.9000099060142804
key: train_fscore
value: [0.92789116 0.88012618 0.9189907 0.86967419 0.87440382 0.95371669
0.91834003 0.93195266 0.89460154 0.91324201]
mean value: 0.9082938975409695
key: test_precision
value: [0.82978723 0.96774194 0.84444444 0.8125 0.91428571 0.94736842
0.82608696 0.91891892 0.82978723 0.97058824]
mean value: 0.8861509094086544
key: train_precision
value: [0.88113695 0.97552448 0.85432099 0.77111111 0.98214286 0.93406593
0.86180905 0.96036585 0.80930233 0.97087379]
mean value: 0.9000653327276921
key: test_recall
value: [1. 0.76923077 0.97435897 1. 0.84210526 0.94736842
1. 0.87179487 1. 0.84615385]
mean value: 0.9251012145748988
key: train_recall
value: [0.97988506 0.80172414 0.99425287 0.99712644 0.78796562 0.97421203
0.98280802 0.90517241 1. 0.86206897]
mean value: 0.9285215558409907
key: test_roc_auc
value: [0.8974359 0.87179487 0.8974359 0.88461538 0.88259109 0.94804318
0.8974359 0.89642375 0.89473684 0.90991903]
mean value: 0.8980431848852901
key: train_roc_auc
value: [0.92385057 0.8908046 0.91235632 0.85057471 0.8867989 0.95262326
0.91238102 0.93396157 0.88252149 0.9181405 ]
mean value: 0.9064012943385041
key: test_jcc
value: [0.82978723 0.75 0.82608696 0.8125 0.7804878 0.9
0.82608696 0.80952381 0.82978723 0.825 ]
mean value: 0.8189259995530442
key: train_jcc
value: [0.86548223 0.78591549 0.85012285 0.76940133 0.77683616 0.91152815
0.8490099 0.87257618 0.80930233 0.84033613]
mean value: 0.8330510753596808
MCC on Blind test: -0.03
Accuracy on Blind test: 0.91
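Each key/value pair above holds one metric across ten CV folds plus its mean; a minimal sketch, assuming scikit-learn's cross_validate with custom scorers produced them (the scorer mapping and variable names are illustrative, not copied from the script):

import numpy as np
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import cross_validate

# Illustrative scorer set mirroring the keys in the log: mcc, accuracy,
# fscore, precision, recall, roc_auc and jcc (Jaccard).
scoring = {'mcc': make_scorer(matthews_corrcoef),
           'accuracy': 'accuracy',
           'fscore': 'f1',
           'precision': 'precision',
           'recall': 'recall',
           'roc_auc': 'roc_auc',
           'jcc': make_scorer(jaccard_score)}
cv_out = cross_validate(pipe, X_train, y_train, cv=10,
                        scoring=scoring, return_train_score=True)
for key, value in cv_out.items():  # fit_time, score_time, test_mcc, train_mcc, ...
    print('key:', key)
    print('value:', value)
    print('mean value:', np.mean(value))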
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.02893615 0.02541661 0.02783608 0.02274251 0.02748966 0.02412629
0.02926254 0.02447009 0.0322752 0.02960229]
mean value: 0.027215743064880372
key: score_time
value: [0.01226091 0.01221442 0.01219726 0.01218629 0.0122304 0.01245189
0.01216125 0.01214123 0.01239443 0.01225305]
mean value: 0.012249112129211426
key: test_mcc
value: [0.92338052 0.90219371 0.72760688 0.8229512 0.76876426 0.80158863
0.74139112 0.76637425 0.8542977 0.79338303]
mean value: 0.8101931285796923
key: train_mcc
value: [0.89656653 0.83349695 0.81700929 0.86067435 0.90040991 0.86643501
0.8853562 0.91446309 0.87900354 0.85437949]
mean value: 0.8707794357555662
key: test_accuracy
value: [0.96153846 0.94871795 0.84615385 0.91025641 0.88311688 0.8961039
0.87012987 0.88311688 0.92207792 0.8961039 ]
mean value: 0.9017316017316017
key: train_accuracy
value: [0.94827586 0.91091954 0.90086207 0.92672414 0.94978479 0.93256815
0.94261119 0.95695839 0.93687231 0.92539455]
mean value: 0.9330970992265704
key: test_fscore
value: [0.96202532 0.95121951 0.86666667 0.91358025 0.88607595 0.88571429
0.87179487 0.88607595 0.92857143 0.89473684]
mean value: 0.9046461069151092
key: train_fscore
value: [0.9481268 0.91777188 0.9095675 0.93117409 0.94875549 0.93078056
0.94219653 0.95762712 0.9400545 0.92168675]
mean value: 0.9347741213700584
key: test_precision
value: [0.95 0.90697674 0.76470588 0.88095238 0.85365854 0.96875
0.85 0.875 0.86666667 0.91891892]
mean value: 0.883562912966232
key: train_precision
value: [0.95086705 0.85221675 0.83614458 0.8778626 0.97005988 0.95757576
0.95043732 0.94166667 0.89378238 0.96835443]
mean value: 0.9198967410590332
key: test_recall
value: [0.97435897 1. 1. 0.94871795 0.92105263 0.81578947
0.89473684 0.8974359 1. 0.87179487]
mean value: 0.9323886639676113
key: train_recall
value: [0.9454023 0.99425287 0.99712644 0.99137931 0.92836676 0.90544413
0.93409742 0.97413793 0.99137931 0.87931034]
mean value: 0.9540896815202714
key: test_roc_auc
value: [0.96153846 0.94871795 0.84615385 0.91025641 0.88360324 0.89507422
0.87044534 0.88292848 0.92105263 0.89642375]
mean value: 0.9016194331983806
key: train_roc_auc
value: [0.94827586 0.91091954 0.90086207 0.92672414 0.94981556 0.93260712
0.94262342 0.95698301 0.9369504 0.92532852]
mean value: 0.933108964858545
key: test_jcc
value: [0.92682927 0.90697674 0.76470588 0.84090909 0.79545455 0.79487179
0.77272727 0.79545455 0.86666667 0.80952381]
mean value: 0.8274119620439396
key: train_jcc
value: [0.90136986 0.84803922 0.83413462 0.87121212 0.90250696 0.87052342
0.89071038 0.91869919 0.88688946 0.8547486 ]
mean value: 0.87788338280747
MCC on Blind test: 0.02
Accuracy on Blind test: 0.84
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
  oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Running model pipeline:
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.36209941 0.3485167 0.34469366 0.34386063 0.34843493 0.34590435
0.35031319 0.34137678 0.34579945 0.34589982]
mean value: 0.34768989086151125
key: score_time
value: [0.01736689 0.01688218 0.01797414 0.01745462 0.01741743 0.01712441
0.01710105 0.01725507 0.01680875 0.01733255]
mean value: 0.017271709442138673
key: test_mcc
value: [0.89861829 0.9258201 0.82078268 0.92338052 0.94935876 0.94935876
0.89736685 0.92240216 0.8972297 0.94804318]
mean value: 0.9132360999932175
key: train_mcc
value: [1. 0.9913834 0.99713055 0.9913834 0.99427743 1.
1. 1. 0.99713467 1. ]
mean value: 0.9971309464663967
key: test_accuracy
value: [0.94871795 0.96153846 0.91025641 0.96153846 0.97402597 0.97402597
0.94805195 0.96103896 0.94805195 0.97402597]
mean value: 0.9561272061272061
key: train_accuracy
value: [1. 0.99568966 0.99856322 0.99568966 0.99713056 1.
1. 1. 0.99856528 1. ]
mean value: 0.9985638368046966
key: test_fscore
value: [0.95 0.96 0.91139241 0.96202532 0.97435897 0.97435897
0.94871795 0.96103896 0.95 0.97435897]
mean value: 0.956625155435282
key: train_fscore
value: [1. 0.99568345 0.99856115 0.99569584 0.99714286 1.
1. 1. 0.99856528 1. ]
mean value: 0.9985648580541183
key: test_precision
value: [0.92682927 1. 0.9 0.95 0.95 0.95
0.925 0.97368421 0.92682927 0.97435897]
mean value: 0.9476701721470656
key: train_precision
value: [1. 0.99711816 1. 0.99426934 0.99430199 1.
1. 1. 0.99713467 1. ]
mean value: 0.9982824161382908
key: test_recall
value: [0.97435897 0.92307692 0.92307692 0.97435897 1. 1.
0.97368421 0.94871795 0.97435897 0.97435897]
mean value: 0.9665991902834008
key: train_recall
value: [1. 0.99425287 0.99712644 0.99712644 1. 1.
1. 1. 1. 1. ]
mean value: 0.9988505747126437
key: test_roc_auc
value: [0.94871795 0.96153846 0.91025641 0.96153846 0.97435897 0.97435897
0.94838057 0.96120108 0.9477058 0.97402159]
mean value: 0.9562078272604588
key: train_roc_auc
value: [1. 0.99568966 0.99856322 0.99568966 0.99712644 1.
1. 1. 0.99856734 1. ]
mean value: 0.9985636300760794
key: test_jcc
value: [0.9047619 0.92307692 0.8372093 0.92682927 0.95 0.95
0.90243902 0.925 0.9047619 0.95 ]
mean value: 0.9174078327609241
key: train_jcc
value: [1. 0.99140401 0.99712644 0.99142857 0.99430199 1.
1. 1. 0.99713467 1. ]
mean value: 0.9971395684460599
MCC on Blind test: 0.47
Accuracy on Blind test: 0.93
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.13541222 0.11654258 0.12940359 0.11901665 0.12512541 0.09526443
0.11954427 0.13520956 0.1178658 0.13506413]
mean value: 0.12284486293792725
key: score_time
value: [0.02665353 0.02460337 0.02405524 0.02078938 0.01871562 0.03112507
0.03780961 0.01863146 0.03182697 0.02348733]
mean value: 0.025769758224487304
key: test_mcc
value: [0.87438565 0.92338052 0.92338052 0.92338052 0.89608637 0.92234997
0.79217274 0.92234997 0.90083601 0.94804318]
mean value: 0.9026365447506197
key: train_mcc
value: [0.9913834 1. 0.98850575 1. 0.99139578 0.99713465
0.99713465 0.99713467 0.99713467 0.99427743]
mean value: 0.9954100996803469
key: test_accuracy
value: [0.93589744 0.96153846 0.96153846 0.96153846 0.94805195 0.96103896
0.8961039 0.96103896 0.94805195 0.97402597]
mean value: 0.9508824508824509
key: train_accuracy
value: [0.99568966 1. 0.99425287 1. 0.99569584 0.99856528
0.99856528 0.99856528 0.99856528 0.99713056]
mean value: 0.9977030046669635
key: test_fscore
value: [0.9382716 0.96103896 0.96103896 0.96103896 0.94736842 0.96
0.89473684 0.96202532 0.95121951 0.97435897]
mean value: 0.9511097554222843
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
  oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
key: train_fscore
value: [0.99568345 1. 0.99425287 1. 0.99569584 0.99856938
0.99856938 0.99856528 0.99856528 0.99711816]
mean value: 0.9977019650943407
key: test_precision
value: [0.9047619 0.97368421 0.97368421 0.97368421 0.94736842 0.97297297
0.89473684 0.95 0.90697674 0.97435897]
mean value: 0.9472228491016741
key: train_precision
value: [0.99711816 1. 0.99425287 1. 0.99712644 0.99714286
0.99714286 0.99713467 0.99713467 1. ]
mean value: 0.997705252122435
key: test_recall
value: [0.97435897 0.94871795 0.94871795 0.94871795 0.94736842 0.94736842
0.89473684 0.97435897 1. 0.97435897]
mean value: 0.9558704453441296
key: train_recall
value: [0.99425287 1. 0.99425287 1. 0.99426934 1.
1. 1. 1. 0.99425287]
mean value: 0.9977027961663867
key: test_roc_auc
value: [0.93589744 0.96153846 0.96153846 0.96153846 0.94804318 0.9608637
0.89608637 0.9608637 0.94736842 0.97402159]
mean value: 0.9507759784075573
key: train_roc_auc
value: [0.99568966 1. 0.99425287 1. 0.99569789 0.99856322
0.99856322 0.99856734 0.99856734 0.99712644]
mean value: 0.9977027961663867
key: test_jcc
value: [0.88372093 0.925 0.925 0.925 0.9 0.92307692
0.80952381 0.92682927 0.90697674 0.95 ]
mean value: 0.907512767531202
key: train_jcc
value: [0.99140401 1. 0.98857143 1. 0.99142857 0.99714286
0.99714286 0.99713467 0.99713467 0.99425287]
mean value: 0.9954211940284463
MCC on Blind test: 0.66
Accuracy on Blind test: 0.95
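The 'Some inputs do not have OOB scores' warnings seen during the bagging runs above come from BaggingClassifier(oob_score=True) keeping its default of 10 base estimators; a hedged sketch of the remedy the warning itself points at (100 is only an illustrative value):

from sklearn.ensemble import BaggingClassifier

# With the default n_estimators=10, some training samples may never fall
# out-of-bag, which triggers the OOB warning and the divide-by-zero
# RuntimeWarning; more estimators make reliable OOB estimates likely.
bc = BaggingClassifier(n_estimators=100,  # illustrative; sklearn default is 10
                       oob_score=True, n_jobs=10, random_state=42)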
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.40567374 0.40142202 0.42740512 0.39689159 0.42887521 0.41651106
0.4120388 0.36214209 0.40737271 0.30246305]
mean value: 0.39607954025268555
key: score_time
value: [0.04091024 0.03353906 0.03123403 0.03096223 0.03827477 0.03095818
0.03075123 0.03059983 0.03166389 0.01801181]
mean value: 0.03169052600860596
key: test_mcc
value: [0.87904907 0.8273403 0.8229512 0.68516016 0.7573619 0.82082657
0.79675455 0.74887955 0.87263594 0.90083601]
mean value: 0.8111795254849189
key: train_mcc
value: [0.97128041 0.96553319 0.97414195 0.97414195 0.96556664 0.96843995
0.9713213 0.97132177 0.96558283 0.96844021]
mean value: 0.9695770189786832
key: test_accuracy
value: [0.93589744 0.91025641 0.91025641 0.83333333 0.87012987 0.90909091
0.8961039 0.87012987 0.93506494 0.94805195]
mean value: 0.9018315018315018
key: train_accuracy
value: [0.98563218 0.98275862 0.98706897 0.98706897 0.98278336 0.98421808
0.9856528 0.9856528 0.98278336 0.98421808]
mean value: 0.9847837200481538
key: test_fscore
value: [0.93975904 0.91566265 0.91358025 0.85057471 0.88095238 0.91139241
0.9 0.88095238 0.9382716 0.95121951]
mean value: 0.9082364930405693
key: train_fscore
value: [0.98567335 0.98280802 0.98708752 0.98705036 0.98280802 0.98426323
0.98571429 0.98567335 0.98280802 0.98421808]
mean value: 0.9848104247664651
key: test_precision
value: [0.88636364 0.86363636 0.88095238 0.77083333 0.80434783 0.87804878
0.85714286 0.82222222 0.9047619 0.90697674]
mean value: 0.8575286049173506
key: train_precision
value: [0.98285714 0.98 0.98567335 0.98847262 0.98280802 0.98285714
0.98290598 0.98285714 0.98 0.98280802]
mean value: 0.98312394322366
key: test_recall
value: [1. 0.97435897 0.94871795 0.94871795 0.97368421 0.94736842
0.94736842 0.94871795 0.97435897 1. ]
mean value: 0.9663292847503373
key: train_recall
value: [0.98850575 0.98563218 0.98850575 0.98563218 0.98280802 0.98567335
0.98853868 0.98850575 0.98563218 0.98563218]
mean value: 0.9865066034318084
key: test_roc_auc
value: [0.93589744 0.91025641 0.91025641 0.83333333 0.87145749 0.90958165
0.89676113 0.86909582 0.93454791 0.94736842]
mean value: 0.9018556005398111
key: train_roc_auc
value: [0.98563218 0.98275862 0.98706897 0.98706897 0.98278332 0.98421599
0.98564865 0.98565689 0.98278744 0.9842201 ]
mean value: 0.9847841122418733
key: test_jcc
value: [0.88636364 0.84444444 0.84090909 0.74 0.78723404 0.8372093
0.81818182 0.78723404 0.88372093 0.90697674]
mean value: 0.8332274051749559
key: train_jcc
value: [0.97175141 0.96619718 0.97450425 0.97443182 0.96619718 0.96901408
0.97183099 0.97175141 0.96619718 0.96892655]
mean value: 0.9700802065722987
MCC on Blind test: -0.11
Accuracy on Blind test: 0.8
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [1.44647932 1.44331503 1.43145061 1.42986917 1.44331455 1.44381595
1.44349027 1.42879438 1.43657732 1.44222641]
mean value: 1.438933300971985
key: score_time
value: [0.00976348 0.00972748 0.01002979 0.00950289 0.00969648 0.01002026
0.00948477 0.00985718 0.0096941 0.00976896]
mean value: 0.009754538536071777
key: test_mcc
value: [0.9258201 0.92338052 0.89861829 0.89861829 0.84412955 0.87044534
0.79338303 0.8972297 0.94929201 0.89608637]
mean value: 0.8897003212539384
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96153846 0.96153846 0.94871795 0.94871795 0.92207792 0.93506494
0.8961039 0.94805195 0.97402597 0.94805195]
mean value: 0.9443889443889444
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96296296 0.96103896 0.95 0.95 0.92105263 0.93506494
0.8974359 0.95 0.975 0.94871795]
mean value: 0.9451273336799653
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.92857143 0.97368421 0.92682927 0.92682927 0.92105263 0.92307692
0.875 0.92682927 0.95121951 0.94871795]
mean value: 0.9301810459544735
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.94871795 0.97435897 0.97435897 0.92105263 0.94736842
0.92105263 0.97435897 1. 0.94871795]
mean value: 0.9609986504723347
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96153846 0.96153846 0.94871795 0.94871795 0.92206478 0.93522267
0.89642375 0.9477058 0.97368421 0.94804318]
mean value: 0.944365721997301
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.92857143 0.925 0.9047619 0.9047619 0.85365854 0.87804878
0.81395349 0.9047619 0.95121951 0.90243902]
mean value: 0.8967176484887772
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.68
Accuracy on Blind test: 0.95
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
Running model pipeline:
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.09725833 0.04872203 0.05128312 0.04099345 0.06098986 0.04159093
0.07360768 0.04642272 0.04790711 0.0478189 ]
mean value: 0.05565941333770752
key: score_time
value: [0.02518058 0.02636743 0.02171898 0.01350784 0.0175736 0.01438928
0.02206063 0.01524734 0.01386595 0.01495075]
mean value: 0.01848623752593994
key: test_mcc
value: [1. 1. 1. 1. 0.97435897 1.
1. 1. 1. 1. ]
mean value: 0.9974358974358974
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 1. 1. 1. 0.98701299 1.
1. 1. 1. 1. ]
mean value: 0.9987012987012986
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 1. 1. 1. 0.98701299 1.
1. 1. 1. 1. ]
mean value: 0.9987012987012986
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 1. 1. 0.97435897 1.
1. 1. 1. 1. ]
mean value: 0.9974358974358974
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 1. 1. 1. 0.98717949 1.
1. 1. 1. 1. ]
mean value: 0.9987179487179487
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 1. 1. 1. 0.97435897 1.
1. 1. 1. 1. ]
mean value: 0.9974358974358974
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.0
Accuracy on Blind test: 0.92
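The QDA run above emits a 'Variables are collinear' warning and a 'Precision is ill-defined' warning; a hedged sketch of the scikit-learn options that address each (reg_param=0.1 is illustrative, y_true/y_pred are placeholders):

from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.metrics import precision_score

# A small covariance regularisation term is the usual response to the
# collinearity warning from QDA.
qda = QuadraticDiscriminantAnalysis(reg_param=0.1)

# zero_division controls the value returned when a fold predicts no
# positive samples, instead of raising the UndefinedMetricWarning.
prec = precision_score(y_true, y_pred, zero_division=0)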
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.08461642 0.03076625 0.04633045 0.04000735 0.03775907 0.02846503
0.01780295 0.05053997 0.03975487 0.04259777]
mean value: 0.041864013671875
key: score_time
value: [0.01950359 0.0175066 0.02805901 0.03166962 0.02384281 0.03115273
0.01609993 0.0209837 0.01979637 0.02266264]
mean value: 0.02312769889831543
key: test_mcc
value: [0.80444891 0.8720816 0.66864785 0.87438565 0.848923 0.84537494
0.8023596 0.84516739 0.87734648 0.82046748]
mean value: 0.8259202902456069
key: train_mcc
value: [0.8889924 0.88262495 0.88837495 0.89430238 0.90001331 0.8772178
0.89148436 0.90091695 0.89707354 0.89790589]
mean value: 0.8918906541046712
key: test_accuracy
value: [0.8974359 0.93589744 0.83333333 0.93589744 0.92207792 0.92207792
0.8961039 0.92207792 0.93506494 0.90909091]
mean value: 0.9109057609057609
key: train_accuracy
value: [0.94396552 0.94109195 0.94396552 0.94683908 0.94978479 0.93830703
0.94548063 0.94978479 0.94835007 0.94835007]
mean value: 0.94559194577747
key: test_fscore
value: [0.9047619 0.93670886 0.82666667 0.9382716 0.925 0.92307692
0.90243902 0.925 0.93975904 0.91358025]
mean value: 0.9135264267651663
key: train_fscore
value: [0.94530154 0.94200849 0.94483734 0.94781382 0.9506347 0.9395218
0.94647887 0.95104895 0.9490085 0.94957983]
mean value: 0.9466233844343812
key: test_precision
value: [0.84444444 0.925 0.86111111 0.9047619 0.88095238 0.9
0.84090909 0.90243902 0.88636364 0.88095238]
mean value: 0.8826933973885194
key: train_precision
value: [0.92328767 0.9275766 0.93036212 0.93074792 0.93611111 0.92265193
0.93074792 0.92643052 0.93575419 0.92622951]
mean value: 0.9289899495435971
key: test_recall
value: [0.97435897 0.94871795 0.79487179 0.97435897 0.97368421 0.94736842
0.97368421 0.94871795 1. 0.94871795]
mean value: 0.9484480431848853
key: train_recall
value: [0.9683908 0.95689655 0.95977011 0.96551724 0.96561605 0.95702006
0.96275072 0.97701149 0.96264368 0.97413793]
mean value: 0.9649754635576194
/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_cd_8020.py:136: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_cd_8020.py:139: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: test_roc_auc
value: [0.8974359 0.93589744 0.83333333 0.93589744 0.92273954 0.92240216
0.89709852 0.9217274 0.93421053 0.9085695 ]
mean value: 0.9109311740890689
key: train_roc_auc
value: [0.94396552 0.94109195 0.94396552 0.94683908 0.94976205 0.93828014
0.94545582 0.9498238 0.94837055 0.94838702]
mean value: 0.9455941441886506
key: test_jcc
value: [0.82608696 0.88095238 0.70454545 0.88372093 0.86046512 0.85714286
0.82222222 0.86046512 0.88636364 0.84090909]
mean value: 0.8422873761448079
key: train_jcc
value: [0.8962766 0.89037433 0.89544236 0.90080429 0.90591398 0.88594164
0.89839572 0.90666667 0.90296496 0.904 ]
mean value: 0.898678054730654
MCC on Blind test: 0.17
Accuracy on Blind test: 0.83
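The SettingWithCopyWarning interleaved above points at sort_values(..., inplace=True) being called on a DataFrame slice in gid_cd_8020.py; a hedged sketch of the usual pandas fixes (smnc_CT and smnc_BT are the names shown in the warning, everything else is illustrative):

# Option 1: take an explicit copy before sorting in place.
smnc_CT = smnc_CT.copy()
smnc_CT.sort_values(by=['test_mcc'], ascending=False, inplace=True)

# Option 2: drop inplace on the slice and reassign the sorted result.
smnc_BT = smnc_BT.sort_values(by=['bts_mcc'], ascending=False)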
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.312994 0.38648391 0.23989654 0.30303741 0.37055206 0.34855318
0.34028673 0.28656173 0.21234965 0.28178763]
mean value: 0.30825028419494627
key: score_time
value: [0.02631187 0.0196352 0.01235032 0.02396035 0.01913309 0.01957345
0.01945543 0.01943469 0.01255512 0.01918983]
mean value: 0.019159936904907228
key: test_mcc
value: [0.85063978 0.84615385 0.74456944 0.87438565 0.87773765 0.87044534
0.8023596 0.87035806 0.87734648 0.77236485]
mean value: 0.838636069915691
key: train_mcc
value: [0.91453346 0.90828601 0.91110388 0.92272242 0.92593985 0.92337598
0.89148436 0.92614622 0.91688214 0.92317155]
mean value: 0.9163645870629656
key: test_accuracy
value: [0.92307692 0.92307692 0.87179487 0.93589744 0.93506494 0.93506494
0.8961039 0.93506494 0.93506494 0.88311688]
mean value: 0.9173326673326674
key: train_accuracy
value: [0.95689655 0.95402299 0.95545977 0.9612069 0.96269727 0.96126255
0.94548063 0.96269727 0.95839311 0.96126255]
mean value: 0.9579379607183496
key: test_fscore
value: [0.92682927 0.92307692 0.86842105 0.9382716 0.9382716 0.93506494
0.90243902 0.93670886 0.93975904 0.89156627]
mean value: 0.9200408575297221
key: train_fscore
value: [0.95774648 0.95454545 0.95590327 0.96170213 0.96338028 0.96213184
0.94647887 0.96338028 0.95863053 0.96191819]
mean value: 0.9585817329155627
key: test_precision
value: [0.88372093 0.92307692 0.89189189 0.9047619 0.88372093 0.92307692
0.84090909 0.925 0.88636364 0.84090909]
mean value: 0.8903431321454578
key: train_precision
value: [0.93922652 0.94382022 0.94647887 0.94957983 0.94736842 0.94230769
0.93074792 0.94475138 0.95184136 0.94459834]
mean value: 0.9440720563965304
key: test_recall
value: [0.97435897 0.92307692 0.84615385 0.97435897 1. 0.94736842
0.97368421 0.94871795 1. 0.94871795]
mean value: 0.9536437246963563
key: train_recall
value: [0.97701149 0.96551724 0.96551724 0.97413793 0.97994269 0.98280802
0.96275072 0.98275862 0.96551724 0.97988506]
mean value: 0.9735846260250963
key: test_roc_auc
value: [0.92307692 0.92307692 0.87179487 0.93589744 0.93589744 0.93522267
0.89709852 0.93488529 0.93421053 0.88225371]
mean value: 0.9173414304993253
key: train_roc_auc
value: [0.95689655 0.95402299 0.95545977 0.9612069 0.9626725 0.9612316
0.94545582 0.96272602 0.95840332 0.96128923]
mean value: 0.9579364687283866
key: test_jcc
value: [0.86363636 0.85714286 0.76744186 0.88372093 0.88372093 0.87804878
0.82222222 0.88095238 0.88636364 0.80434783]
mean value: 0.8527597787822454
key: train_jcc
value: [0.91891892 0.91304348 0.91553134 0.92622951 0.92934783 0.92702703
0.89839572 0.92934783 0.92054795 0.92663043]
mean value: 0.9205020021640535
MCC on Blind test: 0.18
Accuracy on Blind test: 0.84
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
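The ConvergenceWarning above repeats for each lbfgs fit; it is advisory rather than fatal. Below is a minimal sketch of the two remedies the warning itself suggests (raise max_iter or scale the inputs), assuming a plain LogisticRegression on already-encoded features; the max_iter value and the variable names are illustrative, not taken from the analysis script.

# Sketch only: the two fixes suggested by the ConvergenceWarning text.
# X_train/y_train are assumed to be the encoded training arrays; max_iter=5000 is illustrative.
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Option 1: give the lbfgs solver more iterations.
clf = LogisticRegression(random_state=42, max_iter=5000)

# Option 2: standardise the features first, as the preprocessing link in the warning recommends.
clf_scaled = make_pipeline(StandardScaler(), LogisticRegression(random_state=42))

# clf.fit(X_train, y_train)
# clf_scaled.fit(X_train, y_train)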
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
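For reference, here is a minimal sketch of how a pipeline of the shape printed above could be assembled: MinMaxScaler on the numerical columns, OneHotEncoder on the categorical ones, remaining columns passed through, then the classifier. The column lists below are truncated stand-ins for the full Index objects shown in the log, so treat them as placeholders.

# Sketch of the printed pipeline; the column lists are illustrative subsets,
# not the full 167 numerical + 7 categorical columns.
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

num_cols = ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change']  # ...plus the remaining numerical features
cat_cols = ['ss_class', 'aa_prop_change', 'electrostatics_change',
            'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']

prep = ColumnTransformer(remainder='passthrough',
                         transformers=[('num', MinMaxScaler(), num_cols),
                                       ('cat', OneHotEncoder(), cat_cols)])

pipe = Pipeline(steps=[('prep', prep),
                       ('model', LogisticRegression(random_state=42))])
# pipe.fit(X_train, y_train)  # X_train: a DataFrame containing the columns above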
key: fit_time
value: [0.04660821 0.04108977 0.04000759 0.04018307 0.04194832 0.05612898
0.06814337 0.06369376 0.04931545 0.04017711]
mean value: 0.048729562759399415
key: score_time
value: [0.01239371 0.01467419 0.01500845 0.03209901 0.01508331 0.02363563
0.02443552 0.0149827 0.01499224 0.015939 ]
mean value: 0.01832437515258789
key: test_mcc
value: [0.76948376 0.7200823 0.78192905 0.84726867 0.81032908 0.79338303
0.78862619 0.79621403 0.68825911 0.77236485]
mean value: 0.7767940071372722
key: train_mcc
value: [0.83297064 0.81419045 0.874631 0.84379896 0.86902359 0.82167505
0.87432528 0.84403521 0.82833288 0.84645747]
mean value: 0.8449440529313653
key: test_accuracy
value: [0.88461538 0.85897436 0.88461538 0.92307692 0.8961039 0.8961039
0.88311688 0.8961039 0.84415584 0.88311688]
mean value: 0.884998334998335
key: train_accuracy
value: [0.91522989 0.9066092 0.93534483 0.92097701 0.93256815 0.90961263
0.93543759 0.92109039 0.91391679 0.92252511]
mean value: 0.9213311565164334
key: test_fscore
value: [0.88607595 0.85333333 0.89411765 0.925 0.9047619 0.8974359
0.89411765 0.90243902 0.84615385 0.89156627]
mean value: 0.8895001514620202
key: train_fscore
value: [0.91839557 0.9088359 0.9382716 0.92350487 0.93570451 0.91310345
0.9382716 0.92350487 0.91525424 0.92458101]
mean value: 0.9239427629760671
key: test_precision
value: [0.875 0.88888889 0.82608696 0.90243902 0.82608696 0.875
0.80851064 0.86046512 0.84615385 0.84090909]
mean value: 0.8549540517962491
key: train_precision
value: [0.88533333 0.88767123 0.8976378 0.89487871 0.89528796 0.88031915
0.9 0.89487871 0.9 0.89945652]
mean value: 0.8935463402675042
key: test_recall
value: [0.8974359 0.82051282 0.97435897 0.94871795 1. 0.92105263
1. 0.94871795 0.84615385 0.94871795]
mean value: 0.9305668016194332
key: train_recall
value: [0.95402299 0.93103448 0.98275862 0.95402299 0.97994269 0.94842407
0.97994269 0.95402299 0.93103448 0.95114943]
mean value: 0.9566355432598886
key: test_roc_auc
value: [0.88461538 0.85897436 0.88461538 0.92307692 0.8974359 0.89642375
0.88461538 0.89541161 0.84412955 0.88225371]
mean value: 0.8851551956815115
key: train_roc_auc
value: [0.91522989 0.9066092 0.93534483 0.92097701 0.93250008 0.90955686
0.93537365 0.92113757 0.91394131 0.92256612]
mean value: 0.9213236504956691
key: test_jcc
value: [0.79545455 0.74418605 0.80851064 0.86046512 0.82608696 0.81395349
0.80851064 0.82222222 0.73333333 0.80434783]
mean value: 0.8017070811377331
key: train_jcc
value: [0.84910486 0.83290488 0.88372093 0.85788114 0.87917738 0.84010152
0.88372093 0.85788114 0.84375 0.85974026]
mean value: 0.858798303849566
MCC on Blind test: 0.04
Accuracy on Blind test: 0.86
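A hedged sketch of how per-fold results with these keys (fit_time, score_time, test_/train_ mcc, accuracy, fscore, precision, recall, roc_auc, jcc) can be produced with sklearn's cross_validate; cv=10 is inferred from the ten values per key, and the scorer mapping below is an assumption rather than a transcription of the analysis script.

# Sketch: 10-fold cross-validation returning per-fold train/test scores like those logged above.
# 'pipe', X_train and y_train are assumed to exist (see the pipeline sketch earlier).
from sklearn.model_selection import cross_validate
from sklearn.metrics import make_scorer, matthews_corrcoef, jaccard_score

scoring = {'mcc': make_scorer(matthews_corrcoef),
           'accuracy': 'accuracy',
           'fscore': 'f1',
           'precision': 'precision',
           'recall': 'recall',
           'roc_auc': 'roc_auc',
           'jcc': make_scorer(jaccard_score)}

# cv_out = cross_validate(pipe, X_train, y_train, cv=10, scoring=scoring,
#                         return_train_score=True)
# cv_out['test_mcc'] holds the ten fold values; the logged "mean value" corresponds to cv_out['test_mcc'].mean().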
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.93905425 1.17588329 1.03585625 1.12192154 1.06386805 1.21680045
1.04671884 1.19528508 1.25738096 0.99928236]
mean value: 1.1052051067352295
key: score_time
value: [0.01504755 0.01799703 0.01554322 0.01557112 0.01565742 0.01596832
0.01598644 0.01293421 0.01796269 0.0162077 ]
mean value: 0.01588757038116455
key: test_mcc
value: [0.87904907 0.82078268 0.83405766 0.87904907 0.83239263 0.97435897
0.85485041 0.87734648 0.94929201 0.83165353]
mean value: 0.8732832527458516
key: train_mcc
value: [0.98008335 0.96842679 0.97726976 0.9577867 0.98011104 0.97730122
0.96060095 0.9914285 0.97450093 0.99427753]
mean value: 0.9761786783462836
key: test_accuracy
value: [0.93589744 0.91025641 0.91025641 0.93589744 0.90909091 0.98701299
0.92207792 0.93506494 0.97402597 0.90909091]
mean value: 0.9328671328671329
key: train_accuracy
value: [0.98994253 0.9841954 0.98850575 0.97844828 0.98995696 0.98852224
0.97991392 0.99569584 0.98708752 0.99713056]
mean value: 0.9879398984152113
key: test_fscore
value: [0.93975904 0.91139241 0.91764706 0.93975904 0.91566265 0.98701299
0.92682927 0.93975904 0.975 0.91764706]
mean value: 0.9370468537052165
key: train_fscore
value: [0.99004267 0.98426323 0.98863636 0.97890295 0.99007092 0.98866856
0.98033708 0.99570815 0.98723404 0.99713467]
mean value: 0.9880998648091364
key: test_precision
value: [0.88636364 0.9 0.84782609 0.88636364 0.84444444 0.97435897
0.86363636 0.88636364 0.95121951 0.84782609]
mean value: 0.8888402377638857
key: train_precision
value: [0.98028169 0.98005698 0.97752809 0.95867769 0.98033708 0.97759104
0.96143251 0.99145299 0.97478992 0.99428571]
mean value: 0.9776433689694275
key: test_recall
value: [1. 0.92307692 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9923076923076923
key: train_recall
value: [1. 0.98850575 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9988505747126437
key: test_roc_auc
value: [0.93589744 0.91025641 0.91025641 0.93589744 0.91025641 0.98717949
0.92307692 0.93421053 0.97368421 0.90789474]
mean value: 0.9328609986504723
key: train_roc_auc
value: [0.98994253 0.9841954 0.98850575 0.97844828 0.98994253 0.98850575
0.97988506 0.99570201 0.98710602 0.99713467]
mean value: 0.9879367980766064
key: test_jcc
value: [0.88636364 0.8372093 0.84782609 0.88636364 0.84444444 0.97435897
0.86363636 0.88636364 0.95121951 0.84782609]
mean value: 0.8825611679964438
key: train_jcc
value: [0.98028169 0.96901408 0.97752809 0.95867769 0.98033708 0.97759104
0.96143251 0.99145299 0.97478992 0.99428571]
mean value: 0.9765390794144337
MCC on Blind test: 0.07
Accuracy on Blind test: 0.88
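The "MCC on Blind test" and "Accuracy on Blind test" lines appear to be scores on the held-out test split rather than cross-validation folds. A minimal sketch of that step, assuming a fitted pipeline and illustrative X_test/y_test names:

# Sketch: evaluate a fitted pipeline on the held-out ("blind") test set.
# pipe is assumed to be fitted already; X_test/y_test are the held-out features and labels.
from sklearn.metrics import accuracy_score, matthews_corrcoef

# y_pred = pipe.predict(X_test)
# print('MCC on Blind test:', round(matthews_corrcoef(y_test, y_pred), 2))
# print('Accuracy on Blind test:', round(accuracy_score(y_test, y_pred), 2))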
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01685691 0.01186967 0.01094079 0.01095653 0.01096821 0.01078248
0.01139474 0.01106596 0.01148486 0.01124048]
mean value: 0.011756062507629395
key: score_time
value: [0.01304412 0.00962949 0.00939202 0.00912499 0.00924325 0.00911808
0.00993681 0.00923181 0.00971222 0.0098443 ]
mean value: 0.009827709197998047
key: test_mcc
value: [0.20907169 0.34188173 0.20907169 0.13736056 0.38420817 0.19016685
0.40016667 0.21209492 0.15806527 0.12617326]
mean value: 0.23682608181306536
key: train_mcc
value: [0.27273302 0.26344424 0.26906224 0.2987076 0.2941979 0.24981071
0.31330607 0.26698682 0.28084872 0.25639571]
mean value: 0.27654930205183825
key: test_accuracy
value: [0.58974359 0.65384615 0.58974359 0.56410256 0.66233766 0.58441558
0.66233766 0.5974026 0.57142857 0.55844156]
mean value: 0.6033799533799533
key: train_accuracy
value: [0.62068966 0.61637931 0.61781609 0.63218391 0.62840746 0.61262554
0.6312769 0.6169297 0.6241033 0.61549498]
mean value: 0.6215906842131301
key: test_fscore
value: [0.67346939 0.71578947 0.67346939 0.63043478 0.72916667 0.65217391
0.73469388 0.67368421 0.65979381 0.64583333]
mean value: 0.6788508847356914
key: train_fscore
value: [0.69230769 0.68917346 0.69212963 0.7016317 0.70195627 0.68235294
0.71091114 0.69061414 0.69463869 0.68321513]
mean value: 0.6938930791333401
key: test_precision
value: [0.55932203 0.60714286 0.55932203 0.54716981 0.60344828 0.55555556
0.6 0.57142857 0.55172414 0.54385965]
mean value: 0.569897292616026
key: train_precision
value: [0.58235294 0.57925636 0.57945736 0.59019608 0.58653846 0.57884232
0.58518519 0.57864078 0.58431373 0.58032129]
mean value: 0.5825104493449902
key: test_recall
value: [0.84615385 0.87179487 0.84615385 0.74358974 0.92105263 0.78947368
0.94736842 0.82051282 0.82051282 0.79487179]
mean value: 0.8401484480431849
key: train_recall
value: [0.85344828 0.85057471 0.8591954 0.86494253 0.8739255 0.83094556
0.90544413 0.85632184 0.85632184 0.83045977]
mean value: 0.8581579554062511
key: test_roc_auc
value: [0.58974359 0.65384615 0.58974359 0.56410256 0.66565452 0.58704453
0.6659919 0.59446694 0.56815115 0.55533063]
mean value: 0.6034075573549258
key: train_roc_auc
value: [0.62068966 0.61637931 0.61781609 0.63218391 0.6280547 0.61231186
0.63088298 0.61727267 0.62443599 0.61580295]
mean value: 0.621583012218819
key: test_jcc
value: [0.50769231 0.55737705 0.50769231 0.46031746 0.57377049 0.48387097
0.58064516 0.50793651 0.49230769 0.47692308]
mean value: 0.5148533022885218
key: train_jcc
value: [0.52941176 0.52575488 0.52920354 0.54039497 0.54078014 0.51785714
0.55148342 0.52743363 0.53214286 0.51885099]
mean value: 0.5313313340334578
MCC on Blind test: 0.13
Accuracy on Blind test: 0.48
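The gap just above between blind-test accuracy (0.48) and MCC (0.13) reflects that MCC uses all four confusion-matrix cells, which matters for imbalanced labels. A short sketch of the MCC calculation with made-up counts, purely for illustration:

# Sketch: MCC from a 2x2 confusion matrix; the counts below are invented for illustration only.
from math import sqrt

tn, fp, fn, tp = 80, 18, 2, 7
mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
print(round(mcc, 3))  # roughly 0.39 for these counts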
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01130867 0.01133299 0.01136398 0.011338 0.01160789 0.0114572
0.01139069 0.01157093 0.01149297 0.0114553 ]
mean value: 0.01143186092376709
key: score_time
value: [0.00945854 0.00934172 0.00938725 0.00936246 0.00979757 0.00939035
0.0100863 0.00934434 0.0093348 0.00937557]
mean value: 0.00948789119720459
key: test_mcc
value: [0.41367015 0.30779351 0.33333333 0.41367015 0.57555876 0.3053019
0.63817566 0.57300197 0.4300371 0.43651151]
mean value: 0.4427054043078278
key: train_mcc
value: [0.47815242 0.49609973 0.40554933 0.46831764 0.50137939 0.47349208
0.52150427 0.53083153 0.5516137 0.49598651]
mean value: 0.49229265800871064
key: test_accuracy
value: [0.70512821 0.65384615 0.66666667 0.70512821 0.77922078 0.64935065
0.80519481 0.77922078 0.71428571 0.71428571]
mean value: 0.7172327672327673
key: train_accuracy
value: [0.73706897 0.74712644 0.70258621 0.73275862 0.74748924 0.73601148
0.75609756 0.76040172 0.77044476 0.74605452]
mean value: 0.7436039512524942
key: test_fscore
value: [0.72289157 0.64935065 0.66666667 0.68493151 0.8 0.6746988
0.82758621 0.8045977 0.73170732 0.74418605]
mean value: 0.730661645594319
key: train_fscore
value: [0.75303644 0.75757576 0.70886076 0.74659401 0.76657825 0.74655647
0.77748691 0.78112713 0.79057592 0.76048714]
mean value: 0.7588878784698425
key: test_precision
value: [0.68181818 0.65789474 0.66666667 0.73529412 0.72340426 0.62222222
0.73469388 0.72916667 0.69767442 0.68085106]
mean value: 0.6929686207167509
key: train_precision
value: [0.70992366 0.72751323 0.69421488 0.70984456 0.71358025 0.71883289
0.71566265 0.71807229 0.72596154 0.71867008]
mean value: 0.7152276020361097
key: test_recall
value: [0.76923077 0.64102564 0.66666667 0.64102564 0.89473684 0.73684211
0.94736842 0.8974359 0.76923077 0.82051282]
mean value: 0.7784075573549258
key: train_recall
value: [0.80172414 0.79022989 0.72413793 0.78735632 0.82808023 0.7765043
0.85100287 0.85632184 0.86781609 0.80747126]
mean value: 0.8090644863814511
key: test_roc_auc
value: [0.70512821 0.65384615 0.66666667 0.70512821 0.78070175 0.65047233
0.80701754 0.77766532 0.71356275 0.71288799]
mean value: 0.7173076923076923
key: train_roc_auc
value: [0.73706897 0.74712644 0.70258621 0.73275862 0.74737345 0.7359533
0.7559612 0.76053914 0.77058426 0.74614251]
mean value: 0.7436094094786417
key: test_jcc
value: [0.56603774 0.48076923 0.5 0.52083333 0.66666667 0.50909091
0.70588235 0.67307692 0.57692308 0.59259259]
mean value: 0.5791872821242966
key: train_jcc
value: [0.6038961 0.6097561 0.54901961 0.59565217 0.62150538 0.5956044
0.6359743 0.64086022 0.65367965 0.61353712]
mean value: 0.6119485045867612
MCC on Blind test: -0.05
Accuracy on Blind test: 0.66
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.01408839 0.01166725 0.0116787 0.01172638 0.0114615 0.0116322
0.0116086 0.01195526 0.01171875 0.01183867]
mean value: 0.011937570571899415
key: score_time
value: [0.04168391 0.01552224 0.01665449 0.01439142 0.01912832 0.01864362
0.01753926 0.01436853 0.01438332 0.01468682]
mean value: 0.018700194358825684
key: test_mcc
value: [0.74833148 0.79056942 0.79056942 0.79056942 0.76725173 0.76725173
0.83239263 0.70243936 0.70284579 0.80937951]
mean value: 0.7701600467317964
key: train_mcc
value: [0.83025321 0.82778622 0.84515425 0.83272484 0.83786126 0.82060485
0.84283378 0.83298852 0.82628251 0.82560052]
mean value: 0.8322089953845072
key: test_accuracy
value: [0.85897436 0.88461538 0.88461538 0.88461538 0.87012987 0.87012987
0.90909091 0.83116883 0.84415584 0.8961039 ]
mean value: 0.8733599733599733
key: train_accuracy
value: [0.90804598 0.9066092 0.91666667 0.90948276 0.91248207 0.90243902
0.91535151 0.90961263 0.90674319 0.90530846]
mean value: 0.9092741470011049
key: test_fscore
value: [0.87640449 0.89655172 0.89655172 0.89655172 0.88372093 0.88372093
0.91566265 0.85714286 0.86046512 0.90697674]
mean value: 0.8873748895471315
key: train_fscore
value: [0.91578947 0.91458607 0.92307692 0.91699605 0.91963109 0.91122715
0.92206077 0.91699605 0.91413474 0.91338583]
mean value: 0.9167884145531371
key: test_precision
value: [0.78 0.8125 0.8125 0.8125 0.79166667 0.79166667
0.84444444 0.75 0.78723404 0.82978723]
mean value: 0.8012299054373523
key: train_precision
value: [0.84466019 0.84261501 0.85714286 0.84671533 0.85121951 0.83693046
0.85539216 0.84671533 0.84596577 0.84057971]
mean value: 0.8467936325367894
key: test_recall
value: [1. 1. 1. 1. 1. 1.
1. 1. 0.94871795 1. ]
mean value: 0.9948717948717949
key: train_recall
value: [1. 1. 1. 1. 1. 1.
1. 1. 0.99425287 1. ]
mean value: 0.9994252873563219
key: test_roc_auc
value: [0.85897436 0.88461538 0.88461538 0.88461538 0.87179487 0.87179487
0.91025641 0.82894737 0.84278003 0.89473684]
mean value: 0.8733130904183536
key: train_roc_auc
value: [0.90804598 0.9066092 0.91666667 0.90948276 0.91235632 0.90229885
0.91522989 0.90974212 0.90686856 0.90544413]
mean value: 0.9092744458716201
key: test_jcc
value: [0.78 0.8125 0.8125 0.8125 0.79166667 0.79166667
0.84444444 0.75 0.75510204 0.82978723]
mean value: 0.7980167052636657
key: train_jcc
value: [0.84466019 0.84261501 0.85714286 0.84671533 0.85121951 0.83693046
0.85539216 0.84671533 0.84184915 0.84057971]
mean value: 0.8463819703615236
MCC on Blind test: -0.02
Accuracy on Blind test: 0.8
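All of the per-fold scores above come from the same 2x2 confusion matrix; in particular 'jcc' is the Jaccard index TP / (TP + FP + FN), which is why it always sits at or below the corresponding F-score (jcc = fscore / (2 - fscore)). A small self-contained example with made-up labels showing how each reported metric is obtained:

```python
# Illustrative labels only; each metric matches one of the keys in the log.
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score,
                             matthews_corrcoef, precision_score, recall_score,
                             roc_auc_score)

y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 1, 1, 1, 0, 1, 0]   # TP=4, FP=1, FN=1, TN=4

print('mcc      :', matthews_corrcoef(y_true, y_pred))   # 0.6
print('accuracy :', accuracy_score(y_true, y_pred))      # 0.8
print('fscore   :', f1_score(y_true, y_pred))            # 0.8
print('precision:', precision_score(y_true, y_pred))     # 0.8
print('recall   :', recall_score(y_true, y_pred))        # 0.8
# With hard 0/1 predictions ROC AUC reduces to (sensitivity + specificity) / 2.
print('roc_auc  :', roc_auc_score(y_true, y_pred))       # 0.8
print('jcc      :', jaccard_score(y_true, y_pred))       # 4 / 6
```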
Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.03765655 0.03760839 0.03512573 0.03914857 0.03210878 0.03907871
0.03898215 0.0320704 0.03831744 0.03638983]
mean value: 0.03664865493774414
key: score_time
value: [0.01612902 0.01733565 0.0167191 0.01747584 0.01575947 0.01732755
0.01739788 0.0156641 0.0170989 0.01659179]
mean value: 0.016749930381774903
key: test_mcc
value: [0.8273403 0.77151675 0.672214 0.87904907 0.85485041 0.82542666
0.83239263 0.80937951 0.8542977 0.92480439]
mean value: 0.8251271421387778
key: train_mcc
value: [0.88526309 0.88617728 0.87500947 0.89616338 0.9093827 0.86688517
0.8885052 0.90677702 0.8911284 0.89372164]
mean value: 0.8899013347491985
key: test_accuracy
value: [0.91025641 0.88461538 0.83333333 0.93589744 0.92207792 0.90909091
0.90909091 0.8961039 0.92207792 0.96103896]
mean value: 0.9083583083583083
key: train_accuracy
value: [0.94109195 0.94109195 0.93678161 0.9454023 0.95265423 0.93113343
0.94117647 0.95121951 0.94261119 0.94404591]
mean value: 0.9427208562146473
key: test_fscore
value: [0.91566265 0.88888889 0.84337349 0.93975904 0.92682927 0.91358025
0.91566265 0.90697674 0.92857143 0.96296296]
mean value: 0.9142267371140892
key: train_fscore
value: [0.94344828 0.94375857 0.93854749 0.94822888 0.95485636 0.9346049
0.94451962 0.95342466 0.94565217 0.94693878]
mean value: 0.9453979711965944
key: test_precision
value: [0.86363636 0.85714286 0.79545455 0.88636364 0.86363636 0.86046512
0.84444444 0.82978723 0.86666667 0.92857143]
mean value: 0.8596168656237929
key: train_precision
value: [0.9071618 0.90288714 0.91304348 0.9015544 0.91361257 0.89090909
0.89487179 0.91099476 0.89690722 0.89922481]
mean value: 0.9031167063547301
key: test_recall
value: [0.97435897 0.92307692 0.8974359 1. 1. 0.97368421
1. 1. 1. 1. ]
mean value: 0.976855600539811
key: train_recall
value: [0.98275862 0.98850575 0.96551724 1. 1. 0.98280802
1. 1. 1. 1. ]
mean value: 0.9919589632118039
key: test_roc_auc
value: [0.91025641 0.88461538 0.83333333 0.93589744 0.92307692 0.90991903
0.91025641 0.89473684 0.92105263 0.96052632]
mean value: 0.9083670715249663
key: train_roc_auc
value: [0.94109195 0.94109195 0.93678161 0.9454023 0.95258621 0.93105918
0.94109195 0.9512894 0.94269341 0.94412607]
mean value: 0.9427214043408095
key: test_jcc
value: [0.84444444 0.8 0.72916667 0.88636364 0.86363636 0.84090909
0.84444444 0.82978723 0.86666667 0.92857143]
mean value: 0.8433989975745295
key: train_jcc
value: [0.89295039 0.89350649 0.88421053 0.9015544 0.91361257 0.87723785
0.89487179 0.91099476 0.89690722 0.89922481]
mean value: 0.8965070814685796
MCC on Blind test: 0.09
Accuracy on Blind test: 0.83
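The 'MCC on Blind test' / 'Accuracy on Blind test' lines are single-shot scores rather than cross-validated means: the pipeline is refit on the training split and scored once on the held-out blind set. A sketch of that step, under the assumption that the blind set is simply a stratified hold-out frame (the synthetic data and the split below are illustrative only):

```python
# Illustrative only: synthetic, imbalanced data standing in for the real split.
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=10, weights=[0.85, 0.15],
                           random_state=42)
X_tr, X_bts, y_tr, y_bts = train_test_split(X, y, test_size=0.2,
                                            stratify=y, random_state=42)

pipe = Pipeline([('scale', MinMaxScaler()), ('model', SVC(random_state=42))])
pipe.fit(X_tr, y_tr)                     # refit on the full training split
y_pred = pipe.predict(X_bts)             # score the blind set exactly once

print('MCC on Blind test:', round(matthews_corrcoef(y_bts, y_pred), 2))
print('Accuracy on Blind test:', round(accuracy_score(y_bts, y_pred), 2))
```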
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
List of models:
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [2.84229445 2.23112178 3.84612989 4.29770851 3.43484664 2.85751414
2.78833723 2.80571318 2.84450674 2.93947673]
mean value: 3.0887649297714233
key: score_time
value: [0.01281977 0.01278067 0.015692 0.01541686 0.01301861 0.01513886
0.01986361 0.01298094 0.0137713 0.01316261]
mean value: 0.014464521408081054
key: test_mcc
value: [0.90219371 0.94996791 0.87904907 0.90219371 0.87773765 0.97435897
0.94935876 0.92480439 0.97434188 0.92480439]
mean value: 0.9258810437173667
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94871795 0.97435897 0.93589744 0.94871795 0.93506494 0.98701299
0.97402597 0.96103896 0.98701299 0.96103896]
mean value: 0.9612887112887113
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.95121951 0.975 0.93975904 0.95121951 0.9382716 0.98701299
0.97435897 0.96296296 0.98734177 0.96296296]
mean value: 0.963010932492288
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.90697674 0.95121951 0.88636364 0.90697674 0.88372093 0.97435897
0.95 0.92857143 0.975 0.92857143]
mean value: 0.929175939866524
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.94871795 0.97435897 0.93589744 0.94871795 0.93589744 0.98717949
0.97435897 0.96052632 0.98684211 0.96052632]
mean value: 0.9613022941970311
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.90697674 0.95121951 0.88636364 0.90697674 0.88372093 0.97435897
0.95 0.92857143 0.975 0.92857143]
mean value: 0.929175939866524
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.21
Accuracy on Blind test: 0.92
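The ConvergenceWarning emitted while this model was being fitted records that the MLP's stochastic optimiser exhausted all 500 iterations before the loss stabilised; the scores above are still computed, but from an unconverged fit. Two standard remedies, shown purely as an illustration rather than as what the script does, are to raise max_iter or to let early stopping end training once the validation score plateaus:

```python
# Illustration on synthetic data; neither variant is part of the logged run.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=200, n_features=20, random_state=42)

# Option 1: give the optimiser more iterations to reach its tolerance.
mlp_more_iter = MLPClassifier(max_iter=2000, random_state=42).fit(X, y)

# Option 2: keep max_iter=500 but stop when the held-out validation score
# has not improved for n_iter_no_change consecutive epochs.
mlp_early_stop = MLPClassifier(max_iter=500, early_stopping=True,
                               n_iter_no_change=20, random_state=42).fit(X, y)
```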
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.04185247 0.02919173 0.02772498 0.02817893 0.02993226 0.02952623
0.03049874 0.02950644 0.02975488 0.02985263]
mean value: 0.030601930618286134
key: score_time
value: [0.01172447 0.00977373 0.00929236 0.00948191 0.0096941 0.00979233
0.00987053 0.00917792 0.00941515 0.009444 ]
mean value: 0.009766650199890137
key: test_mcc
value: [0.97467943 0.97467943 0.97467943 0.97467943 0.92495119 0.94935876
0.87773765 0.90083601 0.92480439 0.92480439]
mean value: 0.9401210117670407
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98717949 0.98717949 0.98717949 0.98717949 0.96103896 0.97402597
0.93506494 0.94805195 0.96103896 0.96103896]
mean value: 0.9688977688977689
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98734177 0.98734177 0.98734177 0.98734177 0.96202532 0.97435897
0.9382716 0.95121951 0.96296296 0.96296296]
mean value: 0.9701168422481585
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.975 0.975 0.975 0.975 0.92682927 0.95
0.88372093 0.90697674 0.92857143 0.92857143]
mean value: 0.9424669799854145
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98717949 0.98717949 0.98717949 0.98717949 0.96153846 0.97435897
0.93589744 0.94736842 0.96052632 0.96052632]
mean value: 0.96889338731444
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.975 0.975 0.975 0.975 0.92682927 0.95
0.88372093 0.90697674 0.92857143 0.92857143]
mean value: 0.9424669799854145
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.51
Accuracy on Blind test: 0.93
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.13608575 0.13191938 0.13662219 0.13517451 0.14247322 0.13838029
0.13397717 0.13585806 0.12841868 0.13130641]
mean value: 0.13502156734466553
key: score_time
value: [0.01952577 0.01972151 0.01994419 0.01997352 0.01852083 0.02017856
0.02018285 0.02007389 0.0198772 0.02019548]
mean value: 0.01981937885284424
key: test_mcc
value: [0.97467943 1. 1. 1. 0.97435897 1.
0.97435897 1. 1. 1. ]
mean value: 0.9923397383198845
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98717949 1. 1. 1. 0.98701299 1.
0.98701299 1. 1. 1. ]
mean value: 0.9961205461205461
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98734177 1. 1. 1. 0.98701299 1.
0.98701299 1. 1. 1. ]
mean value: 0.9961367746177873
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.975 1. 1. 1. 0.97435897 1.
0.97435897 1. 1. 1. ]
mean value: 0.9923717948717948
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98717949 1. 1. 1. 0.98717949 1.
0.98717949 1. 1. 1. ]
mean value: 0.9961538461538462
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.975 1. 1. 1. 0.97435897 1.
0.97435897 1. 1. 1. ]
mean value: 0.9923717948717948
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.0
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
Accuracy on Blind test: 0.92
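The UndefinedMetricWarning above is raised when precision is requested for a prediction vector that contains no positive calls at all (TP + FP = 0); scikit-learn then substitutes 0.0, which fits the blind-test MCC of 0.0 reported for this model. The behaviour can be made explicit with the zero_division argument, for example:

```python
# Minimal reproduction of the warning and its explicit, silent counterpart.
from sklearn.metrics import precision_score

y_true = [1, 0, 1, 0]
y_pred = [0, 0, 0, 0]   # the classifier never predicts the positive class

precision_score(y_true, y_pred)                   # warns, returns 0.0
precision_score(y_true, y_pred, zero_division=0)  # returns 0.0, no warning
```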
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01124406 0.01222348 0.01148725 0.01243854 0.01266551 0.01233888
0.01323295 0.01186299 0.01137352 0.01267576]
mean value: 0.012154293060302735
key: score_time
value: [0.00932574 0.00926256 0.0101335 0.01017618 0.0101428 0.00961208
0.01254606 0.01016855 0.00980067 0.01009917]
mean value: 0.010126733779907226
key: test_mcc
value: [0.83405766 0.94996791 0.87904907 0.94996791 0.94935876 0.87773765
0.97435897 0.97434188 0.94929201 0.94929201]
mean value: 0.9287423832133312
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.91025641 0.97435897 0.93589744 0.97435897 0.97402597 0.93506494
0.98701299 0.98701299 0.97402597 0.97402597]
mean value: 0.9626040626040626
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.91764706 0.975 0.93975904 0.975 0.97435897 0.9382716
0.98701299 0.98734177 0.975 0.975 ]
mean value: 0.9644391433430239
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.84782609 0.95121951 0.88636364 0.95121951 0.95 0.88372093
0.97435897 0.975 0.95121951 0.95121951]
mean value: 0.9322147676692178
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.91025641 0.97435897 0.93589744 0.97435897 0.97435897 0.93589744
0.98717949 0.98684211 0.97368421 0.97368421]
mean value: 0.9626518218623482
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.84782609 0.95121951 0.88636364 0.95121951 0.95 0.88372093
0.97435897 0.975 0.95121951 0.95121951]
mean value: 0.9322147676692178
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.22
Accuracy on Blind test: 0.9
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Running model pipeline:
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.87075925 1.92578888 1.93493485 1.92512369 1.98649311 1.91488886
1.92578721 1.94692016 1.879282 1.86258054]
mean value: 1.9172558546066285
key: score_time
value: [0.09941483 0.10450387 0.10129547 0.10417533 0.10497546 0.10146046
0.10362077 0.10274792 0.099684 0.1028142 ]
mean value: 0.10246922969818115
key: test_mcc
value: [0.97467943 1. 0.97467943 1. 0.94935876 1.
0.94935876 1. 1. 0.97434188]
mean value: 0.9822418267998186
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98717949 1. 0.98717949 1. 0.97402597 1.
0.97402597 1. 1. 0.98701299]
mean value: 0.9909423909423909
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98734177 1. 0.98734177 1. 0.97435897 1.
0.97435897 1. 1. 0.98734177]
mean value: 0.9910743265173645
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.975 1. 0.975 1. 0.95 1. 0.95 1. 1. 0.975]
mean value: 0.9824999999999999
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98717949 1. 0.98717949 1. 0.97435897 1.
0.97435897 1. 1. 0.98684211]
mean value: 0.9909919028340081
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.975 1. 0.975 1. 0.95 1. 0.95 1. 1. 0.975]
mean value: 0.9824999999999999
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.32
Accuracy on Blind test: 0.93
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000...05', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [1.00633883 1.0316186 1.07874227 0.98896933 1.04627848 1.08366418
1.11056924 1.04657936 1.01156902 1.00038218]
mean value: 1.0404711484909057
key: score_time
value: [0.32827139 0.18333936 0.18703556 0.17614055 0.2115941 0.20295739
0.17030215 0.1987431 0.21623302 0.15138865]
mean value: 0.20260052680969237
key: test_mcc
value: [0.94996791 1. 0.9258201 1. 0.94935876 1.
0.90109146 0.94929201 0.94929201 0.97434188]
mean value: 0.9599164131308747
key: train_mcc
value: [0.98573393 0.98008335 0.98573393 0.98290472 0.98575394 0.98292859
0.98292859 0.9773027 0.98575452 0.98575452]
mean value: 0.9834878805504522
key: test_accuracy
value: [0.97435897 1. 0.96153846 1. 0.97402597 1.
0.94805195 0.97402597 0.97402597 0.98701299]
mean value: 0.9793040293040293
key: train_accuracy
value: [0.99281609 0.98994253 0.99281609 0.99137931 0.9928264 0.99139168
0.99139168 0.98852224 0.9928264 0.9928264 ]
mean value: 0.9916738814954073
key: test_fscore
value: [0.975 1. 0.96296296 1. 0.97435897 1.
0.95 0.975 0.975 0.98734177]
mean value: 0.9799663709473836
key: train_fscore
value: [0.99286733 0.99004267 0.99286733 0.99145299 0.99288762 0.99147727
0.99147727 0.98863636 0.99286733 0.99286733]
mean value: 0.9917443528792916
key: test_precision
value: [0.95121951 1. 0.92857143 1. 0.95 1.
0.9047619 0.95121951 0.95121951 0.975 ]
mean value: 0.9611991869918699
key: train_precision
value: [0.98583569 0.98028169 0.98583569 0.98305085 0.98587571 0.98309859
0.98309859 0.97752809 0.98583569 0.98583569]
mean value: 0.983627629300336
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97435897 1. 0.96153846 1. 0.97435897 1.
0.94871795 0.97368421 0.97368421 0.98684211]
mean value: 0.9793184885290148
key: train_roc_auc
value: [0.99281609 0.98994253 0.99281609 0.99137931 0.99281609 0.99137931
0.99137931 0.98853868 0.99283668 0.99283668]
mean value: 0.9916740770016138
key: test_jcc
value: [0.95121951 1. 0.92857143 1. 0.95 1.
0.9047619 0.95121951 0.95121951 0.975 ]
mean value: 0.9611991869918699
key: train_jcc
value: [0.98583569 0.98028169 0.98583569 0.98305085 0.98587571 0.98309859
0.98309859 0.97752809 0.98583569 0.98583569]
mean value: 0.983627629300336
MCC on Blind test: 0.46
Accuracy on Blind test: 0.93
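The FutureWarning that keeps reappearing around the Random Forest runs is triggered by max_features='auto' in this 'Random Forest2' configuration: for classifiers 'auto' has always meant 'sqrt', and scikit-learn 1.1 deprecates the alias. A sketch of the equivalent, warning-free constructor, with the other parameters copied from the log:

```python
# 'sqrt' is the documented equivalent of the deprecated 'auto' for classifiers.
from sklearn.ensemble import RandomForestClassifier

rf2 = RandomForestClassifier(max_features='sqrt', min_samples_leaf=5,
                             n_estimators=1000, n_jobs=10, oob_score=True,
                             random_state=42)
```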
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01607227 0.01323771 0.01117516 0.0116055 0.01243186 0.01225638
0.01118016 0.01187682 0.01113033 0.01164865]
mean value: 0.012261486053466797
key: score_time
value: [0.01342487 0.00997972 0.00940561 0.01012993 0.0095017 0.0093751
0.00956488 0.01002479 0.00947022 0.00997615]
mean value: 0.010085296630859376
key: test_mcc
value: [0.41367015 0.30779351 0.33333333 0.41367015 0.57555876 0.3053019
0.63817566 0.57300197 0.4300371 0.43651151]
mean value: 0.4427054043078278
key: train_mcc
value: [0.47815242 0.49609973 0.40554933 0.46831764 0.50137939 0.47349208
0.52150427 0.53083153 0.5516137 0.49598651]
mean value: 0.49229265800871064
key: test_accuracy
value: [0.70512821 0.65384615 0.66666667 0.70512821 0.77922078 0.64935065
0.80519481 0.77922078 0.71428571 0.71428571]
mean value: 0.7172327672327673
key: train_accuracy
value: [0.73706897 0.74712644 0.70258621 0.73275862 0.74748924 0.73601148
0.75609756 0.76040172 0.77044476 0.74605452]
mean value: 0.7436039512524942
key: test_fscore
value: [0.72289157 0.64935065 0.66666667 0.68493151 0.8 0.6746988
0.82758621 0.8045977 0.73170732 0.74418605]
mean value: 0.730661645594319
key: train_fscore
value: [0.75303644 0.75757576 0.70886076 0.74659401 0.76657825 0.74655647
0.77748691 0.78112713 0.79057592 0.76048714]
mean value: 0.7588878784698425
key: test_precision
value: [0.68181818 0.65789474 0.66666667 0.73529412 0.72340426 0.62222222
0.73469388 0.72916667 0.69767442 0.68085106]
mean value: 0.6929686207167509
key: train_precision
value: [0.70992366 0.72751323 0.69421488 0.70984456 0.71358025 0.71883289
0.71566265 0.71807229 0.72596154 0.71867008]
mean value: 0.7152276020361097
key: test_recall
value: [0.76923077 0.64102564 0.66666667 0.64102564 0.89473684 0.73684211
0.94736842 0.8974359 0.76923077 0.82051282]
mean value: 0.7784075573549258
key: train_recall
value: [0.80172414 0.79022989 0.72413793 0.78735632 0.82808023 0.7765043
0.85100287 0.85632184 0.86781609 0.80747126]
mean value: 0.8090644863814511
key: test_roc_auc
value: [0.70512821 0.65384615 0.66666667 0.70512821 0.78070175 0.65047233
0.80701754 0.77766532 0.71356275 0.71288799]
mean value: 0.7173076923076923
key: train_roc_auc
value: [0.73706897 0.74712644 0.70258621 0.73275862 0.74737345 0.7359533
0.7559612 0.76053914 0.77058426 0.74614251]
mean value: 0.7436094094786417
key: test_jcc
value: [0.56603774 0.48076923 0.5 0.52083333 0.66666667 0.50909091
0.70588235 0.67307692 0.57692308 0.59259259]
mean value: 0.5791872821242966
key: train_jcc
value: [0.6038961 0.6097561 0.54901961 0.59565217 0.62150538 0.5956044
0.6359743 0.64086022 0.65367965 0.61353712]
mean value: 0.6119485045867612
MCC on Blind test: -0.05
Accuracy on Blind test: 0.66
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.16579056 0.0958972 0.11130762 0.09484553 0.08717442 0.09650135
0.09553623 0.09150696 0.08706903 0.10944676]
mean value: 0.10350756645202637
key: score_time
value: [0.01211166 0.01198912 0.01183915 0.01130962 0.01152658 0.01141095
0.01134896 0.0119698 0.01131058 0.01155162]
mean value: 0.011636805534362794
key: test_mcc
value: [0.97467943 1. 0.94996791 0.9258201 0.87773765 1.
0.83239263 0.97434188 0.90083601 0.97434188]
mean value: 0.9410117489983211
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98717949 1. 0.97435897 0.96153846 0.93506494 1.
0.90909091 0.98701299 0.94805195 0.98701299]
mean value: 0.968931068931069
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98734177 1. 0.975 0.96296296 0.9382716 1.
0.91566265 0.98734177 0.95121951 0.98734177]
mean value: 0.9705142047154462
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.975 1. 0.95121951 0.92857143 0.88372093 1.
0.84444444 0.975 0.90697674 0.975 ]
mean value: 0.9439933059629599
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98717949 1. 0.97435897 0.96153846 0.93589744 1.
0.91025641 0.98684211 0.94736842 0.98684211]
mean value: 0.9690283400809717
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.975 1. 0.95121951 0.92857143 0.88372093 1.
0.84444444 0.975 0.90697674 0.975 ]
mean value: 0.9439933059629599
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.43
Accuracy on Blind test: 0.92
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.04190159 0.08826089 0.09064293 0.06921363 0.08002448 0.08611894
0.07872844 0.07415462 0.07829404 0.05201054]
mean value: 0.07393500804901124
key: score_time
value: [0.01220369 0.01917839 0.03349042 0.01935863 0.02466106 0.01947427
0.01944065 0.01956105 0.01255679 0.01624155]
mean value: 0.01961665153503418
key: test_mcc
value: [0.85063978 0.82078268 0.85634884 0.83405766 0.76725173 0.848923
0.78862619 0.83165353 0.80937951 0.83165353]
mean value: 0.8239316445423343
key: train_mcc
value: [0.86147319 0.87827531 0.89616338 0.89616338 0.90675201 0.86951406
0.8885052 0.88081316 0.91998601 0.8911284 ]
mean value: 0.8888774113680993
key: test_accuracy
value: [0.92307692 0.91025641 0.92307692 0.91025641 0.87012987 0.92207792
0.88311688 0.90909091 0.8961039 0.90909091]
mean value: 0.9056277056277056
key: train_accuracy
value: [0.92816092 0.93678161 0.9454023 0.9454023 0.95121951 0.93256815
0.94117647 0.93687231 0.95839311 0.94261119]
mean value: 0.9418587872491301
key: test_fscore
value: [0.92682927 0.91139241 0.92857143 0.91764706 0.88372093 0.925
0.89411765 0.91764706 0.90697674 0.91764706]
mean value: 0.912954959987542
key: train_fscore
value: [0.93188011 0.93989071 0.94822888 0.94822888 0.95355191 0.93587995
0.94451962 0.94054054 0.96 0.94565217]
mean value: 0.9448372778603152
key: test_precision
value: [0.88372093 0.9 0.86666667 0.84782609 0.79166667 0.88095238
0.80851064 0.84782609 0.82978723 0.84782609]
mean value: 0.8504782777728264
key: train_precision
value: [0.88601036 0.89583333 0.9015544 0.9015544 0.91122715 0.89322917
0.89487179 0.8877551 0.92307692 0.89690722]
mean value: 0.8992019861515833
key: test_recall
value: [0.97435897 0.92307692 1. 1. 1. 0.97368421
1. 1. 1. 1. ]
mean value: 0.9871120107962214
key: train_recall
value: [0.98275862 0.98850575 1. 1. 1. 0.98280802
1. 1. 1. 1. ]
mean value: 0.9954072390738729
key: test_roc_auc
value: [0.92307692 0.91025641 0.92307692 0.91025641 0.87179487 0.92273954
0.88461538 0.90789474 0.89473684 0.90789474]
mean value: 0.905634278002699
key: train_roc_auc
value: [0.92816092 0.93678161 0.9454023 0.9454023 0.95114943 0.93249597
0.94109195 0.93696275 0.95845272 0.94269341]
mean value: 0.9418593353752923
key: test_jcc
value: [0.86363636 0.8372093 0.86666667 0.84782609 0.79166667 0.86046512
0.80851064 0.84782609 0.82978723 0.84782609]
mean value: 0.8401420248784339
key: train_jcc
value: [0.87244898 0.88659794 0.9015544 0.9015544 0.91122715 0.87948718
0.89487179 0.8877551 0.92307692 0.89690722]
mean value: 0.8955481096044878
MCC on Blind test: 0.2
Accuracy on Blind test: 0.8
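The repeating "key / value / mean value" blocks have the shape of the dictionary returned by scikit-learn's cross_validate() when given a dict of scorers and return_train_score=True; each "mean value" is simply the average over the ten folds. An illustrative sketch under those assumptions, with synthetic data and scorer names chosen only to mirror the keys above:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in for the feature table used above.
X, y = make_classification(n_samples=400, n_features=30, n_informative=10,
                           n_redundant=0, random_state=42)
scoring = {'mcc': make_scorer(matthews_corrcoef),
           'accuracy': 'accuracy', 'fscore': 'f1', 'precision': 'precision',
           'recall': 'recall', 'roc_auc': 'roc_auc',
           'jcc': make_scorer(jaccard_score)}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
out = cross_validate(LinearDiscriminantAnalysis(), X, y, cv=cv,
                     scoring=scoring, return_train_score=True)
for key, value in out.items():      # fit_time, score_time, test_*, train_*
    print('key:', key)
    print('value:', value)
    print('mean value:', np.mean(value))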
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01488948 0.01321101 0.01126218 0.01059198 0.01078653 0.01084661
0.01091337 0.01102781 0.01104927 0.01110911]
mean value: 0.011568737030029298
key: score_time
value: [0.01220536 0.00957465 0.00932431 0.00905275 0.00900197 0.00919437
0.00903678 0.008986 0.00921941 0.00947666]
mean value: 0.009507226943969726
key: test_mcc
value: [0.15430335 0.18043874 0.36004115 0.2057378 0.24645515 0.01499738
0.45639039 0.32454428 0.17017723 0.11748821]
mean value: 0.22305736807135446
key: train_mcc
value: [0.21558935 0.29925115 0.26162499 0.24228126 0.24848105 0.21956888
0.26890881 0.25709751 0.23161412 0.22826761]
mean value: 0.2472684720139665
key: test_accuracy
value: [0.57692308 0.58974359 0.67948718 0.6025641 0.62337662 0.50649351
0.72727273 0.66233766 0.58441558 0.55844156]
mean value: 0.6111055611055611
key: train_accuracy
value: [0.60775862 0.64942529 0.63074713 0.62068966 0.6241033 0.6097561
0.63414634 0.62840746 0.61549498 0.61406026]
mean value: 0.6234589125810123
key: test_fscore
value: [0.59259259 0.56756757 0.69135802 0.58666667 0.61333333 0.53658537
0.73417722 0.675 0.62790698 0.55263158]
mean value: 0.6177819321586605
key: train_fscore
value: [0.61276596 0.65826331 0.63649222 0.63636364 0.63407821 0.61581921
0.6473029 0.63572433 0.62777778 0.61951909]
mean value: 0.6324106650148837
key: test_precision
value: [0.57142857 0.6 0.66666667 0.61111111 0.62162162 0.5
0.70731707 0.65853659 0.57446809 0.56756757]
mean value: 0.6078717282038507
key: train_precision
value: [0.60504202 0.6420765 0.62674095 0.61111111 0.61852861 0.60724234
0.62566845 0.62258953 0.60752688 0.61002786]
mean value: 0.6176554245664311
key: test_recall
value: [0.61538462 0.53846154 0.71794872 0.56410256 0.60526316 0.57894737
0.76315789 0.69230769 0.69230769 0.53846154]
mean value: 0.630634278002699
key: train_recall
value: [0.62068966 0.67528736 0.64655172 0.6637931 0.6504298 0.62464183
0.67048711 0.64942529 0.64942529 0.62931034]
mean value: 0.6480041497875704
key: test_roc_auc
value: [0.57692308 0.58974359 0.67948718 0.6025641 0.6231444 0.5074224
0.72773279 0.66194332 0.58299595 0.55870445]
mean value: 0.6110661268556005
key: train_roc_auc
value: [0.60775862 0.64942529 0.63074713 0.62068966 0.62406547 0.60973471
0.63409413 0.62843757 0.61554359 0.61408211]
mean value: 0.6234578269604453
key: test_jcc
value: [0.42105263 0.39622642 0.52830189 0.41509434 0.44230769 0.36666667
0.58 0.50943396 0.45762712 0.38181818]
mean value: 0.4498528894789141
key: train_jcc
value: [0.44171779 0.49060543 0.46680498 0.46666667 0.46421268 0.44489796
0.47852761 0.46597938 0.45748988 0.44877049]
mean value: 0.46256728625770993
MCC on Blind test: 0.12
Accuracy on Blind test: 0.72
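MultinomialNB accepts only non-negative features, which the 'prep' step above guarantees: MinMaxScaler maps the numeric columns into [0, 1] and OneHotEncoder produces 0/1 indicators. A toy sketch of the same pipeline shape (column names and values invented for illustration):

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Invented columns standing in for the numeric + categorical features above.
df = pd.DataFrame({'num_a': [-2.0, 0.5, 3.1, 1.2],
                   'num_b': [10.0, 12.5, 9.8, 11.1],
                   'cat_a': ['loop', 'helix', 'sheet', 'loop']})
y = [0, 1, 0, 1]

prep = ColumnTransformer(transformers=[('num', MinMaxScaler(), ['num_a', 'num_b']),
                                       ('cat', OneHotEncoder(), ['cat_a'])],
                         remainder='passthrough')
pipe = Pipeline(steps=[('prep', prep), ('model', MultinomialNB())])
pipe.fit(df, y)                     # valid: every transformed value is >= 0
print(pipe.predict(df))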
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.03801465 0.03049207 0.03110695 0.03235173 0.02799249 0.02999878
0.03647971 0.02684927 0.0257833 0.02550769]
mean value: 0.030457663536071777
key: score_time
value: [0.01255345 0.01240373 0.01241517 0.01233935 0.01228189 0.01221061
0.01228714 0.01230478 0.01222181 0.01215315]
mean value: 0.01231710910797119
key: test_mcc
value: [0.50800051 0.79908586 0.74833148 0.72980045 0.78862619 0.85485041
0.87773765 0.84537494 0.71613058 0.75611265]
mean value: 0.7624050703842445
key: train_mcc
value: [0.48457159 0.88518905 0.84765472 0.84046848 0.86466206 0.84034512
0.86871425 0.80917893 0.84290911 0.83327339]
mean value: 0.8116966697844912
key: test_accuracy
value: [0.70512821 0.8974359 0.85897436 0.85897436 0.88311688 0.92207792
0.93506494 0.92207792 0.85714286 0.87012987]
mean value: 0.871012321012321
key: train_accuracy
value: [0.69827586 0.94252874 0.91810345 0.91810345 0.93113343 0.91391679
0.93400287 0.90243902 0.91822095 0.91248207]
mean value: 0.8989206616204093
key: test_fscore
value: [0.58181818 0.89189189 0.87640449 0.84507042 0.89411765 0.92682927
0.9382716 0.92105263 0.86419753 0.88372093]
mean value: 0.8623374603592788
key: train_fscore
value: [0.5766129 0.94301994 0.92430279 0.91376702 0.93370166 0.92084433
0.93274854 0.8969697 0.92286874 0.91812081]
mean value: 0.8882956421286032
key: test_precision
value: [1. 0.94285714 0.78 0.9375 0.80851064 0.86363636
0.88372093 0.94594595 0.83333333 0.80851064]
mean value: 0.8804014992601088
key: train_precision
value: [0.96621622 0.93502825 0.85925926 0.96485623 0.90133333 0.85330073
0.95223881 0.94871795 0.87212276 0.86146096]
mean value: 0.9114534494939938
key: test_recall
value: [0.41025641 0.84615385 1. 0.76923077 1. 1.
1. 0.8974359 0.8974359 0.97435897]
mean value: 0.8794871794871795
key: train_recall
value: [0.41091954 0.95114943 1. 0.86781609 0.96848138 1.
0.91404011 0.85057471 0.97988506 0.98275862]
mean value: 0.8925624938247209
key: test_roc_auc
value: [0.70512821 0.8974359 0.85897436 0.85897436 0.88461538 0.92307692
0.93589744 0.92240216 0.85661269 0.86875843]
mean value: 0.871187584345479
key: train_roc_auc
value: [0.69827586 0.94252874 0.91810345 0.91810345 0.93107977 0.9137931
0.93403155 0.90236472 0.91830929 0.91258275]
mean value: 0.898917267727168
key: test_jcc
value: [0.41025641 0.80487805 0.78 0.73170732 0.80851064 0.86363636
0.88372093 0.85365854 0.76086957 0.79166667]
mean value: 0.7688904476746287
key: train_jcc
value: [0.40509915 0.89218329 0.85925926 0.84122563 0.87564767 0.85330073
0.8739726 0.81318681 0.85678392 0.84863524]
mean value: 0.8119294297698206
MCC on Blind test: 0.25
Accuracy on Blind test: 0.91
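Each "mean value" line is the arithmetic mean of the ten per-fold numbers printed directly above it, e.g. for the test_mcc block of this model:

import numpy as np

# The ten test_mcc values printed above for this model.
test_mcc = [0.50800051, 0.79908586, 0.74833148, 0.72980045, 0.78862619,
            0.85485041, 0.87773765, 0.84537494, 0.71613058, 0.75611265]
print(np.mean(test_mcc))    # ~0.762405072, matching the reported mean value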
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.03357005 0.02876759 0.02612519 0.0270865 0.02361774 0.02532244
0.02340913 0.02862954 0.03337932 0.02900386]
mean value: 0.027891135215759276
key: score_time
value: [0.01290989 0.01256585 0.01234102 0.01234937 0.01223135 0.01227951
0.01230288 0.02895355 0.03337622 0.02062821]
mean value: 0.016993784904479982
key: test_mcc
value: [0.79487179 0.84726867 0.8273403 0.65271395 0.79217274 0.68022227
0.76725173 0.76876426 0.76581079 0.76829903]
mean value: 0.7664715537987572
key: train_mcc
value: [0.8568314 0.89081931 0.91712495 0.70204523 0.79662047 0.77235744
0.85325636 0.77145431 0.70533336 0.86573141]
mean value: 0.8131574235838349
key: test_accuracy
value: [0.8974359 0.92307692 0.91025641 0.80769231 0.8961039 0.83116883
0.87012987 0.88311688 0.87012987 0.88311688]
mean value: 0.8772227772227772
key: train_accuracy
value: [0.92816092 0.9454023 0.95833333 0.83908046 0.8938307 0.87804878
0.92395983 0.88091822 0.83213773 0.93256815]
mean value: 0.9012440426128399
key: test_fscore
value: [0.8974359 0.92105263 0.91566265 0.76923077 0.89473684 0.80597015
0.88372093 0.88 0.88636364 0.88888889]
mean value: 0.8743062395692102
key: train_fscore
value: [0.92937853 0.94555874 0.95898161 0.81518152 0.88544892 0.86443381
0.92808684 0.87051482 0.85608856 0.93118594]
mean value: 0.8984859293514272
key: test_precision
value: [0.8974359 0.94594595 0.86363636 0.96153846 0.89473684 0.93103448
0.79166667 0.91666667 0.79591837 0.85714286]
mean value: 0.8855722551243681
key: train_precision
value: [0.91388889 0.94285714 0.94428969 0.95736434 0.96296296 0.97482014
0.8814433 0.95221843 0.7483871 0.94925373]
mean value: 0.9227485730393152
key: test_recall
value: [0.8974359 0.8974359 0.97435897 0.64102564 0.89473684 0.71052632
1. 0.84615385 1. 0.92307692]
mean value: 0.8784750337381917
key: train_recall
value: [0.9454023 0.94827586 0.97413793 0.70977011 0.81948424 0.7765043
0.97994269 0.80172414 1. 0.9137931 ]
mean value: 0.8869034680367552
key: test_roc_auc
value: [0.8974359 0.92307692 0.91025641 0.80769231 0.89608637 0.82962213
0.87179487 0.88360324 0.86842105 0.88259109]
mean value: 0.8770580296896087
key: train_roc_auc
value: [0.92816092 0.9454023 0.95833333 0.83908046 0.89393752 0.87819468
0.92387939 0.88080476 0.83237822 0.93254125]
mean value: 0.9012712841287093
key: test_jcc
value: [0.81395349 0.85365854 0.84444444 0.625 0.80952381 0.675
0.79166667 0.78571429 0.79591837 0.8 ]
mean value: 0.7794879598653603
key: train_jcc
value: [0.86807388 0.89673913 0.92119565 0.68802228 0.79444444 0.76123596
0.86582278 0.77071823 0.7483871 0.87123288]
mean value: 0.8185872335200699
MCC on Blind test: -0.01
Accuracy on Blind test: 0.72
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
  oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.24143291 0.21029639 0.20286584 0.20571828 0.20957994 0.21945953
0.21285486 0.21113038 0.21808934 0.21113563]
mean value: 0.21425631046295165
key: score_time
value: [0.01734018 0.01620436 0.0164032 0.01739216 0.01739073 0.02124238
0.01720047 0.01746988 0.01800656 0.01609206]
mean value: 0.01747419834136963
key: test_mcc
value: [0.9258201 0.97467943 0.94996791 0.9258201 0.94935876 0.97435897
0.85485041 0.94929201 0.92480439 0.94929201]
mean value: 0.9378244104085827
key: train_mcc
value: [0.99713055 0.98857105 0.98573393 0.99141615 0.98575394 0.98292859
0.99427743 0.98575452 0.99713467 0.9914285 ]
mean value: 0.990012934492672
key: test_accuracy
value: [0.96153846 0.98717949 0.97435897 0.96153846 0.97402597 0.98701299
0.92207792 0.97402597 0.96103896 0.97402597]
mean value: 0.9676823176823177
key: train_accuracy
value: [0.99856322 0.99425287 0.99281609 0.99568966 0.9928264 0.99139168
0.99713056 0.9928264 0.99856528 0.99569584]
mean value: 0.9949757994030244
key: test_fscore
value: [0.96296296 0.98734177 0.975 0.96296296 0.97435897 0.98701299
0.92682927 0.975 0.96296296 0.975 ]
mean value: 0.9689431890705432
key: train_fscore
value: [0.99856528 0.99428571 0.99286733 0.99570815 0.99288762 0.99147727
0.99714286 0.99286733 0.99856528 0.99570815]
mean value: 0.9950075001940804
key: test_precision
value: [0.92857143 0.975 0.95121951 0.92857143 0.95 0.97435897
0.86363636 0.95121951 0.92857143 0.95121951]
mean value: 0.9402368160294989
key: train_precision
value: [0.99713467 0.98863636 0.98583569 0.99145299 0.98587571 0.98309859
0.99430199 0.98583569 0.99713467 0.99145299]
mean value: 0.9900759367684521
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96153846 0.98717949 0.97435897 0.96153846 0.97435897 0.98717949
0.92307692 0.97368421 0.96052632 0.97368421]
mean value: 0.9677125506072874
key: train_roc_auc
value: [0.99856322 0.99425287 0.99281609 0.99568966 0.99281609 0.99137931
0.99712644 0.99283668 0.99856734 0.99570201]
mean value: 0.9949749695352896
key: test_jcc
value: [0.92857143 0.975 0.95121951 0.92857143 0.95 0.97435897
0.86363636 0.95121951 0.92857143 0.95121951]
mean value: 0.9402368160294989
key: train_jcc
value: [0.99713467 0.98863636 0.98583569 0.99145299 0.98587571 0.98309859
0.99430199 0.98583569 0.99713467 0.99145299]
mean value: 0.9900759367684521
MCC on Blind test: 0.56
Accuracy on Blind test: 0.93
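Where test_recall is 1.0 in every fold (as in this block), the test_jcc and test_precision arrays coincide: with no false negatives, Jaccard = TP/(TP + FP + FN) reduces to TP/(TP + FP), which is the precision. A quick check with counts consistent with fold 1 above (the exact confusion-matrix counts are inferred for illustration, not logged):

# Counts consistent with fold 1 of this block: recall 1.0 means no false negatives.
tp, fp, fn = 39, 3, 0
precision = tp / (tp + fp)          # 0.92857142..., as printed above
jaccard = tp / (tp + fp + fn)       # identical whenever fn == 0
print(precision, jaccard)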
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.09038091 0.10030103 0.10699773 0.10314679 0.09516573 0.09135485
0.07111168 0.09940124 0.09996367 0.07392859]
mean value: 0.09317522048950196
key: score_time
value: [0.02252746 0.03248572 0.03904748 0.03635025 0.03706026 0.02310634
0.01902294 0.03183651 0.02209926 0.02426434]
mean value: 0.028780055046081544
key: test_mcc
value: [0.97467943 0.97467943 0.97467943 0.97467943 0.92495119 0.97435897
0.83239263 0.90083601 0.97434188 0.94929201]
mean value: 0.9454890438510555
key: train_mcc
value: [1. 1. 0.99713055 1. 0.99427743 0.99427743
0.99713465 1. 1. 1. ]
mean value: 0.9982820067812821
key: test_accuracy
value: [0.98717949 0.98717949 0.98717949 0.98717949 0.96103896 0.98701299
0.90909091 0.94805195 0.98701299 0.97402597]
mean value: 0.9714951714951715
key: train_accuracy
value: [1. 1. 0.99856322 1. 0.99713056 0.99713056
0.99856528 1. 1. 1. ]
mean value: 0.9991389617243028
key: test_fscore
value: [0.98734177 0.98734177 0.98734177 0.98734177 0.96202532 0.98701299
0.91566265 0.95121951 0.98734177 0.975 ]
mean value: 0.9727629327025709
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
  oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
key: train_fscore
value: [1. 1. 0.99856528 1. 0.99714286 0.99714286
0.99856938 1. 1. 1. ]
mean value: 0.9991420378891638
key: test_precision
value: [0.975 0.975 0.975 0.975 0.92682927 0.97435897
0.84444444 0.90697674 0.975 0.95121951]
mean value: 0.947882894347727
key: train_precision
value: [1. 1. 0.99713467 1. 0.99430199 0.99430199
0.99714286 1. 1. 1. ]
mean value: 0.9982881516233951
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98717949 0.98717949 0.98717949 0.98717949 0.96153846 0.98717949
0.91025641 0.94736842 0.98684211 0.97368421]
mean value: 0.9715587044534413
key: train_roc_auc
value: [1. 1. 0.99856322 1. 0.99712644 0.99712644
0.99856322 1. 1. 1. ]
mean value: 0.9991379310344828
key: test_jcc
value: [0.975 0.975 0.975 0.975 0.92682927 0.97435897
0.84444444 0.90697674 0.975 0.95121951]
mean value: 0.947882894347727
key: train_jcc
value: [1. 1. 0.99713467 1. 0.99430199 0.99430199
0.99714286 1. 1. 1. ]
mean value: 0.9982881516233951
MCC on Blind test: 0.46
Accuracy on Blind test: 0.93
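The '_bagging.py' warnings interleaved with this model's output (and flushed earlier in the log) come from oob_score=True: a row only receives an out-of-bag prediction if it is left out of at least one bootstrap sample, and with few estimators some rows are never left out, so the OOB vote normalisation divides by zero. A hedged sketch of the effect and the usual mitigation (more estimators); the data and settings below are synthetic, not the project's:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

# Synthetic data; sizes chosen only to make the effect easy to reproduce.
X, y = make_classification(n_samples=60, n_features=10, random_state=42)

# With very few estimators, some rows are drawn into every bootstrap sample,
# get no out-of-bag vote, and sklearn emits the two warnings seen in the log.
few = BaggingClassifier(n_estimators=3, oob_score=True, random_state=42).fit(X, y)
print(np.isnan(few.oob_decision_function_).any())   # usually True here

# With enough estimators every row is out-of-bag at least once, so the OOB
# estimate is well defined and the warnings normally disappear.
many = BaggingClassifier(n_estimators=200, oob_score=True, random_state=42).fit(X, y)
print('OOB score:', many.oob_score_)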
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.32979703 0.27593613 0.30429029 0.25532198 0.34483504 0.2347784
0.28835416 0.30712223 0.42109704 0.37772584]
mean value: 0.3139258146286011
key: score_time
value: [0.0304029 0.0182817 0.0335145 0.01835108 0.01883078 0.03072095
0.01809335 0.03113055 0.04706335 0.03125429]
mean value: 0.027764344215393068
key: test_mcc
value: [0.8121419 0.90219371 0.90219371 0.90219371 0.87773765 0.90109146
0.90109146 0.83165353 0.87734648 0.94929201]
mean value: 0.885693560945207
key: train_mcc
value: [0.98008335 0.98008335 0.97446387 0.98008335 0.97730122 0.97730122
0.98011104 0.98011218 0.9773027 0.97450093]
mean value: 0.9781343227317184
key: test_accuracy
value: [0.8974359 0.94871795 0.94871795 0.94871795 0.93506494 0.94805195
0.94805195 0.90909091 0.93506494 0.97402597]
mean value: 0.9392940392940393
key: train_accuracy
value: [0.98994253 0.98994253 0.98706897 0.98994253 0.98852224 0.98852224
0.98995696 0.98995696 0.98852224 0.98708752]
mean value: 0.9889464700935042
key: test_fscore
value: [0.90697674 0.95121951 0.95121951 0.95121951 0.9382716 0.95
0.95 0.91764706 0.93975904 0.975 ]
mean value: 0.9431312980677792
key: train_fscore
value: [0.99004267 0.99004267 0.98723404 0.99004267 0.98866856 0.98866856
0.99007092 0.99004267 0.98863636 0.98723404]
mean value: 0.9890683178222951
key: test_precision
value: [0.82978723 0.90697674 0.90697674 0.90697674 0.88372093 0.9047619
0.9047619 0.84782609 0.88636364 0.95121951]
mean value: 0.892937144187234
key: train_precision
value: [0.98028169 0.98028169 0.97478992 0.98028169 0.97759104 0.97759104
0.98033708 0.98028169 0.97752809 0.97478992]
mean value: 0.9783753833864611
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.8974359 0.94871795 0.94871795 0.94871795 0.93589744 0.94871795
0.94871795 0.90789474 0.93421053 0.97368421]
mean value: 0.9392712550607287
key: train_roc_auc
value: [0.98994253 0.98994253 0.98706897 0.98994253 0.98850575 0.98850575
0.98994253 0.98997135 0.98853868 0.98710602]
mean value: 0.9889466620557916
key: test_jcc
value: [0.82978723 0.90697674 0.90697674 0.90697674 0.88372093 0.9047619
0.9047619 0.84782609 0.88636364 0.95121951]
mean value: 0.892937144187234
key: train_jcc
value: [0.98028169 0.98028169 0.97478992 0.98028169 0.97759104 0.97759104
0.98033708 0.98028169 0.97752809 0.97478992]
mean value: 0.9783753833864611
MCC on Blind test: 0.07
Accuracy on Blind test: 0.88
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.78035641 0.78779244 0.79045606 0.76603675 0.78087711 0.77754664
0.76633167 0.76756024 0.7693882 0.77153254]
mean value: 0.7757878065109253
key: score_time
value: [0.00981498 0.01024652 0.00946355 0.00959802 0.00979328 0.00943899
0.0094614 0.00945497 0.00986099 0.0094862 ]
mean value: 0.00966188907623291
key: test_mcc
value: [0.9258201 0.97467943 0.94996791 0.9258201 0.94935876 0.94935876
0.81032908 0.92480439 0.90083601 0.94929201]
mean value: 0.9260266547138517
key: train_mcc
value: [1. 1. 1. 1. 1. 1.
1. 1. 1. 0.99713467]
mean value: 0.9997134670487106
key: test_accuracy
value: [0.96153846 0.98717949 0.97435897 0.96153846 0.97402597 0.97402597
0.8961039 0.96103896 0.94805195 0.97402597]
mean value: 0.9611888111888112
key: train_accuracy
value: [1. 1. 1. 1. 1. 1.
1. 1. 1. 0.99856528]
mean value: 0.9998565279770445
key: test_fscore
value: [0.96296296 0.98734177 0.975 0.96296296 0.97435897 0.97435897
0.9047619 0.96296296 0.95121951 0.975 ]
mean value: 0.9630930026715763
key: train_fscore
value: [1. 1. 1. 1. 1. 1.
1. 1. 1. 0.99856528]
mean value: 0.9998565279770445
key: test_precision
value: [0.92857143 0.975 0.95121951 0.92857143 0.95 0.95
0.82608696 0.92857143 0.90697674 0.95121951]
mean value: 0.9296217010812315
key: train_precision
value: [1. 1. 1. 1. 1. 1.
1. 1. 1. 0.99713467]
mean value: 0.9997134670487106
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96153846 0.98717949 0.97435897 0.96153846 0.97435897 0.97435897
0.8974359 0.96052632 0.94736842 0.97368421]
mean value: 0.9612348178137652
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1.
1. 1. 1. 0.99856734]
mean value: 0.9998567335243553
key: test_jcc
value: [0.92857143 0.975 0.95121951 0.92857143 0.95 0.95
0.82608696 0.92857143 0.90697674 0.95121951]
mean value: 0.9296217010812315
key: train_jcc
value: [1. 1. 1. 1. 1. 1.
1. 1. 1. 0.99713467]
mean value: 0.9997134670487106
MCC on Blind test: 0.68
Accuracy on Blind test: 0.95
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.05917978 0.03482533 0.03826451 0.03504586 0.0383997 0.04490495
0.03810263 0.0381248 0.03803468 0.0383141 ]
mean value: 0.040319633483886716
key: score_time
value: [0.02076483 0.01292348 0.01296926 0.01533365 0.01286793 0.01446438
0.0134151 0.01298833 0.01308966 0.01466441]
mean value: 0.014348101615905762
key: test_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.0
Accuracy on Blind test: 0.92
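Two warnings precede this QDA run: 'Variables are collinear', because QDA fits a per-class covariance that becomes rank-deficient when features are (nearly) linearly dependent, as the scaled AAindex-style columns plus one-hot indicators are likely to be; and an UndefinedMetricWarning, raised when a precision-type score is computed for a prediction vector containing no positive predictions, which is consistent with the 0.0 blind-test MCC alongside 0.92 accuracy (roughly the majority-class rate). A small sketch of the precision behaviour and the zero_division control mentioned in the warning (toy labels only, not project data):

from sklearn.metrics import matthews_corrcoef, precision_score

y_true = [1, 0, 0, 1, 0, 0]
y_pred = [0, 0, 0, 0, 0, 0]      # no positive predictions at all

print(precision_score(y_true, y_pred))                   # warns, returns 0.0
print(precision_score(y_true, y_pred, zero_division=0))  # same value, no warning
print(matthews_corrcoef(y_true, y_pred))                 # 0.0 for a constant predictor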
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02525949 0.0332334 0.03655052 0.02820945 0.01706648 0.01690316
0.01729512 0.01711345 0.02131271 0.0167582 ]
mean value: 0.022970199584960938
key: score_time
value: [0.02917433 0.02953553 0.02930069 0.01322436 0.01230454 0.0122962
0.01228094 0.0122776 0.01258254 0.01641154]
mean value: 0.017938828468322753
key: test_mcc
value: [0.8273403 0.82078268 0.8273403 0.8273403 0.81032908 0.84537494
0.74617462 0.87734648 0.71613058 0.75611265]
mean value: 0.8054271936647068
key: train_mcc
value: [0.86147319 0.87042584 0.85885872 0.86529022 0.86772975 0.8659075
0.87432528 0.8686193 0.84473646 0.86816244]
mean value: 0.8645528691871438
key: test_accuracy
value: [0.91025641 0.91025641 0.91025641 0.91025641 0.8961039 0.92207792
0.85714286 0.93506494 0.85714286 0.87012987]
mean value: 0.8978687978687978
key: train_accuracy
value: [0.92816092 0.93247126 0.92672414 0.93103448 0.93256815 0.93113343
0.93543759 0.93113343 0.92109039 0.93256815]
mean value: 0.9302321938026683
key: test_fscore
value: [0.91566265 0.91139241 0.91566265 0.91566265 0.9047619 0.92307692
0.87356322 0.93975904 0.86419753 0.88372093]
mean value: 0.9047459900341487
key: train_fscore
value: [0.93188011 0.93605442 0.93061224 0.9338843 0.93517241 0.93424658
0.9382716 0.93495935 0.92392808 0.93517241]
mean value: 0.9334181508094642
key: test_precision
value: [0.86363636 0.9 0.86363636 0.86363636 0.82608696 0.9
0.7755102 0.88636364 0.83333333 0.80851064]
mean value: 0.8520713859507305
key: train_precision
value: [0.88601036 0.88888889 0.88372093 0.8968254 0.90159574 0.89501312
0.9 0.88461538 0.89066667 0.89920424]
mean value: 0.8926540741995457
key: test_recall
value: [0.97435897 0.92307692 0.97435897 0.97435897 1. 0.94736842
1. 1. 0.8974359 0.97435897]
mean value: 0.9665317139001349
key: train_recall
value: [0.98275862 0.98850575 0.98275862 0.97413793 0.9713467 0.97707736
0.97994269 0.99137931 0.95977011 0.97413793]
mean value: 0.978181503803972
key: test_roc_auc
value: [0.91025641 0.91025641 0.91025641 0.91025641 0.8974359 0.92240216
0.85897436 0.93421053 0.85661269 0.86875843]
mean value: 0.8979419703103915
key: train_roc_auc
value: [0.92816092 0.93247126 0.92672414 0.93103448 0.93251243 0.93106742
0.93537365 0.93121974 0.9211458 0.9326277 ]
mean value: 0.9302337548990548
key: test_jcc
value: [0.84444444 0.8372093 0.84444444 0.84444444 0.82608696 0.85714286
0.7755102 0.88636364 0.76086957 0.79166667]
mean value: 0.8268182521652838
/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_cd_8020.py:156: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_cd_8020.py:159: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: train_jcc
value: [0.87244898 0.8797954 0.87022901 0.87596899 0.87823834 0.87660668
0.88372093 0.8778626 0.85861183 0.87823834]
mean value: 0.8751721094480582
MCC on Blind test: 0.18
Accuracy on Blind test: 0.84
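The two SettingWithCopyWarning messages interleaved above come from sorting slices of a results DataFrame in place (ros_CT / ros_BT in gid_cd_8020.py). A minimal sketch of one way to silence the warning, using a hypothetical cv_scores_df as a stand-in for the real results frame; this is not the project's code:

import pandas as pd

cv_scores_df = pd.DataFrame({'test_mcc': [0.81, 0.75, 0.88],
                             'bts_mcc':  [0.18, 0.20, 0.15]})

# take an explicit copy so the sort operates on an independent frame, not a view
ros_CT = cv_scores_df[['test_mcc']].copy()
ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)

# or avoid inplace on a slice altogether and keep the sorted result directly
ros_BT = cv_scores_df[['bts_mcc']].sort_values(by = ['bts_mcc'], ascending = False)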
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
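A minimal sketch of the loop this list implies: each (name, estimator) pair is dropped into the same preprocessing pipeline and run in turn. The two-entry models list and the MinMaxScaler stand-in for the full ColumnTransformer are illustrative assumptions, not the original code:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression, RidgeClassifier

models = [('Logistic Regression', LogisticRegression(random_state = 42)),
          ('Ridge Classifier', RidgeClassifier(random_state = 42))]
prep = MinMaxScaler()   # stand-in for the ColumnTransformer printed below

for model_name, model_fn in models:
    print('Model_name:', model_name)
    print('Model func:', model_fn)
    pipe = Pipeline(steps = [('prep', prep), ('model', model_fn)])
    print('Running model pipeline:', pipe)
    # cross-validation and blind-test scoring of 'pipe' would follow here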
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
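A minimal sketch of how a pipeline like the one printed above can be assembled: MinMaxScaler on the numerical columns, OneHotEncoder on the categorical ones, remaining columns passed through, then the classifier. The short column lists are placeholders for the 167 numerical and 7 categorical features used in the actual run:

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.linear_model import RidgeClassifierCV

numerical_cols = ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change']
categorical_cols = ['ss_class', 'aa_prop_change', 'active_site']

prep = ColumnTransformer(transformers = [('num', MinMaxScaler(), numerical_cols),
                                         ('cat', OneHotEncoder(), categorical_cols)],
                         remainder = 'passthrough')

pipe = Pipeline(steps = [('prep', prep),
                         ('model', RidgeClassifierCV(cv = 10))])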
key: fit_time
value: [0.35235929 0.27599955 0.31120014 0.3325994 0.31556654 0.34663987
0.41359758 0.3150804 0.36207342 0.39929199]
mean value: 0.3424408197402954
key: score_time
value: [0.02195287 0.01247311 0.01908541 0.01925278 0.0225265 0.0213666
0.01939416 0.01930833 0.01937413 0.01915121]
mean value: 0.019388508796691895
key: test_mcc
value: [0.8273403 0.82078268 0.8273403 0.83405766 0.81032908 0.848923
0.76725173 0.8542977 0.80937951 0.8542977 ]
mean value: 0.8253999666661787
key: train_mcc
value: [0.86147319 0.87042584 0.85885872 0.8662131 0.86772975 0.86688517
0.89368912 0.88081316 0.87967879 0.88854099]
mean value: 0.8734307814726214
key: test_accuracy
value: [0.91025641 0.91025641 0.91025641 0.91025641 0.8961039 0.92207792
0.87012987 0.92207792 0.8961039 0.92207792]
mean value: 0.906959706959707
key: train_accuracy
value: [0.92816092 0.93247126 0.92672414 0.93103448 0.93256815 0.93113343
0.94404591 0.93687231 0.93830703 0.94117647]
mean value: 0.9342494104454229
key: test_fscore
value: [0.91566265 0.91139241 0.91566265 0.91764706 0.9047619 0.925
0.88372093 0.92857143 0.90697674 0.92857143]
mean value: 0.9137967201415006
key: train_fscore
value: [0.93188011 0.93605442 0.93061224 0.93424658 0.93517241 0.9346049
0.94708277 0.94054054 0.94068966 0.94436906]
mean value: 0.9375252696889508
key: test_precision
value: [0.86363636 0.9 0.86363636 0.84782609 0.82608696 0.88095238
0.79166667 0.86666667 0.82978723 0.86666667]
mean value: 0.8536925385745923
key: train_precision
value: [0.88601036 0.88888889 0.88372093 0.89267016 0.90159574 0.89090909
0.89948454 0.8877551 0.90450928 0.89460154]
mean value: 0.8930145638833124
key: test_recall
value: [0.97435897 0.92307692 0.97435897 1. 1. 0.97368421
1. 1. 1. 1. ]
mean value: 0.9845479082321188
key: train_recall
value: [0.98275862 0.98850575 0.98275862 0.97988506 0.9713467 0.98280802
1. 1. 0.97988506 1. ]
mean value: 0.9867947831241972
key: test_roc_auc
value: [0.91025641 0.91025641 0.91025641 0.91025641 0.8974359 0.92273954
0.87179487 0.92105263 0.89473684 0.92105263]
mean value: 0.9069838056680162
key: train_roc_auc
value: [0.92816092 0.93247126 0.92672414 0.93103448 0.93251243 0.93105918
0.94396552 0.93696275 0.9383666 0.94126074]
mean value: 0.9342518031815038
key: test_jcc
value: [0.84444444 0.8372093 0.84444444 0.84782609 0.82608696 0.86046512
0.79166667 0.86666667 0.82978723 0.86666667]
mean value: 0.8415263585014354
key: train_jcc
value: [0.87244898 0.8797954 0.87022901 0.87660668 0.87823834 0.87723785
0.89948454 0.8877551 0.88802083 0.89460154]
mean value: 0.8824418274953881
MCC on Blind test: 0.18
Accuracy on Blind test: 0.84
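A self-contained sketch, on toy data, of how the per-fold key/value/mean blocks and the blind-test figures above can be produced: cross_validate returns one array per timing/metric key, and the blind-test scores come from refitting and predicting on the held-out set. The scorer names, the RidgeClassifier stand-in for the full pipeline, and the toy split are assumptions, not the original script:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef, accuracy_score
from sklearn.model_selection import cross_validate

X, y = make_classification(n_samples = 300, random_state = 42)
X_train, y_train, X_bts, y_bts = X[:240], y[:240], X[240:], y[240:]

pipe = RidgeClassifier(random_state = 42)   # stand-in for the full prep + model pipeline

scoring = {'mcc': make_scorer(matthews_corrcoef), 'accuracy': 'accuracy',
           'fscore': 'f1', 'precision': 'precision', 'recall': 'recall',
           'roc_auc': 'roc_auc', 'jcc': 'jaccard'}

cv_out = cross_validate(pipe, X_train, y_train, cv = 10,
                        scoring = scoring, return_train_score = True)
for key, value in cv_out.items():   # fit_time, score_time, test_mcc, train_mcc, ...
    print('key:', key)
    print('value:', value)
    print('mean value:', np.mean(value))

pipe.fit(X_train, y_train)          # refit on all training data
y_pred_bts = pipe.predict(X_bts)
print('MCC on Blind test:', round(matthews_corrcoef(y_bts, y_pred_bts), 2))
print('Accuracy on Blind test:', round(accuracy_score(y_bts, y_pred_bts), 2))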
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/model_selection/_validation.py:776: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details:
Traceback (most recent call last):
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 74, in _cached_call
return cache[method]
KeyError: 'predict'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/model_selection/_validation.py", line 767, in _score
scores = scorer(estimator, X_test, y_test)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 106, in __call__
score = scorer._score(cached_call, estimator, *args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 261, in _score
y_pred = method_caller(estimator, "predict", X)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 76, in _cached_call
result = getattr(estimator, method)(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/pipeline.py", line 457, in predict
Xt = transform.transform(Xt)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/compose/_column_transformer.py", line 746, in transform
Xs = self._fit_transform(
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/compose/_column_transformer.py", line 604, in _fit_transform
return Parallel(n_jobs=self.n_jobs)(
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 1046, in __call__
while self.dispatch_one_batch(iterator):
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 861, in dispatch_one_batch
self._dispatch(tasks)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 779, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
result = ImmediateResult(func)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 572, in __init__
self.results = batch()
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 262, in __call__
return [func(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 262, in <listcomp>
return [func(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/utils/fixes.py", line 117, in __call__
return self.function(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/pipeline.py", line 853, in _transform_one
res = transformer.transform(X)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py", line 882, in transform
X_int, X_mask = self._transform(
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py", line 160, in _transform
raise ValueError(msg)
ValueError: Found unknown categories ['XDR'] in column 5 during transform
warnings.warn(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.02446961 0.02406549 0.02503228 0.02821255 0.02981186 0.03915668
0.037498 0.02961111 0.05306292 0.02220559]
mean value: 0.03131260871887207
key: score_time
value: [0.01167083 0.01182604 0.0117569 0.0122273 0.01220822 0.01210237
0.01215315 0.01405954 0.0124464 0.00771523]
mean value: 0.011816596984863282
key: test_mcc
value: [ 0.77459667 -0.25819889 -0.37796447 0. -0.54772256 0.73029674
0.41666667 0.16666667 0.73029674 nan]
mean value: nan
key: train_mcc
value: [0.79115481 0.88040627 0.79115481 0.94112395 0.76271217 0.88224834
0.82085561 0.8221656 0.8221656 0.88057041]
mean value: 0.839455756928473
key: test_accuracy
value: [0.875 0.375 0.375 0.5 0.28571429 0.85714286
0.71428571 0.57142857 0.85714286 nan]
mean value: nan
key: train_accuracy
value: [0.89393939 0.93939394 0.89393939 0.96969697 0.88059701 0.94029851
0.91044776 0.91044776 0.91044776 0.94029851]
mean value: 0.9189507010402533
key: test_fscore
value: [0.85714286 0.28571429 0.54545455 0.5 0. 0.8
0.66666667 0.57142857 0.88888889 nan]
mean value: nan
key: train_fscore
value: [0.88888889 0.9375 0.88888889 0.96875 0.87878788 0.93939394
0.91176471 0.90625 0.90625 0.93939394]
mean value: 0.9165868241235888
key: test_precision
value: [1. 0.33333333 0.42857143 0.5 0. 1.
0.66666667 0.66666667 0.8 nan]
mean value: nan
key: train_precision
value: [0.93333333 0.96774194 0.93333333 1. 0.90625 0.96875
0.91176471 0.93548387 0.93548387 0.93939394]
mean value: 0.9431534989362313
key: test_recall
value: [0.75 0.25 0.75 0.5 0. 0.66666667
0.66666667 0.5 1. nan]
mean value: nan
key: train_recall
value: [0.84848485 0.90909091 0.84848485 0.93939394 0.85294118 0.91176471
0.91176471 0.87878788 0.87878788 0.93939394]
mean value: 0.8918894830659536
key: test_roc_auc
value: [0.875 0.375 0.375 0.5 0.25 0.83333333
0.70833333 0.58333333 0.83333333 nan]
mean value: nan
key: train_roc_auc
value: [0.89393939 0.93939394 0.89393939 0.96969697 0.88101604 0.94073084
0.91042781 0.90998217 0.90998217 0.9402852 ]
mean value: 0.918939393939394
key: test_jcc
value: [0.75 0.16666667 0.375 0.33333333 0. 0.66666667
0.5 0.4 0.8 nan]
mean value: nan
key: train_jcc
value: [0.8 0.88235294 0.8 0.93939394 0.78378378 0.88571429
0.83783784 0.82857143 0.82857143 0.88571429]
mean value: 0.847193993076346
MCC on Blind test: 0.16
Accuracy on Blind test: 0.7
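The nan fold means above are a consequence of the ValueError reported earlier: OneHotEncoder met a category ('XDR', in column 5 of the categorical block, i.e. drtype_mode_labels) that was present in a validation fold but absent from the corresponding training fold, so scoring for that fold was set to nan. A minimal sketch of one common workaround, encoding unseen categories as an all-zero row; whether that is appropriate here is a modelling decision, and this is not the project's code:

from sklearn.preprocessing import OneHotEncoder

# unseen categories at transform time are encoded as all zeros instead of raising
cat_encoder = OneHotEncoder(handle_unknown = 'ignore')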
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
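The lbfgs ConvergenceWarning above is emitted repeatedly during the inner cross-validation fits of LogisticRegressionCV. As the message itself suggests, raising max_iter and/or making sure the features are scaled usually resolves it; a minimal sketch with an assumed iteration budget, not the original model definition:

from sklearn.linear_model import LogisticRegressionCV

model = LogisticRegressionCV(random_state = 42, max_iter = 5000)   # sklearn's default is 100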
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/model_selection/_validation.py:776: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details:
Traceback (most recent call last):
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 74, in _cached_call
return cache[method]
KeyError: 'predict'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/model_selection/_validation.py", line 767, in _score
scores = scorer(estimator, X_test, y_test)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 106, in __call__
score = scorer._score(cached_call, estimator, *args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 261, in _score
y_pred = method_caller(estimator, "predict", X)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 76, in _cached_call
result = getattr(estimator, method)(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/pipeline.py", line 457, in predict
Xt = transform.transform(Xt)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/compose/_column_transformer.py", line 746, in transform
Xs = self._fit_transform(
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/compose/_column_transformer.py", line 604, in _fit_transform
return Parallel(n_jobs=self.n_jobs)(
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 1046, in __call__
while self.dispatch_one_batch(iterator):
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 861, in dispatch_one_batch
self._dispatch(tasks)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 779, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
result = ImmediateResult(func)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 572, in __init__
self.results = batch()
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 262, in __call__
return [func(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 262, in <listcomp>
return [func(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/utils/fixes.py", line 117, in __call__
return self.function(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/pipeline.py", line 853, in _transform_one
res = transformer.transform(X)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py", line 882, in transform
X_int, X_mask = self._transform(
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py", line 160, in _transform
raise ValueError(msg)
ValueError: Found unknown categories ['XDR'] in column 5 during transform
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.55338359 0.64924359 0.61505604 0.68635273 0.58190346 0.53748512
0.64356756 0.59687781 0.55052114 0.60271025]
mean value: 0.6017101287841797
key: score_time
value: [0.01203108 0.01240993 0.01209831 0.01516747 0.01503181 0.01205349
0.01648402 0.01222897 0.01527858 0.00648594]
mean value: 0.012926959991455078
key: test_mcc
value: [ 0.77459667 -0.25819889 -0.37796447 0. 0.09128709 0.73029674
0.73029674 0.16666667 0.47140452 nan]
mean value: nan
key: train_mcc
value: [0.9701425 1. 0.79115481 1. 1. 0.97058824
1. 1. 1. 1. ]
mean value: 0.973188554072469
key: test_accuracy
value: [0.875 0.375 0.375 0.5 0.57142857 0.85714286
0.85714286 0.57142857 0.71428571 nan]
mean value: nan
key: train_accuracy
value: [0.98484848 1. 0.89393939 1. 1. 0.98507463
1. 1. 1. 1. ]
mean value: 0.986386250565355
key: test_fscore
value: [0.85714286 0.28571429 0.54545455 0.5 0.4 0.8
0.8 0.57142857 0.8 nan]
mean value: nan
key: train_fscore
value: [0.98507463 1. 0.88888889 1. 1. 0.98507463
1. 1. 1. 1. ]
mean value: 0.9859038142620232
key: test_precision
value: [1. 0.33333333 0.42857143 0.5 0.5 1.
1. 0.66666667 0.66666667 nan]
mean value: nan
key: train_precision
value: [0.97058824 1. 0.93333333 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9903921568627451
key: test_recall
value: [0.75 0.25 0.75 0.5 0.33333333 0.66666667
0.66666667 0.5 1. nan]
mean value: nan
key: train_recall
value: [1. 1. 0.84848485 1. 1. 0.97058824
1. 1. 1. 1. ]
mean value: 0.9819073083778966
key: test_roc_auc
value: [0.875 0.375 0.375 0.5 0.54166667 0.83333333
0.83333333 0.58333333 0.66666667 nan]
mean value: nan
key: train_roc_auc
value: [0.98484848 1. 0.89393939 1. 1. 0.98529412
1. 1. 1. 1. ]
mean value: 0.9864081996434938
key: test_jcc
value: [0.75 0.16666667 0.375 0.33333333 0.25 0.66666667
0.66666667 0.4 0.66666667 nan]
mean value: nan
key: train_jcc
value: [0.97058824 1. 0.8 1. 1. 0.97058824
1. 1. 1. 1. ]
mean value: 0.9741176470588235
MCC on Blind test: 0.08
Accuracy on Blind test: 0.66
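The key/value/mean blocks above follow the layout of sklearn's cross_validate output with multiple scorers and return_train_score=True: fit_time, score_time, then one test_<scorer> and train_<scorer> array per metric, with one entry per fold. A rough sketch of how such a block can be printed; the scorer names are chosen to mirror the keys in this log, but the exact scorer definitions and CV splitter used by the script are assumptions here.

import numpy as np
from sklearn.model_selection import cross_validate, StratifiedKFold
from sklearn.metrics import matthews_corrcoef, jaccard_score, make_scorer

# Assumed scorer set, named to match the keys printed above.
scoring = {'mcc': make_scorer(matthews_corrcoef),
           'accuracy': 'accuracy',
           'fscore': 'f1',
           'precision': 'precision',
           'recall': 'recall',
           'roc_auc': 'roc_auc',
           'jcc': make_scorer(jaccard_score)}

def report_cv(pipe, X, y, n_splits=10):
    # 10-fold stratified CV assumed; prints key/value/mean in the log's format.
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
    scores = cross_validate(pipe, X, y, cv=skf, scoring=scoring,
                            return_train_score=True)
    for key, value in scores.items():
        print('key:', key)
        print('value:', value)
        print('mean value:', np.mean(value))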
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01258373 0.01153731 0.00872874 0.00858068 0.00833488 0.00848627
0.00840449 0.00844121 0.00832319 0.00833082]
mean value: 0.00917513370513916
key: score_time
value: [0.01168561 0.01006675 0.00891876 0.00857306 0.0087409 0.00858164
0.00865054 0.00863862 0.00853992 0.0042181 ]
mean value: 0.008661389350891113
key: test_mcc
value: [ 0.25819889 0.5 0. 0. 0. -0.09128709
0.75 0.41666667 0.73029674 nan]
mean value: nan
key: train_mcc
value: [0.58551687 0.60416485 0.61313934 0.54570516 0.592647 0.50508734
0.53948203 0.62451902 0.50926553 0.55496178]
mean value: 0.5674488926630006
key: test_accuracy
value: [0.625 0.75 0.5 0.5 0.57142857 0.42857143
0.85714286 0.71428571 0.85714286 nan]
mean value: nan
key: train_accuracy
value: [0.78787879 0.78787879 0.8030303 0.77272727 0.79104478 0.74626866
0.76119403 0.80597015 0.74626866 0.7761194 ]
mean value: 0.7778380823156943
key: test_fscore
value: [0.57142857 0.75 0.33333333 0.5 0. 0.5
0.85714286 0.75 0.88888889 nan]
mean value: nan
key: train_fscore
value: [0.76666667 0.75 0.78688525 0.76923077 0.77419355 0.72131148
0.73333333 0.77966102 0.70175439 0.7826087 ]
mean value: 0.756564513749558
key: test_precision
value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/model_selection/_validation.py:776: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details:
Traceback (most recent call last):
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 74, in _cached_call
return cache[method]
KeyError: 'predict'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/model_selection/_validation.py", line 767, in _score
scores = scorer(estimator, X_test, y_test)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 106, in __call__
score = scorer._score(cached_call, estimator, *args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 261, in _score
y_pred = method_caller(estimator, "predict", X)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 76, in _cached_call
result = getattr(estimator, method)(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/pipeline.py", line 457, in predict
Xt = transform.transform(Xt)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/compose/_column_transformer.py", line 746, in transform
Xs = self._fit_transform(
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/compose/_column_transformer.py", line 604, in _fit_transform
return Parallel(n_jobs=self.n_jobs)(
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 1046, in __call__
while self.dispatch_one_batch(iterator):
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 861, in dispatch_one_batch
self._dispatch(tasks)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 779, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
result = ImmediateResult(func)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 572, in __init__
self.results = batch()
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 262, in __call__
return [func(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 262, in <listcomp>
return [func(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/utils/fixes.py", line 117, in __call__
return self.function(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/pipeline.py", line 853, in _transform_one
res = transformer.transform(X)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py", line 882, in transform
X_int, X_mask = self._transform(
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py", line 160, in _transform
raise ValueError(msg)
ValueError: Found unknown categories ['XDR'] in column 5 during transform
warnings.warn(
[0.66666667 0.75 0.5 0.5 0. 0.4
0.75 0.75 0.8 nan]
mean value: nan
key: train_precision
value: [0.85185185 0.91304348 0.85714286 0.78125 0.85714286 0.81481481
0.84615385 0.88461538 0.83333333 0.75 ]
mean value: 0.8389348423315814
key: test_recall
value: [0.5 0.75 0.25 0.5 0. 0.66666667
1. 0.75 1. nan]
mean value: nan
key: train_recall
value: [0.6969697 0.63636364 0.72727273 0.75757576 0.70588235 0.64705882
0.64705882 0.6969697 0.60606061 0.81818182]
mean value: 0.693939393939394
key: test_roc_auc
value: [0.625 0.75 0.5 0.5 0.5 0.45833333
0.875 0.70833333 0.83333333 nan]
mean value: nan
key: train_roc_auc
value: [0.78787879 0.78787879 0.8030303 0.77272727 0.79233512 0.74777184
0.76292335 0.8043672 0.74420677 0.77673797]
mean value: 0.7779857397504456
key: test_jcc
value: [0.4 0.6 0.2 0.33333333 0. 0.33333333
0.75 0.6 0.8 nan]
mean value: nan
key: train_jcc
value: [0.62162162 0.6 0.64864865 0.625 0.63157895 0.56410256
0.57894737 0.63888889 0.54054054 0.64285714]
mean value: 0.6092185722448881
MCC on Blind test: 0.13
Accuracy on Blind test: 0.73
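Because one of the ten folds fails to score (the 'XDR' transform error), every test_* array carries a nan and the plain mean is reported as nan. A quick check, not part of the original script, showing how numpy.nanmean recovers the mean of the nine folds that did score, using the test_mcc array from the Gaussian NB block above.

import numpy as np

# test_mcc values for Gaussian NB as printed above; the last fold is nan.
test_mcc = np.array([0.25819889, 0.5, 0.0, 0.0, 0.0, -0.09128709,
                     0.75, 0.41666667, 0.73029674, np.nan])
print(np.mean(test_mcc))     # nan: one fold was never scored
print(np.nanmean(test_mcc))  # ~0.285: mean over the nine scored folds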
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.00859571 0.00846696 0.00853562 0.00877476 0.00880957 0.00872087
0.00887156 0.00945306 0.00851297 0.00871563]
mean value: 0.008745670318603516
key: score_time
value: [0.00872397 0.00864506 0.00859284 0.00901389 0.00882626 0.00878763
0.00877357 0.00864148 0.00871348 0.00425529]
mean value: 0.008297348022460937
key: test_mcc
value: [ 0.57735027 0. -0.57735027 0. 0.73029674 0.73029674
-0.09128709 0.16666667 0.73029674 nan]
mean value: nan
key: train_mcc
value: [0.63636364 0.66943868 0.6969697 0.72760688 0.64254029 0.64171123
0.61229947 0.67172194 0.7059638 0.64171123]
mean value: 0.6646326847028793
key: test_accuracy
value: [0.75 0.5 0.25 0.5 0.85714286 0.85714286
0.42857143 0.57142857 0.85714286 nan]
mean value: nan
key: train_accuracy
value: [0.81818182 0.83333333 0.84848485 0.86363636 0.82089552 0.82089552
0.80597015 0.8358209 0.85074627 0.82089552]
mean value: 0.8318860244233379
key: test_fscore
value: [0.66666667 0.5 0.4 0.5 0.8 0.8
0.5 0.57142857 0.88888889 nan]
mean value: nan
key: train_fscore
value: [0.81818182 0.84057971 0.84848485 0.86567164 0.82857143 0.82352941
0.80597015 0.83076923 0.83870968 0.81818182]
mean value: 0.8318649734562908
key: test_precision
value: [1. 0.5 0.33333333 0.5 1. 1.
0.4 0.66666667 0.8 nan]
mean value: nan
key: train_precision
value: [0.81818182 0.80555556 0.84848485 0.85294118 0.80555556 0.82352941
0.81818182 0.84375 0.89655172 0.81818182]
mean value: 0.8330913726514639
key: test_recall
value: [0.5 0.5 0.5 0.5 0.66666667 0.66666667
0.66666667 0.5 1. nan]
mean value: nan
key: train_recall
value: [0.81818182 0.87878788 0.84848485 0.87878788 0.85294118 0.82352941
0.79411765 0.81818182 0.78787879 0.81818182]
mean value: 0.8319073083778966
key: test_roc_auc
value: [0.75 0.5 0.25 0.5 0.83333333 0.83333333
0.45833333 0.58333333 0.83333333 nan]
mean value: nan
key: train_roc_auc
value: [0.81818182 0.83333333 0.84848485 0.86363636 0.82040998 0.82085561
0.80614973 0.8355615 0.84982175 0.82085561]
mean value: 0.8317290552584671
key: test_jcc
value: [0.5 0.33333333 0.25 0.33333333 0.66666667 0.66666667
0.33333333 0.4 0.8 nan]
mean value: nan
key: train_jcc
value: [0.69230769 0.725 0.73684211 0.76315789 0.70731707 0.7
0.675 0.71052632 0.72222222 0.69230769]
mean value: 0.7124680995797812
MCC on Blind test: 0.07
Accuracy on Blind test: 0.56
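The "MCC on Blind test" / "Accuracy on Blind test" lines summarise each model on the held-out blind split. A minimal sketch, assuming a pipeline already fitted on the training split and placeholder X_blind / y_blind arrays, of how those two summary lines can be reproduced with standard sklearn metrics.

from sklearn.metrics import matthews_corrcoef, accuracy_score

def blind_test_report(fitted_pipe, X_blind, y_blind):
    # Score an already-fitted pipeline on the blind split and print the same
    # two summary lines that appear in this log, rounded to 2 decimals.
    y_pred = fitted_pipe.predict(X_blind)
    print('MCC on Blind test:', round(matthews_corrcoef(y_blind, y_pred), 2))
    print('Accuracy on Blind test:', round(accuracy_score(y_blind, y_pred), 2))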
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models:
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00843787 0.00955176 0.00883889 0.00942826 0.0091989 0.00902915
0.00917506 0.0088551 0.0091939 0.00976706]
mean value: 0.00914759635925293
key: score_time
value: [0.01573682 0.01592684 0.01467657 0.0095439 0.00948215 0.00990367
0.01001978 0.00993037 0.01061368 0.00524116]
mean value: 0.011107492446899413
key: test_mcc
value: [ 0.57735027 0. -0.77459667 0. 0.16666667 0.16666667
0.16666667 0.41666667 0.16666667 nan]
mean value: nan
key: train_mcc
value: [0.4868645 0.54570516 0.54772256 0.5768179 0.61229947 0.46494686
0.64254029 0.52228164 0.46256684 0.58692605]
mean value: 0.5448671256211469
key: test_accuracy
value: [0.75 0.5 0.125 0.5 0.57142857 0.57142857
0.57142857 0.71428571 0.57142857 nan]
mean value: nan
key: train_accuracy
value: [0.74242424 0.77272727 0.77272727 0.78787879 0.80597015 0.73134328
0.82089552 0.76119403 0.73134328 0.79104478]
mean value: 0.7717548620533695
key: test_fscore
value: [0.66666667 0.5 0. 0.33333333 0.57142857 0.57142857
0.57142857 0.75 0.57142857 nan]
mean value: nan
key: train_fscore
value: [0.73015873 0.7761194 0.76190476 0.78125 0.80597015 0.75
0.82857143 0.75757576 0.72727273 0.8 ]
mean value: 0.7718822957722211
key: test_precision
value: [1. 0.5 0. 0.5 0.5 0.5
0.5 0.75 0.66666667 nan]
mean value: nan
key: train_precision
value: [0.76666667 0.76470588 0.8 0.80645161 0.81818182 0.71052632
0.80555556 0.75757576 0.72727273 0.75675676]
mean value: 0.7713693093054923
key: test_recall
value: [0.5 0.5 0. 0.25 0.66666667 0.66666667
0.66666667 0.75 0.5 nan]
mean value: nan
key: train_recall
value: [0.6969697 0.78787879 0.72727273 0.75757576 0.79411765 0.79411765
0.85294118 0.75757576 0.72727273 0.84848485]
mean value: 0.7744206773618538
key: test_roc_auc
value: [0.75 0.5 0.125 0.5 0.58333333 0.58333333
0.58333333 0.70833333 0.58333333 nan]
mean value: nan
key: train_roc_auc
value: [0.74242424 0.77272727 0.77272727 0.78787879 0.80614973 0.73039216
0.82040998 0.76114082 0.73128342 0.79188948]
mean value: 0.7717023172905526
key: test_jcc
value: [0.5 0.33333333 0. 0.2 0.4 0.4
0.4 0.6 0.4 nan]
mean value: nan
key: train_jcc
value: [0.575 0.63414634 0.61538462 0.64102564 0.675 0.6
0.70731707 0.6097561 0.57142857 0.66666667]
mean value: 0.6295725006700617
MCC on Blind test: -0.03
Accuracy on Blind test: 0.6
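For reference, test_mcc and train_mcc report the Matthews correlation coefficient, computed for a binary problem from the 2x2 confusion matrix. A small stand-alone equivalent of sklearn.metrics.matthews_corrcoef for the binary case (returning 0.0 in the degenerate single-class case, as sklearn does).

import numpy as np
from sklearn.metrics import confusion_matrix

def mcc_binary(y_true, y_pred):
    # Matthews correlation coefficient from the 2x2 confusion matrix.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom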
Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline:
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.00905323 0.00905657 0.00877595 0.0094254 0.00974536 0.00978231
0.00884295 0.00941014 0.00875664 0.00880909]
mean value: 0.009165763854980469
key: score_time
value: [0.00940609 0.00956655 0.00896621 0.00944877 0.00946617 0.0087285
0.00870156 0.00895 0.00927949 0.00434017]
mean value: 0.00868535041809082
key: test_mcc
value: [ 0.77459667 0.77459667 -0.5 -0.25819889 -0.41666667 0.41666667
0.75 -0.16666667 0.73029674 nan]
mean value: nan
key: train_mcc
value: [0.78824078 0.85201287 0.8196886 0.8196886 0.73172906 0.79391828
0.82085561 0.8221656 0.79125812 0.82085561]
mean value: 0.8060413138271034
key: test_accuracy
value: [0.875 0.875 0.25 0.375 0.28571429 0.71428571
0.85714286 0.42857143 0.85714286 nan]
mean value: nan
key: train_accuracy
value: [0.89393939 0.92424242 0.90909091 0.90909091 0.86567164 0.89552239
0.91044776 0.91044776 0.89552239 0.91044776]
mean value: 0.9024423337856173
key: test_fscore
value: [0.85714286 0.88888889 0.25 0.28571429 0.28571429 0.66666667
0.85714286 0.5 0.88888889 nan]
mean value: nan
key: train_fscore
value: [0.89230769 0.92063492 0.90625 0.90625 0.86567164 0.90140845
0.91176471 0.90625 0.89230769 0.90909091]
mean value: 0.9011936012718837
key: test_precision
value: [1. 0.8 0.25 0.33333333 0.25 0.66666667
0.75 0.5 0.8 nan]
mean value: nan
key: train_precision
value: [0.90625 0.96666667 0.93548387 0.93548387 0.87878788 0.86486486
0.91176471 0.93548387 0.90625 0.90909091]
mean value: 0.9150126638195898
key: test_recall
value: [0.75 1. 0.25 0.25 0.33333333 0.66666667
1. 0.5 1. nan]
mean value: nan
key: train_recall
value: [0.87878788 0.87878788 0.87878788 0.87878788 0.85294118 0.94117647
0.91176471 0.87878788 0.87878788 0.90909091]
mean value: 0.8887700534759359
key: test_roc_auc
value: [0.875 0.875 0.25 0.375 0.29166667 0.70833333
0.875 0.41666667 0.83333333 nan]
mean value: nan
key: train_roc_auc
value: [0.89393939 0.92424242 0.90909091 0.90909091 0.86586453 0.89483066
0.91042781 0.90998217 0.89527629 0.91042781]
mean value: 0.9023172905525847
key: test_jcc
value: [0.75 0.8 0.14285714 0.16666667 0.16666667 0.5
0.75 0.33333333 0.8 nan]
mean value: nan
key: train_jcc
value: [0.80555556 0.85294118 0.82857143 0.82857143 0.76315789 0.82051282
0.83783784 0.82857143 0.80555556 0.83333333]
mean value: 0.8204608459716819
MCC on Blind test: 0.1
Accuracy on Blind test: 0.69
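As a consistency check on the per-fold numbers, test_fscore is the harmonic mean of the corresponding precision and recall. Taking the first SVM fold above (precision 1.0, recall 0.75):

# First CV fold of the SVM block above: precision = 1.0, recall = 0.75.
precision, recall = 1.0, 0.75
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 8))  # 0.85714286, matching the first entry of test_fscore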
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [0.48186159 0.47740698 0.36878991 0.37100196 0.40507483 0.37354064
0.47149277 0.47795129 0.4443264 0.30708814]
mean value: 0.41785345077514646
key: score_time
value: [0.01215649 0.01253653 0.0121944 0.01231432 0.01251912 0.01243162
0.01273942 0.0124898 0.01237965 0.00691366]
mean value: 0.011867499351501465
key: test_mcc
value: [ 1. -0.25819889 0. 0.25819889 0.47140452 0.73029674
0.16666667 -0.16666667 0.47140452 nan]
mean value: nan
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.375 0.5 0.625 0.71428571 0.85714286
0.57142857 0.42857143 0.71428571 nan]
mean value: nan
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.28571429 0.6 0.66666667 0.5 0.8
0.57142857 0.5 0.8 nan]
mean value: nan
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.33333333 0.5 0.6 1. 1.
0.5 0.5 0.66666667 nan]
mean value: nan
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value:
[1. 0.25 0.75 0.75 0.33333333 0.66666667
0.66666667 0.5 1. nan]
mean value: nan
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.375 0.5 0.625 0.66666667 0.83333333
0.58333333 0.41666667 0.66666667 nan]
mean value: nan
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.16666667 0.42857143 0.5 0.33333333 0.66666667
0.4 0.33333333 0.66666667 nan]
mean value: nan
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.12
Accuracy on Blind test: 0.64
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.01340342 0.01321054 0.01008272 0.01072669 0.01062202 0.01029515
0.01031184 0.01027846 0.00955749 0.00987911]
mean value: 0.01083674430847168
key: score_time
value: [0.01169109 0.00949645 0.00896549 0.0094192 0.00934792 0.00908232
0.00924945 0.00847816 0.0093143 0.0046885 ]
mean value: 0.008973288536071777
key: test_mcc
value: [0.77459667 1. 1. 0.57735027 1. 0.73029674
0.75 0.54772256 0.73029674 nan]
mean value: nan
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.875 1. 1. 0.75 1. 0.85714286
0.85714286 0.71428571 0.85714286 nan]
mean value: nan
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.85714286 1. 1. 0.66666667 1. 0.8
0.85714286 0.66666667 0.88888889 nan]
mean value: nan
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 1. 1. 1. 1. 0.75 1. 0.8 nan]
mean value: nan
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 1. 1. 0.5 1. 0.66666667
1. 0.5 1. nan]
mean value: nan
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.875 1. 1. 0.75 1. 0.83333333
0.875 0.75 0.83333333 nan]
mean value: nan
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.75 1. 1. 0.5 1. 0.66666667
0.75 0.5 0.8 nan]
mean value: nan
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.44
Accuracy on Blind test: 0.88
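The Decision Tree gives the strongest blind-test numbers in this block (MCC 0.44, accuracy 0.88). A hypothetical follow-up sketch, not part of the script's output, for inspecting which features the fitted tree relies on; it assumes 'fitted_pipe' is the pipeline printed above after fitting, and that the installed scikit-learn (>= 1.0) provides ColumnTransformer.get_feature_names_out.

import pandas as pd

def top_features(fitted_pipe, n=10):
    # Map the model's feature_importances_ back to the column names produced
    # by the 'prep' step (scaled numericals + one-hot encoded categoricals).
    names = fitted_pipe.named_steps['prep'].get_feature_names_out()
    imp = pd.Series(fitted_pipe.named_steps['model'].feature_importances_,
                    index=names)
    return imp.sort_values(ascending=False).head(n)

# Example use: top_features(fitted_pipe) after fitted_pipe.fit(X_train, y_train)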
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.08548617 0.08263326 0.08414531 0.08557725 0.08452487 0.08568358
0.08547211 0.0901866 0.08010912 0.07997227]
mean value: 0.08437905311584473
key: score_time
value: [0.01851916 0.01860356 0.01802731 0.01822042 0.01861954 0.01823235
0.01772761 0.01915503 0.01731133 0.00448871]
mean value: 0.016890501976013182
key: test_mcc
value: [ 0.77459667 0.77459667 -0.25819889 0.25819889 0.09128709 0.41666667
0.54772256 -0.41666667 0.73029674 nan]
mean value: nan
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.875 0.875 0.375 0.625 0.57142857 0.71428571
0.71428571 0.28571429 0.85714286 nan]
mean value: nan
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.88888889 0.88888889 0.28571429 0.57142857 0.4 0.66666667
0.75 0.28571429 0.88888889 nan]
mean value: nan
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.8 0.8 0.33333333 0.66666667 0.5 0.66666667
0.6 0.33333333 0.8 nan]
mean value: nan
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 0.25 0.5 0.33333333 0.66666667
1. 0.25 1. nan]
mean value: nan
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.875 0.875 0.375 0.625 0.54166667 0.70833333
0.75 0.29166667 0.83333333 nan]
mean value: nan
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.8 0.8 0.16666667 0.4 0.25 0.5
0.6 0.16666667 0.8 nan]
mean value: nan
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.08
Accuracy on Blind test: 0.67
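The nan in the tenth fold above (the fold whose scoring failed) propagates into every "mean value: nan". A small sketch, reusing the test_mcc values reported for this model, of how the plain mean differs from a nan-aware mean over the nine folds that did score:

# The ten per-fold test_mcc values reported above; the failed fold is nan.
import numpy as np

test_mcc = np.array([0.77459667, 0.77459667, -0.25819889, 0.25819889, 0.09128709,
                     0.41666667, 0.54772256, -0.41666667, 0.73029674, np.nan])

print(test_mcc.mean())       # nan -- matches the "mean value: nan" lines
print(np.nanmean(test_mcc))  # ~0.32, the mean over the nine scored folds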
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.00870752 0.00884748 0.00870347 0.00850773 0.00849891 0.00838685
0.00841236 0.00849438 0.00951147 0.00876331]
mean value: 0.008683347702026367
key: score_time
value: [0.0090642 0.00924206 0.00885201 0.00862956 0.00854135 0.00870943
0.00896764 0.00864267 0.00927711 0.00418854]
mean value: 0.008411455154418945
key: test_mcc
value: [ 0.77459667 0. -0.57735027 0.5 0.41666667 -0.16666667
-0.41666667 0.73029674 0.75 nan]
mean value: nan
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.875 0.5 0.25 0.75 0.71428571 0.42857143
0.28571429 0.85714286 0.85714286 nan]
mean value: nan
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.85714286 0.33333333 0.4 0.75 0.66666667 0.33333333
0.28571429 0.88888889 0.85714286 nan]
mean value: nan
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.5 0.33333333 0.75 0.66666667 0.33333333
0.25 0.8 1. nan]
mean value: nan
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 0.25 0.5 0.75 0.66666667 0.33333333
0.33333333 1. 0.75 nan]
mean value: nan
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.875 0.5 0.25 0.75 0.70833333 0.41666667
0.29166667 0.83333333 0.875 nan]
mean value: nan
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.75 0.2 0.25 0.6 0.5 0.2
0.16666667 0.8 0.75 nan]
mean value: nan
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/model_selection/_validation.py:776: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details:
Traceback (most recent call last):
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 74, in _cached_call
return cache[method]
KeyError: 'predict'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/model_selection/_validation.py", line 767, in _score
scores = scorer(estimator, X_test, y_test)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 106, in __call__
score = scorer._score(cached_call, estimator, *args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 261, in _score
y_pred = method_caller(estimator, "predict", X)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 76, in _cached_call
result = getattr(estimator, method)(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/pipeline.py", line 457, in predict
Xt = transform.transform(Xt)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/compose/_column_transformer.py", line 746, in transform
Xs = self._fit_transform(
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/compose/_column_transformer.py", line 604, in _fit_transform
return Parallel(n_jobs=self.n_jobs)(
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 1046, in __call__
while self.dispatch_one_batch(iterator):
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 861, in dispatch_one_batch
self._dispatch(tasks)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 779, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
result = ImmediateResult(func)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 572, in __init__
self.results = batch()
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 262, in __call__
return [func(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 262, in <listcomp>
return [func(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/utils/fixes.py", line 117, in __call__
return self.function(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/pipeline.py", line 853, in _transform_one
res = transformer.transform(X)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py", line 882, in transform
X_int, X_mask = self._transform(
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py", line 160, in _transform
raise ValueError(msg)
ValueError: Found unknown categories ['XDR'] in column 5 during transform
warnings.warn(
mean value: 1.0
MCC on Blind test: -0.02
Accuracy on Blind test: 0.51
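The key/value blocks (fit_time, score_time, test_*, train_*) have the layout of a scikit-learn cross_validate result with return_train_score=True and a dict of named scorers. A minimal sketch that produces the same key layout; the scorer names and toy data are assumptions, not the script's exact call:

# Sketch: cross_validate with named scorers yields exactly this key layout:
# fit_time, score_time, test_<name>, train_<name> -- one array entry per fold.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef, accuracy_score
from sklearn.model_selection import cross_validate

X, y = make_classification(n_samples=200, weights=[0.9], random_state=42)
scorers = {'mcc': make_scorer(matthews_corrcoef),
           'accuracy': make_scorer(accuracy_score)}

cv_out = cross_validate(ExtraTreesClassifier(random_state=42), X, y,
                        cv=10, scoring=scorers, return_train_score=True)
print(sorted(cv_out.keys()))
# ['fit_time', 'score_time', 'test_accuracy', 'test_mcc', 'train_accuracy', 'train_mcc']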
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.04852128 1.03308892 1.01017737 1.01914382 1.03468871 1.0292933
0.99042964 1.03115463 0.99485707 0.99539757]
mean value: 1.0186752319335937
key: score_time
value: [0.09594202 0.09505582 0.09608316 0.09419131 0.08874226 0.08800244
0.08889222 0.08808494 0.08911848 0.0045104 ]
mean value: 0.08286230564117432
key: test_mcc
value: [0.77459667 0.77459667 0. 0.25819889 0.75 0.73029674
0.54772256 0.41666667 0.73029674 nan]
mean value: nan
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.875 0.875 0.5 0.625 0.85714286 0.85714286
0.71428571 0.71428571 0.85714286 nan]
mean value: nan
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.85714286 0.88888889 0.33333333 0.57142857 0.85714286 0.8
0.75 0.75 0.88888889 nan]
mean value: nan
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.8 0.5 0.66666667 0.75 1.
0.6 0.75 0.8 nan]
mean value: nan
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 1. 0.25 0.5 1. 0.66666667
1. 0.75 1. nan]
mean value: nan
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.875 0.875 0.5 0.625 0.875 0.83333333
0.75 0.70833333 0.83333333 nan]
mean value: nan
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.75 0.8 0.2 0.4 0.75 0.66666667
0.6 0.6 0.8 nan]
mean value: nan
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.24
Accuracy on Blind test: 0.73
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/model_selection/_validation.py:776: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details:
Traceback (most recent call last):
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 74, in _cached_call
return cache[method]
KeyError: 'predict'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/model_selection/_validation.py", line 767, in _score
scores = scorer(estimator, X_test, y_test)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 106, in __call__
score = scorer._score(cached_call, estimator, *args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 261, in _score
y_pred = method_caller(estimator, "predict", X)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 76, in _cached_call
result = getattr(estimator, method)(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/pipeline.py", line 457, in predict
Xt = transform.transform(Xt)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/compose/_column_transformer.py", line 746, in transform
Xs = self._fit_transform(
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/compose/_column_transformer.py", line 604, in _fit_transform
return Parallel(n_jobs=self.n_jobs)(
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 1046, in __call__
while self.dispatch_one_batch(iterator):
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 861, in dispatch_one_batch
self._dispatch(tasks)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 779, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
result = ImmediateResult(func)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 572, in __init__
self.results = batch()
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 262, in __call__
return [func(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 262, in <listcomp>
return [func(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/utils/fixes.py", line 117, in __call__
return self.function(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/pipeline.py", line 853, in _transform_one
res = transformer.transform(X)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py", line 882, in transform
X_int, X_mask = self._transform(
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py", line 160, in _transform
raise ValueError(msg)
ValueError: Found unknown categories ['XDR'] in column 5 during transform
warnings.warn(
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000...05', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.8218751 0.84969497 0.8178587 0.906358 0.86499405 0.90028644
0.82887864 0.83176374 0.86882186 0.90769243]
mean value: 0.8598223924636841
key: score_time
value: [0.2009244 0.22945976 0.20735073 0.19348812 0.22602344 0.18154192
0.23737407 0.25499034 0.20944715 0.00459886]
mean value: 0.19451987743377686
key: test_mcc
value: [0.77459667 0.77459667 0. 0. 1. 0.73029674
0.54772256 0.09128709 0.73029674 nan]
mean value: nan
key: train_mcc
value: [0.9701425 0.9701425 0.88040627 0.84887469 0.9107943 0.91087344
0.88057041 0.91087344 0.9402852 0.91087344]
mean value: 0.9133836194169118
key: test_accuracy
value: [0.875 0.875 0.5 0.5 1. 0.85714286
0.71428571 0.57142857 0.85714286 nan]
mean value: nan
key: train_accuracy
value: [0.98484848 0.98484848 0.93939394 0.92424242 0.95522388 0.95522388
0.94029851 0.95522388 0.97014925 0.95522388]
mean value: 0.9564676616915423
key: test_fscore
value: [0.85714286 0.88888889 0.33333333 0.33333333 1. 0.8
0.75 0.66666667 0.88888889 nan]
mean value: nan
key: train_fscore
value: [0.98461538 0.98461538 0.9375 0.92307692 0.95652174 0.95522388
0.94117647 0.95522388 0.96969697 0.95522388]
mean value: 0.9562874513514377
key: test_precision
value: [1. 0.8 0.5 0.5 1. 1. 0.6 0.6 0.8 nan]
mean value: nan
key: train_precision
value: [1. 1. 0.96774194 0.9375 0.94285714 0.96969697
0.94117647 0.94117647 0.96969697 0.94117647]
mean value: 0.9611022429499659
key: test_recall
value: [0.75 1. 0.25 0.25 1. 0.66666667
1. 0.75 1. nan]
mean value: nan
key: train_recall
value: [0.96969697 0.96969697 0.90909091 0.90909091 0.97058824 0.94117647
0.94117647 0.96969697 0.96969697 0.96969697]
mean value: 0.9519607843137254
key: test_roc_auc
value: [0.875 0.875 0.5 0.5 1. 0.83333333
0.75 0.54166667 0.83333333 nan]
mean value: nan
key: train_roc_auc
value: [0.98484848 0.98484848 0.93939394 0.92424242 0.95499109 0.95543672
0.9402852 0.95543672 0.9701426 0.95543672]
mean value: 0.9565062388591801
key: test_jcc
value: [0.75 0.8 0.2 0.2 1. 0.66666667
0.6 0.5 0.8 nan]
mean value: nan
key: train_jcc
value: [0.96969697 0.96969697 0.88235294 0.85714286 0.91666667 0.91428571
0.88888889 0.91428571 0.94117647 0.91428571]
mean value: 0.91684789067142
MCC on Blind test: 0.2
Accuracy on Blind test: 0.75
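The FutureWarning repeated above for this estimator is triggered by max_features='auto', which scikit-learn 1.1 deprecates; for classifiers 'sqrt' selects the same number of features per split. A sketch of the drop-in replacement, keeping the other Random Forest2 parameters shown above:

# 'auto' and 'sqrt' pick the same number of features per split for
# classifiers, so this silences the FutureWarning without changing behaviour.
from sklearn.ensemble import RandomForestClassifier

rf2 = RandomForestClassifier(max_features='sqrt', min_samples_leaf=5,
                             n_estimators=1000, n_jobs=10, oob_score=True,
                             random_state=42)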
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01189542 0.01049495 0.00895762 0.00860286 0.00866747 0.00842595
0.00850391 0.00848532 0.00846291 0.00847697]
mean value: 0.00909733772277832
key: score_time
value: [0.01070786 0.00954223 0.00891876 0.00899506 0.00864911 0.00857663
0.00854445 0.00854731 0.00858331 0.0042212 ]
mean value: 0.008528590202331543
key: test_mcc
value: [ 0.57735027 0. -0.57735027 0. 0.73029674 0.73029674
-0.09128709 0.16666667 0.73029674 nan]
mean value: nan
key: train_mcc
value: [0.63636364 0.66943868 0.6969697 0.72760688 0.64254029 0.64171123
0.61229947 0.67172194 0.7059638 0.64171123]
mean value: 0.6646326847028793
key: test_accuracy
value: [0.75 0.5 0.25 0.5 0.85714286 0.85714286
0.42857143 0.57142857 0.85714286 nan]
mean value: nan
key: train_accuracy
value: [0.81818182 0.83333333 0.84848485 0.86363636 0.82089552 0.82089552
0.80597015 0.8358209 0.85074627 0.82089552]
mean value: 0.8318860244233379
key: test_fscore
value: [0.66666667 0.5 0.4 0.5 0.8 0.8
0.5 0.57142857 0.88888889 nan]
mean value: nan
key: train_fscore
value:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/model_selection/_validation.py:776: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details:
Traceback (most recent call last):
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 74, in _cached_call
return cache[method]
KeyError: 'predict'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/model_selection/_validation.py", line 767, in _score
scores = scorer(estimator, X_test, y_test)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 106, in __call__
score = scorer._score(cached_call, estimator, *args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 261, in _score
y_pred = method_caller(estimator, "predict", X)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 76, in _cached_call
result = getattr(estimator, method)(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/pipeline.py", line 457, in predict
Xt = transform.transform(Xt)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/compose/_column_transformer.py", line 746, in transform
Xs = self._fit_transform(
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/compose/_column_transformer.py", line 604, in _fit_transform
return Parallel(n_jobs=self.n_jobs)(
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 1046, in __call__
while self.dispatch_one_batch(iterator):
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 861, in dispatch_one_batch
self._dispatch(tasks)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 779, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
result = ImmediateResult(func)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 572, in __init__
self.results = batch()
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 262, in __call__
return [func(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 262, in <listcomp>
return [func(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/utils/fixes.py", line 117, in __call__
return self.function(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/pipeline.py", line 853, in _transform_one
res = transformer.transform(X)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py", line 882, in transform
X_int, X_mask = self._transform(
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py", line 160, in _transform
raise ValueError(msg)
ValueError: Found unknown categories ['XDR'] in column 5 during transform
warnings.warn(
[0.81818182 0.84057971 0.84848485 0.86567164 0.82857143 0.82352941
0.80597015 0.83076923 0.83870968 0.81818182]
mean value: 0.8318649734562908
key: test_precision
value: [1. 0.5 0.33333333 0.5 1. 1.
0.4 0.66666667 0.8 nan]
mean value: nan
key: train_precision
value: [0.81818182 0.80555556 0.84848485 0.85294118 0.80555556 0.82352941
0.81818182 0.84375 0.89655172 0.81818182]
mean value: 0.8330913726514639
key: test_recall
value: [0.5 0.5 0.5 0.5 0.66666667 0.66666667
0.66666667 0.5 1. nan]
mean value: nan
key: train_recall
value: [0.81818182 0.87878788 0.84848485 0.87878788 0.85294118 0.82352941
0.79411765 0.81818182 0.78787879 0.81818182]
mean value: 0.8319073083778966
key: test_roc_auc
value: [0.75 0.5 0.25 0.5 0.83333333 0.83333333
0.45833333 0.58333333 0.83333333 nan]
mean value: nan
key: train_roc_auc
value: [0.81818182 0.83333333 0.84848485 0.86363636 0.82040998 0.82085561
0.80614973 0.8355615 0.84982175 0.82085561]
mean value: 0.8317290552584671
key: test_jcc
value: [0.5 0.33333333 0.25 0.33333333 0.66666667 0.66666667
0.33333333 0.4 0.8 nan]
mean value: nan
key: train_jcc
value: [0.69230769 0.725 0.73684211 0.76315789 0.70731707 0.7
0.675 0.71052632 0.72222222 0.69230769]
mean value: 0.7124680995797812
MCC on Blind test: 0.07
Accuracy on Blind test: 0.56
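The test_fscore and test_jcc arrays reported for each model describe the same confusion matrix: F1 = 2J / (1 + J), where J is the Jaccard index. A quick check against fold 1 of the Naive Bayes block above (J = 0.5, F1 = 0.66666667):

# Per-fold Jaccard (jcc) and F1 (fscore) are linked by F1 = 2J / (1 + J);
# verified here against fold 1 of the Naive Bayes results above.
jcc = 0.5
f1_from_jcc = 2 * jcc / (1 + jcc)
print(round(f1_from_jcc, 8))   # 0.66666667, matching the reported test_fscore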
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [1.08550191 0.49211669 0.20335126 0.03310847 0.0349474 0.03457999
0.03765225 0.09568024 0.20947433 0.11704874]
mean value: 0.2343461275100708
key: score_time
value: [0.01274514 0.01130342 0.01107645 0.01041508 0.0103085 0.01025128
0.01089311 0.01567221 0.01348019 0.0052917 ]
mean value: 0.011143708229064941
key: test_mcc
value: [0.77459667 0.77459667 1. 0.57735027 1. 0.73029674
0.75 0.75 0.73029674 nan]
mean value: nan
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.875 0.875 1. 0.75 1. 0.85714286
0.85714286 0.85714286 0.85714286 nan]
mean value: nan
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.85714286 0.88888889 1. 0.66666667 1. 0.8
0.85714286 0.85714286 0.88888889 nan]
mean value: nan
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.8 1. 1. 1. 1. 0.75 1. 0.8 nan]
mean value: nan
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 1. 1. 0.5 1. 0.66666667
1. 0.75 1. nan]
mean value: nan
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.875 0.875 1. 0.75 1. 0.83333333
0.875 0.875 0.83333333 nan]
mean value: nan
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.75 0.8 1. 0.5 1. 0.66666667
0.75 0.75 0.8 nan]
mean value: nan
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.53
Accuracy on Blind test: 0.87
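The "MCC on Blind test" and "Accuracy on Blind test" lines summarise the model refitted on the training split and scored once on the held-out 20%. A self-contained sketch of that final step, with toy data standing in for the real split and the XGBClassifier arguments mirroring the ones printed above:

# Sketch of the blind-test summary step: fit on the training split, then
# report MCC and accuracy on the untouched hold-out (toy data only).
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=300, weights=[0.9], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=42)

clf = XGBClassifier(random_state=42, use_label_encoder=False, verbosity=0)
clf.fit(X_tr, y_tr)
y_pred = clf.predict(X_te)
print(round(matthews_corrcoef(y_te, y_pred), 2),
      round(accuracy_score(y_te, y_pred), 2))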
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/model_selection/_validation.py:776: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details:
Traceback (most recent call last):
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 74, in _cached_call
return cache[method]
KeyError: 'predict'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/model_selection/_validation.py", line 767, in _score
scores = scorer(estimator, X_test, y_test)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 106, in __call__
score = scorer._score(cached_call, estimator, *args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 261, in _score
y_pred = method_caller(estimator, "predict", X)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 76, in _cached_call
result = getattr(estimator, method)(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/pipeline.py", line 457, in predict
Xt = transform.transform(Xt)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/compose/_column_transformer.py", line 746, in transform
Xs = self._fit_transform(
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/compose/_column_transformer.py", line 604, in _fit_transform
return Parallel(n_jobs=self.n_jobs)(
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 1046, in __call__
while self.dispatch_one_batch(iterator):
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 861, in dispatch_one_batch
self._dispatch(tasks)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 779, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
result = ImmediateResult(func)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 572, in __init__
self.results = batch()
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 262, in __call__
return [func(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 262, in <listcomp>
return [func(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/utils/fixes.py", line 117, in __call__
return self.function(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/pipeline.py", line 853, in _transform_one
res = transformer.transform(X)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py", line 882, in transform
X_int, X_mask = self._transform(
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py", line 160, in _transform
raise ValueError(msg)
ValueError: Found unknown categories ['XDR'] in column 5 during transform
warnings.warn(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.01452994 0.01688433 0.01669741 0.02562022 0.03764725 0.02424407
0.03305769 0.03621292 0.03531814 0.03604174]
mean value: 0.027625370025634765
key: score_time
value: [0.01220036 0.01215482 0.01213193 0.02140093 0.02040553 0.02412796
0.02286553 0.02402568 0.02254677 0.01251054]
mean value: 0.018437004089355467
key: test_mcc
value: [ 0.25819889 -0.77459667 0.77459667 0.5 0.75 0.73029674
0.16666667 0.16666667 0.47140452 nan]
mean value: nan
key: train_mcc
value: [0.93939394 0.9701425 0.9701425 0.9701425 1. 0.97058824
0.9402852 1. 0.97056238 1. ]
mean value: 0.9731257264504388
key: test_accuracy
value: [0.625 0.125 0.875 0.75 0.85714286 0.85714286
0.57142857 0.57142857 0.71428571 nan]
mean value: nan
key: train_accuracy
value: [0.96969697 0.98484848 0.98484848 0.98484848 1. 0.98507463
0.97014925 1. 0.98507463 1. ]
mean value: 0.986454093170511
key: test_fscore
value: [0.66666667 0.22222222 0.88888889 0.75 0.85714286 0.8
0.57142857 0.57142857 0.8 nan]
mean value: nan
key: train_fscore
value: [0.96969697 0.98461538 0.98461538 0.98507463 1. 0.98507463
0.97058824 1. 0.98461538 1. ]
mean value: 0.9864280612568584
key: test_precision
value: [0.6 0.2 0.8 0.75 0.75 1.
0.5 0.66666667 0.66666667 nan]
mean value: nan
key: train_precision
value: [0.96969697 1. 1. 0.97058824 1. 1.
0.97058824 1. 1. 1. ]
mean value: 0.9910873440285205
key: test_recall
value: [0.75 0.25 1. 0.75 1. 0.66666667
0.66666667 0.5 1. nan]
mean value: nan
key: train_recall
value: [0.96969697 0.96969697 0.96969697 1. 1. 0.97058824
0.97058824 1. 0.96969697 1. ]
mean value: 0.9819964349376115
key: test_roc_auc
value: [0.625 0.125 0.875 0.75 0.875 0.83333333
0.58333333 0.58333333 0.66666667 nan]
mean value: nan
key: train_roc_auc
value: [0.96969697 0.98484848 0.98484848 0.98484848 1. 0.98529412
0.9701426 1. 0.98484848 1. ]
mean value: 0.9864527629233512
key: test_jcc
value: [0.5 0.125 0.8 0.6 0.75 0.66666667
0.4 0.4 0.66666667 nan]
mean value: nan
key: train_jcc
value: [0.94117647 0.96969697 0.96969697 0.97058824 1. 0.97058824
0.94285714 1. 0.96969697 1. ]
mean value: 0.9734300993124523
MCC on Blind test: 0.03
Accuracy on Blind test: 0.5
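LDA's near-perfect train_* means against much weaker test_* folds and a blind-test MCC of 0.03 suggest overfitting; one rough flag is the gap between the train mean and a nan-aware mean of the CV test folds. A sketch using the accuracy arrays reported above:

# Train-vs-CV-test accuracy gap for the LDA block above; a large positive
# gap is a simple overfitting flag.
import numpy as np

train_acc = np.array([0.96969697, 0.98484848, 0.98484848, 0.98484848, 1.,
                      0.98507463, 0.97014925, 1., 0.98507463, 1.])
test_acc = np.array([0.625, 0.125, 0.875, 0.75, 0.85714286, 0.85714286,
                     0.57142857, 0.57142857, 0.71428571, np.nan])

gap = train_acc.mean() - np.nanmean(test_acc)
print(round(gap, 3))   # ~0.326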
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/model_selection/_validation.py:776: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details:
Traceback (most recent call last):
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 74, in _cached_call
return cache[method]
KeyError: 'predict'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/model_selection/_validation.py", line 767, in _score
scores = scorer(estimator, X_test, y_test)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 106, in __call__
score = scorer._score(cached_call, estimator, *args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 261, in _score
y_pred = method_caller(estimator, "predict", X)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 76, in _cached_call
result = getattr(estimator, method)(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/pipeline.py", line 457, in predict
Xt = transform.transform(Xt)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/compose/_column_transformer.py", line 746, in transform
Xs = self._fit_transform(
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/compose/_column_transformer.py", line 604, in _fit_transform
return Parallel(n_jobs=self.n_jobs)(
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 1046, in __call__
while self.dispatch_one_batch(iterator):
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 861, in dispatch_one_batch
self._dispatch(tasks)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 779, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
result = ImmediateResult(func)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 572, in __init__
self.results = batch()
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 262, in __call__
return [func(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 262, in <listcomp>
return [func(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/utils/fixes.py", line 117, in __call__
return self.function(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/pipeline.py", line 853, in _transform_one
res = transformer.transform(X)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py", line 882, in transform
X_int, X_mask = self._transform(
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py", line 160, in _transform
raise ValueError(msg)
ValueError: Found unknown categories ['XDR'] in column 5 during transform
warnings.warn(
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01807809 0.00909543 0.00902748 0.00905871 0.00937676 0.00846314
0.00849557 0.00899744 0.00889373 0.00911236]
mean value: 0.009859871864318848
key: score_time
value: [0.01044512 0.00886965 0.00870347 0.00933313 0.00879788 0.00953507
0.0093205 0.00855994 0.00924015 0.0042541 ]
mean value: 0.008705902099609374
key: test_mcc
value: [ 0.5 0.77459667 -0.5 -0.5 0.09128709 0.16666667
0.16666667 0.41666667 0.73029674 nan]
mean value: nan
key: train_mcc
value: [0.45538256 0.51610023 0.48507125 0.45454545 0.43315508 0.43264959
0.43315508 0.46387173 0.40285205 0.37533647]
mean value: 0.4452119489771236
key: test_accuracy
value: [0.75 0.875 0.25 0.25 0.57142857 0.57142857
0.57142857 0.71428571 0.85714286 nan]
mean value: nan
key: train_accuracy
value: [0.72727273 0.75757576 0.74242424 0.72727273 0.71641791 0.71641791
0.71641791 0.73134328 0.70149254 0.68656716]
mean value: 0.7223202170963365
key: test_fscore
value: [0.75 0.88888889 0.25 0.25 0.4 0.57142857
0.57142857 0.75 0.88888889 nan]
mean value: nan
key: train_fscore
value: [0.71875 0.75 0.74626866 0.72727273 0.71641791 0.72463768
0.71641791 0.73529412 0.6969697 0.69565217]
mean value: 0.7227680874573887
key: test_precision
value: [0.75 0.8 0.25 0.25 0.5 0.5 0.5 0.75 0.8 nan]
mean value: nan
key: train_precision
value: [0.74193548 0.77419355 0.73529412 0.72727273 0.72727273 0.71428571
0.72727273 0.71428571 0.6969697 0.66666667]
mean value: 0.7225449123931098
key: test_recall
value: [0.75 1. 0.25 0.25 0.33333333 0.66666667
0.66666667 0.75 1. nan]
mean value: nan
key: train_recall
value: [0.6969697 0.72727273 0.75757576 0.72727273 0.70588235 0.73529412
0.70588235 0.75757576 0.6969697 0.72727273]
mean value: 0.7237967914438503
key: test_roc_auc
value: [0.75 0.875 0.25 0.25 0.54166667 0.58333333
0.58333333 0.70833333 0.83333333 nan]
mean value: nan
key: train_roc_auc
value: [0.72727273 0.75757576 0.74242424 0.72727273 0.71657754 0.71613191
0.71657754 0.73172906 0.70142602 0.68716578]
mean value: 0.7224153297682709
key: test_jcc
value: [0.6 0.8 0.14285714 0.14285714 0.25 0.4
0.4 0.6 0.8 nan]
mean value: nan
key: train_jcc
value: [0.56097561 0.6 0.5952381 0.57142857 0.55813953 0.56818182
0.55813953 0.58139535 0.53488372 0.53333333]
mean value: 0.5661715567472799
MCC on Blind test: 0.12
Accuracy on Blind test: 0.64
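For context, the key/value blocks above (fit_time, score_time and the paired test_/train_ metrics, each with a per-fold array and a mean) have the shape of the dictionary returned by sklearn's cross_validate when given a multi-metric scoring dict and return_train_score=True. The snippet below reproduces that shape on synthetic data; the scorer names, the 10-fold splitter and the classifier are assumptions rather than the script's exact call.

# Illustration on synthetic data of where key/value blocks like the ones above come from;
# the scoring dictionary and CV splitter are assumed, not taken from the script.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate

X, y = make_classification(n_samples=424, n_features=20, weights=[0.91], random_state=42)

scoring = {'mcc': make_scorer(matthews_corrcoef), 'accuracy': 'accuracy',
           'fscore': 'f1', 'precision': 'precision', 'recall': 'recall',
           'roc_auc': 'roc_auc', 'jcc': 'jaccard'}

cv_out = cross_validate(LogisticRegression(max_iter=1000), X, y,
                        cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=42),
                        scoring=scoring, return_train_score=True)

print(cv_out['test_mcc'])          # per-fold values, as in the 'value:' lines above
print(cv_out['test_mcc'].mean())   # the reported 'mean value'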
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01039577 0.01272821 0.01253057 0.01269507 0.01259279 0.01320624
0.01454878 0.01423931 0.03339195 0.01325393]
mean value: 0.01495826244354248
key: score_time
value: [0.00860286 0.01140451 0.01143074 0.01141548 0.01142693 0.01305008
0.01304579 0.01313567 0.01181912 0.00594831]
mean value: 0.011127948760986328
key: test_mcc
value: [ 1. -0.25819889 -0.37796447 0. 0.16666667 1.
0.47140452 0.16666667 0.47140452 nan]
mean value: nan
key: train_mcc
value: [0.78086881 0.80622577 0.91287093 0.9701425 0.70972086 0.73387158
0.76008624 0.94191581 0.79144385 0.94191581]
mean value: 0.834906215901378
key: test_accuracy
value: [1. 0.375 0.375 0.5 0.57142857 1.
0.71428571 0.57142857 0.71428571 nan]
mean value: nan
key: train_accuracy
value: [0.87878788 0.89393939 0.95454545 0.98484848 0.8358209 0.85074627
0.86567164 0.97014925 0.89552239 0.97014925]
mean value: 0.9100180913613749
key: test_fscore
value: [1. 0.28571429 0.54545455 0.5 0.57142857 1.
0.5 0.57142857 0.8 nan]
mean value: nan
key: train_fscore
value: [0.89189189 0.90410959 0.95652174 0.98507463 0.86075949 0.87179487
0.84745763 0.96875 0.89552239 0.96875 ]
mean value: 0.9150632227573198
key: test_precision
value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/model_selection/_validation.py:776: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details:
  [traceback identical to the one above]
ValueError: Found unknown categories ['XDR'] in column 5 during transform
  warnings.warn(
[1. 0.33333333 0.42857143 0.5 0.5 1.
1. 0.66666667 0.66666667 nan]
mean value: nan
key: train_precision
value: [0.80487805 0.825 0.91666667 0.97058824 0.75555556 0.77272727
1. 1. 0.88235294 1. ]
mean value: 0.8927768720200571
key: test_recall
value: [1. 0.25 0.75 0.5 0.66666667 1.
0.33333333 0.5 1. nan]
mean value: nan
key: train_recall
value: [1. 1. 1. 1. 1. 1.
0.73529412 0.93939394 0.90909091 0.93939394]
mean value: 0.9523172905525847
key: test_roc_auc
value: [1. 0.375 0.375 0.5 0.58333333 1.
0.66666667 0.58333333 0.66666667 nan]
mean value: nan
key: train_roc_auc
value: [0.87878788 0.89393939 0.95454545 0.98484848 0.83333333 0.84848485
0.86764706 0.96969697 0.89572193 0.96969697]
mean value: 0.9096702317290553
key: test_jcc
value: [1. 0.16666667 0.375 0.33333333 0.4 1.
0.33333333 0.4 0.66666667 nan]
mean value: nan
key: train_jcc
value: [0.80487805 0.825 0.91666667 0.97058824 0.75555556 0.77272727
0.73529412 0.93939394 0.81081081 0.93939394]
mean value: 0.8470308586269849
MCC on Blind test: 0.16
Accuracy on Blind test: 0.77
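The UndefinedMetricWarning interleaved with the test_precision block above is raised whenever a fold contains no predicted positives, in which case precision is undefined; the zero_division argument of precision_score chooses the value returned instead. A self-contained illustration (not the script's behaviour):

# Precision is undefined when there are no predicted positives; zero_division picks the value.
from sklearn.metrics import precision_score

y_true = [1, 0, 0, 1]
y_pred = [0, 0, 0, 0]   # the classifier predicted no positives on this fold
print(precision_score(y_true, y_pred, zero_division=0))   # 0.0, without the warning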
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [same list as shown above]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01237702 0.01218629 0.01207566 0.01200914 0.01205039 0.01229119
0.01242113 0.01218367 0.0121491 0.01185155]
mean value: 0.012159514427185058
key: score_time
value: [0.01146197 0.01142049 0.01146102 0.01147199 0.01162982 0.01166487
0.01148105 0.01143789 0.01141834 0.00575423]
mean value: 0.010920166969299316
key: test_mcc
value: [0.77459667 0.25819889 0. 0.57735027 0. 0.73029674
0.09128709 0.41666667 0.47140452 nan]
mean value: nan
key: train_mcc
value: [0.54232614 0.94112395 0.84887469 0.70710678 0.86064631 0.9142124
0.97058824 0.8871096 0.79446219 0.80941155]
mean value: 0.8275861849565603
key: test_accuracy
value: [0.875 0.625 0.5 0.75 0.57142857 0.85714286
0.57142857 0.71428571 0.71428571 nan]
mean value: nan
key: train_accuracy
value: [0.72727273 0.96969697 0.92424242 0.83333333 0.92537313 0.95522388
0.98507463 0.94029851 0.89552239 0.89552239]
mean value: 0.9051560379918588
key: test_fscore
value: [0.88888889 0.66666667 0.6 0.66666667 0. 0.8
0.4 0.75 0.8 nan]
mean value: nan
key: train_fscore
value: [0.78571429 0.97058824 0.92537313 0.8 0.92063492 0.95384615
0.98507463 0.94285714 0.89855072 0.90410959]
mean value: 0.9086748813219427
key: test_precision
value: [0.8 0.6 0.5 1. 0. 1.
0.5 0.75 0.66666667 nan]
mean value: nan
key: train_precision
value: [0.64705882 0.94285714 0.91176471 1. 1. 1.
1. 0.89189189 0.86111111 0.825 ]
mean value: 0.907968367527191
key: test_recall
value: [1. 0.75 0.75 0.5 0. 0.66666667
0.33333333 0.75 1. nan]
mean value: nan
key: train_recall
value: [1. 1. 0.93939394 0.66666667 0.85294118 0.91176471
0.97058824 1. 0.93939394 1. ]
mean value: 0.9280748663101605
key: test_roc_auc
value: [0.875 0.625 0.5 0.75 0.5 0.83333333
0.54166667 0.70833333 0.66666667 nan]
mean value: nan
key: train_roc_auc
value: [0.72727273 0.96969697 0.92424242 0.83333333 0.92647059 0.95588235
0.98529412 0.94117647 0.89616756 0.89705882]
mean value: 0.9056595365418895
key: test_jcc
value: [0.8 0.5 0.42857143 0.5 0. 0.66666667
0.25 0.6 0.66666667 nan]
mean value: nan
key: train_jcc
value: [0.64705882 0.94285714 0.86111111 0.66666667 0.85294118 0.91176471
0.97058824 0.89189189 0.81578947 0.825 ]
mean value: 0.8385669227387493
MCC on Blind test: 0.1
Accuracy on Blind test: 0.78
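The 'MCC on Blind test' and 'Accuracy on Blind test' lines that close each model block are single-number evaluations on the held-out split. They can be computed with the standard metric calls sketched below on toy labels; the variable names are placeholders, not the script's.

# How blind-test summary lines of this form are typically computed (toy labels shown).
from sklearn.metrics import accuracy_score, matthews_corrcoef

y_blind      = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]   # stands in for the held-out split
y_blind_pred = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0]
print('MCC on Blind test:', round(matthews_corrcoef(y_blind, y_blind_pred), 2))
print('Accuracy on Blind test:', round(accuracy_score(y_blind, y_blind_pred), 2))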
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/model_selection/_validation.py:776: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details:
  [traceback identical to the one above]
ValueError: Found unknown categories ['XDR'] in column 5 during transform
  warnings.warn(
List of models: [same list as shown above]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.09126067 0.08087921 0.07952952 0.08191967 0.08121467 0.0825758
0.08135796 0.08136106 0.08138061 0.08200526]
mean value: 0.08234844207763672
key: score_time
value: [0.01517344 0.01841116 0.01596832 0.01609087 0.01606083 0.01617146
0.01605701 0.01604176 0.01604748 0.0049386 ]
mean value: 0.015096092224121093
key: test_mcc
value: [0.77459667 0.77459667 0.5 0.25819889 1. 0.73029674
0.75 0.75 0.73029674 nan]
mean value: nan
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.875 0.875 0.75 0.625 1. 0.85714286
0.85714286 0.85714286 0.85714286 nan]
mean value: nan
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.85714286 0.88888889 0.75 0.57142857 1. 0.8
0.85714286 0.85714286 0.88888889 nan]
mean value: nan
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.8 0.75 0.66666667 1. 1.
0.75 1. 0.8 nan]
mean value: nan
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 1. 0.75 0.5 1. 0.66666667
1. 0.75 1. nan]
mean value: nan
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.875 0.875 0.75 0.625 1. 0.83333333
0.875 0.875 0.83333333 nan]
mean value: nan
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.75 0.8 0.6 0.4 1. 0.66666667
0.75 0.75 0.8 nan]
mean value: nan
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.51
Accuracy on Blind test: 0.89
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [same list as shown above]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/model_selection/_validation.py:776: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details:
  [traceback identical to the one above; this warning was emitted twice here]
ValueError: Found unknown categories ['XDR'] in column 5 during transform
  warnings.warn(
Running model pipeline:
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.03245139 0.02849269 0.03114295 0.04029226 0.03773594 0.0297606
0.04828548 0.04907727 0.04006147 0.04558992]
mean value: 0.038288998603820804
key: score_time
value: [0.0239048 0.01779222 0.03716946 0.03694153 0.02155066 0.02801538
0.03684258 0.03330183 0.03577065 0.00456214]
mean value: 0.02758512496948242
key: test_mcc
value: [0.77459667 0.77459667 1. 0.57735027 1. 0.73029674
0.75 0.75 0.73029674 nan]
mean value: nan
key: train_mcc
value: [0.93939394 1. 1. 0.9701425 1. 0.9402852
1. 0.97058824 0.97056238 1. ]
mean value: 0.9790972264213724
key: test_accuracy
value: [0.875 0.875 1. 0.75 1. 0.85714286
0.85714286 0.85714286 0.85714286 nan]
mean value: nan
key: train_accuracy
value: [0.96969697 1. 1. 0.98484848 1. 0.97014925
1. 0.98507463 0.98507463 1. ]
mean value: 0.9894843962008141
key: test_fscore
value: [0.85714286 0.88888889 1. 0.66666667 1. 0.8
0.85714286 0.85714286 0.88888889 nan]
mean value: nan
key: train_fscore
value: [0.96969697 1. 1. 0.98461538 1. 0.97058824
1. 0.98507463 0.98461538 1. ]
mean value: 0.9894590601087528
key: test_precision
value: [1. 0.8 1. 1. 1. 1. 0.75 1. 0.8 nan]
mean value: nan
key: train_precision
value: [0.96969697 1. 1. 1. 1. 0.97058824
1. 0.97058824 1. 1. ]
mean value: 0.9910873440285205
key: test_recall
value: [0.75 1. 1. 0.5 1. 0.66666667
1. 0.75 1. nan]
mean value: nan
key: train_recall
value: [0.96969697 1. 1. 0.96969697 1. 0.97058824
1. 1. 0.96969697 1. ]
mean value: 0.9879679144385027
key: test_roc_auc
value: [0.875 0.875 1. 0.75 1. 0.83333333
0.875 0.875 0.83333333 nan]
mean value: nan
key: train_roc_auc
value: [0.96969697 1. 1. 0.98484848 1. 0.9701426
1. 0.98529412 0.98484848 1. ]
mean value: 0.9894830659536542
key: test_jcc
value: [0.75 0.8 1. 0.5 1. 0.66666667
0.75 0.75 0.8 nan]
mean value: nan
key: train_jcc
value: [0.94117647 1. 1. 0.96969697 1. 0.94285714
1. 0.97058824 0.96969697 1. ]
mean value: 0.9794015788133436
MCC on Blind test: 0.53
Accuracy on Blind test: 0.9
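Because the Bagging pipeline above is constructed with oob_score=True, each fitted BaggingClassifier also exposes an out-of-bag accuracy estimate alongside the cross-validated scores. A small sketch on synthetic data (the sample size and n_estimators are illustrative assumptions):

# Reading the out-of-bag estimate that oob_score=True enables; data and n_estimators are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=42)
clf = BaggingClassifier(n_estimators=50, oob_score=True, random_state=42).fit(X, y)
print(clf.oob_score_)   # accuracy estimated on the samples each base estimator did not see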
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [same list as shown above]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.01175404 0.01408124 0.01423097 0.01489067 0.01489496 0.01492834
0.01666331 0.02286315 0.01500893 0.01537228]
mean value: 0.015468788146972657
key: score_time
value: [0.01120687 0.01108646 0.01164126 0.01179743 0.01176 0.0117147
0.01189256 0.01203537 0.01175928 0.00604916]
mean value: 0.011094307899475098
key: test_mcc
value: [ 1. 0.25819889 -0.25819889 0.25819889 -0.16666667 0.16666667
0.16666667 0.16666667 0.73029674 nan]
mean value: nan
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.625 0.375 0.625 0.42857143 0.57142857
0.57142857 0.57142857 0.85714286 nan]
mean value: nan
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.66666667 0.44444444 0.57142857 0.33333333 0.57142857
0.57142857 0.57142857 0.88888889 nan]
mean value: nan
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.6 0.4 0.66666667 0.33333333 0.5
0.5 0.66666667 0.8 nan]
mean value: nan
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/model_selection/_validation.py:776: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details:
  [traceback identical to the one above]
ValueError: Found unknown categories ['XDR'] in column 5 during transform
  warnings.warn(
[1. 0.75 0.5 0.5 0.33333333 0.66666667
0.66666667 0.5 1. nan]
mean value: nan
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.625 0.375 0.625 0.41666667 0.58333333
0.58333333 0.58333333 0.83333333 nan]
mean value: nan
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.5 0.28571429 0.4 0.2 0.4
0.4 0.4 0.8 nan]
mean value: nan
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: -0.14
Accuracy on Blind test: 0.5
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [same list as shown above]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.15756583 0.14230919 0.14231634 0.14149928 0.15154886 0.15950918
0.12302613 0.15684557 0.12467861 0.1421895 ]
mean value: 0.144148850440979
key: score_time
value: [0.00912619 0.00896001 0.0089817 0.00898099 0.00998139 0.00900364
0.00891662 0.00905848 0.0092051 0.00442433]
mean value: 0.00866384506225586
key: test_mcc
value: [1. 0.77459667 1. 0.57735027 1. 0.73029674
0.75 0.75 0.73029674 nan]
mean value: nan
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.875 1. 0.75 1. 0.85714286
0.85714286 0.85714286 0.85714286 nan]
mean value: nan
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.88888889 1. 0.66666667 1. 0.8
0.85714286 0.85714286 0.88888889 nan]
mean value: nan
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.8 1. 1. 1. 1. 0.75 1. 0.8 nan]
mean value: nan
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 0.5 1. 0.66666667
1. 0.75 1. nan]
mean value: nan
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.875 1. 0.75 1. 0.83333333
0.875 0.875 0.83333333 nan]
mean value: nan
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.8 1. 0.5 1. 0.66666667
0.75 0.75 0.8 nan]
mean value: nan
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.46
Accuracy on Blind test: 0.86
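As a reminder of what the MCC values reported throughout this run measure, the coefficient can be computed directly from a confusion matrix and cross-checked against sklearn; the toy labels below are purely illustrative.

# MCC from the confusion matrix, cross-checked against sklearn's implementation (toy labels).
from sklearn.metrics import confusion_matrix, matthews_corrcoef

y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 0, 1, 1, 1]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
mcc = (tp * tn - fp * fn) / ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
print(mcc, matthews_corrcoef(y_true, y_pred))   # both 0.5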
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: [same list as shown above]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
  [warning repeated ten times here]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/model_selection/_validation.py:776: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details:
Traceback (most recent call last):
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 74, in _cached_call
return cache[method]
KeyError: 'predict'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/model_selection/_validation.py", line 767, in _score
scores = scorer(estimator, X_test, y_test)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 106, in __call__
score = scorer._score(cached_call, estimator, *args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 261, in _score
y_pred = method_caller(estimator, "predict", X)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 76, in _cached_call
result = getattr(estimator, method)(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/pipeline.py", line 457, in predict
Xt = transform.transform(Xt)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/compose/_column_transformer.py", line 746, in transform
Xs = self._fit_transform(
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/compose/_column_transformer.py", line 604, in _fit_transform
return Parallel(n_jobs=self.n_jobs)(
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 1046, in __call__
while self.dispatch_one_batch(iterator):
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 861, in dispatch_one_batch
self._dispatch(tasks)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 779, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
result = ImmediateResult(func)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 572, in __init__
self.results = batch()
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 262, in __call__
return [func(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 262, in <listcomp>
return [func(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/utils/fixes.py", line 117, in __call__
return self.function(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/pipeline.py", line 853, in _transform_one
res = transformer.transform(X)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py", line 882, in transform
X_int, X_mask = self._transform(
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py", line 160, in _transform
raise ValueError(msg)
ValueError: Found unknown categories ['XDR'] in column 5 during transform
warnings.warn(
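The traceback above is why several CV folds score nan: the OneHotEncoder in the 'cat' step is fitted on training folds in which categorical column 5 (going by the categorical Index printed in the pipelines below, drtype_mode_labels) never contains 'XDR', so that value is unseen when the held-out fold is transformed and scoring for the fold is aborted. One common way to make the encoder tolerate unseen levels is sketched below; whether that is appropriate is a modelling decision, and the column lists here are illustrative stand-ins, not the full feature set.

# Sketch only: handle_unknown='ignore' encodes categories unseen at fit time
# (e.g. 'XDR') as an all-zero row at transform time instead of raising.
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

numerical_ix   = ['ligand_distance', 'ligand_affinity_change']      # illustrative subset
categorical_ix = ['ss_class', 'drtype_mode_labels', 'active_site']  # illustrative subset

prep = ColumnTransformer(
    transformers=[('num', MinMaxScaler(), numerical_ix),
                  ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_ix)],
    remainder='passthrough')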
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.00932288 0.01288772 0.01312137 0.01292324 0.01315093 0.01315165
0.013165 0.01327205 0.01327038 0.01341057]
mean value: 0.012767577171325683
key: score_time
value: [0.00868034 0.01161909 0.01167655 0.01166964 0.01424122 0.01169658
0.01479149 0.01418686 0.01179957 0.00896955]
mean value: 0.011933088302612305
key: test_mcc
value: [ 0.57735027 0.57735027 0. -0.57735027 0.16666667 0.41666667
-0.75 0.16666667 0.47140452 nan]
mean value: nan
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.75 0.75 0.5 0.25 0.57142857 0.71428571
0.14285714 0.57142857 0.71428571 nan]
mean value: nan
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.66666667 0.8 0.33333333 0. 0.57142857 0.66666667
0. 0.57142857 0.8 nan]
mean value: nan
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.66666667 0.5 0. 0.5 0.66666667
0. 0.66666667 0.66666667 nan]
mean value: nan
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.5 1. 0.25 0. 0.66666667 0.66666667
0. 0.5 1. nan]
mean value: nan
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.75 0.75 0.5 0.25 0.58333333 0.70833333
0.125 0.58333333 0.66666667 nan]
mean value: nan
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.5 0.66666667 0.2 0. 0.4 0.5
0. 0.4 0.66666667 nan]
mean value: nan
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: -0.15
Accuracy on Blind test: 0.48
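Each key/value block above is the per-fold output of a 10-fold scikit-learn cross_validate run, with 'mean value' averaged over the folds; because one fold failed to score (the unknown-category error above), its entries are nan and a plain mean is nan as well. A minimal, self-contained sketch of that reporting pattern on toy data follows (np.nanmean is shown only as one way to summarise the folds that did score, not necessarily what the script does):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import cross_validate

# Toy stand-in for the real feature matrix and imbalanced target.
X, y = make_classification(n_samples=400, n_features=20, weights=[0.9], random_state=42)

scoring = {'mcc': make_scorer(matthews_corrcoef), 'accuracy': 'accuracy'}
cv_out = cross_validate(RidgeClassifier(random_state=42), X, y, cv=10,
                        scoring=scoring, return_train_score=True,
                        error_score=np.nan)          # a failed fold becomes nan, not an exception

for key, value in cv_out.items():                    # fit_time, score_time, test_mcc, train_mcc, ...
    print('key:', key)
    print('value:', value)
    print('mean value:', np.mean(value))             # nan if any fold's score is nan
    print('nan-aware mean:', np.nanmean(value))      # ignores the failed fold(s)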
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02941251 0.01240587 0.01239109 0.01243472 0.01241302 0.02361703
0.01239061 0.0124073 0.03145409 0.03129125]
mean value: 0.01902174949645996
key: score_time
value: [0.02128053 0.01157498 0.01154494 0.01151633 0.01154399 0.01161838
0.0115726 0.0115397 0.02226758 0.01235414]
mean value: 0.013681316375732422
key: test_mcc
value: [ 0.77459667 0. 0. 0. 0.47140452 0.73029674
0.47140452 -0.16666667 0.47140452 nan]
mean value: nan
key: train_mcc
value: [1. 1. 0.9701425 1. 1. 0.97058824
1. 1. 1. 1. ]
mean value: 0.9940730735439449
key: test_accuracy
value: [0.875 0.5 0.5 0.5 0.71428571 0.85714286
0.71428571 0.42857143 0.71428571 nan]
mean value: nan
key: train_accuracy
value: [1. 1. 0.98484848 1. 1. 0.98507463
1. 1. 1. 1. ]
mean value: 0.9969923111714156
key: test_fscore
value: [0.85714286 0.5 0.6 0.5 0.5 0.8
0.5 0.5 0.8 nan]
mean value: nan
key: train_fscore
value: [1. 1. 0.98461538 1. 1. 0.98507463
1. 1. 1. 1. ]
mean value: 0.9969690011481056
key: test_precision
value: [1. 0.5 0.5 0.5 1. 1.
1. 0.5 0.66666667 nan]
mean value: nan
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 0.5 0.75 0.5 0.33333333 0.66666667
0.33333333 0.5 1. nan]
mean value: nan
key: train_recall
value: [1. 1. 0.96969697 1. 1. 0.97058824
1. 1. 1. 1. ]
mean value: 0.9940285204991087
key: test_roc_auc
value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/model_selection/_validation.py:776: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details:
ValueError: Found unknown categories ['XDR'] in column 5 during transform
  warnings.warn(
/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_cd_8020.py:176: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_cd_8020.py:179: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
[0.875 0.5 0.5 0.5 0.66666667 0.83333333
0.66666667 0.41666667 0.66666667 nan]
mean value: nan
key: train_roc_auc
value: [1. 1. 0.98484848 1. 1. 0.98529412
1. 1. 1. 1. ]
mean value: 0.9970142602495544
key: test_jcc
value: [0.75 0.33333333 0.42857143 0.33333333 0.33333333 0.66666667
0.33333333 0.33333333 0.66666667 nan]
mean value: nan
key: train_jcc
value: [1. 1. 0.96969697 1. 1. 0.97058824
1. 1. 1. 1. ]
mean value: 0.9940285204991087
MCC on Blind test: 0.16
Accuracy on Blind test: 0.7
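The two pandas SettingWithCopyWarning messages a little further up come from calling sort_values(..., inplace=True) on rus_CT and rus_BT, which appear to be slices of another DataFrame. A minimal sketch of the usual remedies, using a hypothetical sliced frame (the construction of rus_CT/rus_BT is not shown in this log):

import pandas as pd

df = pd.DataFrame({'test_mcc': [0.3, 0.9, 0.1], 'bts_mcc': [0.2, 0.5, 0.4]})
df_slice = df[df['test_mcc'] > 0.0]        # a slice, so in-place edits trigger the warning

# Option 1: take an explicit copy before sorting in place.
df_slice = df_slice.copy()
df_slice.sort_values(by=['test_mcc'], ascending=False, inplace=True)

# Option 2: skip inplace and assign the sorted result.
df_sorted = df[df['test_mcc'] > 0.0].sort_values(by=['bts_mcc'], ascending=False)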
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.11888051 0.11613035 0.13918924 0.17764306 0.11287332 0.13386416
0.11478901 0.11810565 0.11326432 0.11325955]
mean value: 0.12579991817474365
key: score_time
value: [0.01197052 0.02146673 0.02262592 0.01841044 0.02380562 0.01616836
0.020262 0.02225089 0.02016735 0.01243377]
mean value: 0.01895616054534912
key: test_mcc
value: [ 0.77459667 0. -0.37796447 0. -0.54772256 0.73029674
0.75 -0.41666667 0.47140452 nan]
mean value: nan
key: train_mcc
value: [0.75897093 1. 0.79115481 0.88040627 0.73172906 0.82248026
0.76271217 1. 1. 1. ]
mean value: 0.8747453484252882
key: test_accuracy
value: [0.875 0.5 0.375 0.5 0.28571429 0.85714286
0.85714286 0.28571429 0.71428571 nan]
mean value: nan
key: train_accuracy
value: [0.87878788 1. 0.89393939 0.93939394 0.86567164 0.91044776
0.88059701 1. 1. 1. ]
mean value: 0.936883763003166
key: test_fscore
value: [0.85714286 0.5 0.54545455 0.5 0. 0.8
0.85714286 0.28571429 0.8 nan]
mean value: nan
key: train_fscore
value: [0.875 1. 0.88888889 0.9375 0.86567164 0.90909091
0.87878788 1. 1. 1. ]
mean value: 0.9354939318558722
key: test_precision
value: [1. 0.5 0.42857143 0.5 0. 1.
0.75 0.33333333 0.66666667 nan]
mean value: nan
key: train_precision
value: [0.90322581 1. 0.93333333 0.96774194 0.87878788 0.9375
0.90625 1. 1. 1. ]
mean value: 0.9526838954056696
key: test_recall
value: [0.75 0.5 0.75 0.5 0. 0.66666667
1. 0.25 1. nan]
mean value: nan
key: train_recall
value: [0.84848485 1. 0.84848485 0.90909091 0.85294118 0.88235294
0.85294118 1. 1. 1. ]
mean value: 0.9194295900178253
key: test_roc_auc
value: [0.875 0.5 0.375 0.5 0.25 0.83333333
0.875 0.29166667 0.66666667 nan]
mean value: nan
key: train_roc_auc
value: [0.87878788 1. 0.89393939 0.93939394 0.86586453 0.91087344
0.88101604 1. 1. 1. ]
mean value: 0.93698752228164
key: test_jcc
value: [0.75 0.33333333 0.375 0.33333333 0. 0.66666667
0.75 0.16666667 0.66666667 nan]
mean value: nan
key: train_jcc
value: [0.77777778 1. 0.8 0.88235294 0.76315789 0.83333333
0.78378378 1. 1. 1. ]
mean value: 0.8840405730808207
MCC on Blind test: 0.15
Accuracy on Blind test: 0.69
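Each model block closes with MCC and accuracy recomputed on a held-out blind test set. A minimal, self-contained sketch of that final evaluation (the plain classifier and synthetic split stand in for the fitted prep+model pipeline and the real blind test data):

from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifierCV
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, weights=[0.9], random_state=42)
X_tr, X_bts, y_tr, y_bts = train_test_split(X, y, test_size=0.2,
                                            stratify=y, random_state=42)

clf = RidgeClassifierCV(cv=10).fit(X_tr, y_tr)       # stand-in for the full pipeline
y_pred = clf.predict(X_bts)
print('MCC on Blind test:', round(matthews_corrcoef(y_bts, y_pred), 2))
print('Accuracy on Blind test:', round(accuracy_score(y_bts, y_pred), 2))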
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.04169297 0.04000759 0.0398705 0.0397861 0.03995085 0.04071259
0.04140377 0.04145694 0.04049969 0.06114721]
mean value: 0.04265282154083252
key: score_time
value: [0.01223469 0.01473475 0.01499009 0.0151639 0.0152235 0.01410294
0.01532245 0.01532817 0.01256084 0.02204752]
mean value: 0.015170884132385255
key: test_mcc
value: [0.74456944 0.64102564 0.70064905 0.77563153 0.79675455 0.77311567
0.83239263 0.84516739 0.76637425 0.89608637]
mean value: 0.7771766522016534
key: train_mcc
value: [0.82924493 0.80867746 0.84652087 0.84379896 0.84068353 0.871224
0.83522323 0.83322536 0.85282785 0.83557192]
mean value: 0.8396998114811894
key: test_accuracy
value: [0.87179487 0.82051282 0.84615385 0.88461538 0.8961039 0.88311688
0.90909091 0.92207792 0.88311688 0.94805195]
mean value: 0.8864635364635365
key: train_accuracy
value: [0.9137931 0.90373563 0.92241379 0.92097701 0.91965567 0.93400287
0.91678623 0.91535151 0.92539455 0.91678623]
mean value: 0.9188896584706212
key: test_fscore
value: [0.875 0.82051282 0.85714286 0.89156627 0.9 0.88888889
0.91566265 0.925 0.88607595 0.94871795]
mean value: 0.8908567380292254
key: train_fscore
value: [0.91643454 0.90629371 0.92479109 0.92350487 0.92200557 0.93681319
0.91944444 0.91839557 0.92777778 0.91944444]
mean value: 0.9214905199414426
key: test_precision
value: [0.85365854 0.82051282 0.8 0.84090909 0.85714286 0.8372093
0.84444444 0.90243902 0.875 0.94871795]
mean value: 0.8580034025028352
key: train_precision
value: [0.88918919 0.88283379 0.8972973 0.89487871 0.89701897 0.89973615
0.89218329 0.88533333 0.89784946 0.88978495]
mean value: 0.8926105128444033
key: test_recall
value: [0.8974359 0.82051282 0.92307692 0.94871795 0.94736842 0.94736842
1. 0.94871795 0.8974359 0.94871795]
mean value: 0.9279352226720647
key: train_recall
value: [0.9454023 0.93103448 0.95402299 0.95402299 0.94842407 0.97707736
0.94842407 0.95402299 0.95977011 0.95114943]
mean value: 0.9523350788788987
key: test_roc_auc
value: [0.87179487 0.82051282 0.84615385 0.88461538 0.89676113 0.88394062
0.91025641 0.9217274 0.88292848 0.94804318]
mean value: 0.8866734143049932
key: train_roc_auc
value: [0.9137931 0.90373563 0.92241379 0.92097701 0.91961433 0.93394098
0.91674077 0.91540691 0.9254438 0.91683546]
mean value: 0.9188901788360834
key: test_jcc
value: [0.77777778 0.69565217 0.75 0.80434783 0.81818182 0.8
0.84444444 0.86046512 0.79545455 0.90243902]
mean value: 0.8048762726527899
key: train_jcc
value: [0.84575835 0.8286445 0.86010363 0.85788114 0.85529716 0.88113695
0.85089974 0.84910486 0.86528497 0.85089974]
mean value: 0.8545011047745082
MCC on Blind test: 0.04
Accuracy on Blind test: 0.86
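The lbfgs ConvergenceWarning messages interleaved with this output mean LogisticRegression hit its default iteration cap (max_iter=100) before converging. The numeric features are already MinMax-scaled inside the pipeline, so the usual remedy is simply a larger max_iter or a different solver; a minimal sketch, not necessarily what the script chooses:

from sklearn.linear_model import LogisticRegression

# Give lbfgs more iterations...
clf = LogisticRegression(max_iter=3000, random_state=42)

# ...or switch solver; liblinear often converges quickly on small, wide datasets.
clf_alt = LogisticRegression(solver='liblinear', random_state=42)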
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
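The warning suggests two remedies: give lbfgs more iterations, or scale the features before fitting. A minimal, hypothetical sketch of both, using generated stand-in data rather than this script's own feature matrix or estimator settings:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data; the real script fits on its own training set.
X, y = make_classification(n_samples=300, n_features=20, random_state=42)

# Remedy 1: raise the iteration cap so lbfgs can run to convergence.
clf_more_iter = LogisticRegression(max_iter=5000, random_state=42).fit(X, y)

# Remedy 2: standardise the features, which usually lets lbfgs converge much sooner.
clf_scaled = make_pipeline(StandardScaler(),
                           LogisticRegression(random_state=42)).fit(X, y)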
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
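The pipeline printed above pairs a ColumnTransformer (MinMaxScaler over the numerical columns, OneHotEncoder over the categorical ones, remainder passed through) with the chosen estimator. A minimal sketch of how such a pipeline can be assembled, assuming a small hypothetical DataFrame and column lists; it is illustrative only, not the script's own code:

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy frame standing in for the real feature matrix.
rng = np.random.default_rng(42)
n = 20
X_train = pd.DataFrame({'ligand_distance': rng.uniform(0, 30, n),
                        'deepddg':         rng.normal(0, 1, n),
                        'ss_class':        rng.choice(['H', 'E', 'C'], n),
                        'active_site':     rng.choice([0, 1], n)})
y_train = np.array([0, 1] * (n // 2))

num_cols = ['ligand_distance', 'deepddg']   # scaled to [0, 1]
cat_cols = ['ss_class', 'active_site']      # one-hot encoded

prep = ColumnTransformer(transformers=[('num', MinMaxScaler(), num_cols),
                                       ('cat', OneHotEncoder(), cat_cols)],
                         remainder='passthrough')

pipe = Pipeline(steps=[('prep', prep),
                       ('model', LogisticRegressionCV(random_state=42))])
pipe.fit(X_train, y_train)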
key: fit_time
value: [1.05394197 0.92688775 1.04308367 0.9212997 1.15521002 1.32852483
1.44046068 1.40850043 1.3699491 1.35082221]
mean value: 1.1998680353164672
key: score_time
value: [0.01867509 0.01617694 0.0124402 0.01548862 0.01599932 0.01576352
0.01242352 0.01645255 0.01575446 0.01281857]
mean value: 0.01519927978515625
key: test_mcc
value: [0.87904907 0.84615385 0.79056942 0.87904907 0.85485041 0.92495119
0.92495119 0.92480439 0.94929201 0.83165353]
mean value: 0.8805324141891033
key: train_mcc
value: [0.98573393 0.95427518 0.98290472 0.98857105 0.97449907 0.98575394
0.97449907 0.98011218 0.96614121 0.9914285 ]
mean value: 0.9783918847169994
key: test_accuracy
value: [0.93589744 0.92307692 0.88461538 0.93589744 0.92207792 0.96103896
0.96103896 0.96103896 0.97402597 0.90909091]
mean value: 0.9367798867798868
key: train_accuracy
value: [0.99281609 0.97701149 0.99137931 0.99425287 0.98708752 0.9928264
0.98708752 0.98995696 0.98278336 0.99569584]
mean value: 0.9890897359784957
key: test_fscore
value: [0.93975904 0.92307692 0.89655172 0.93975904 0.92682927 0.96202532
0.96202532 0.96296296 0.975 0.91764706]
mean value: 0.9405636642494579
key: train_fscore
value: [0.99286733 0.97727273 0.99145299 0.99428571 0.98727016 0.99288762
0.98727016 0.99004267 0.98305085 0.99570815]
mean value: 0.9892108377251556
key: test_precision
value: [0.88636364 0.92307692 0.8125 0.88636364 0.86363636 0.92682927
0.92682927 0.92857143 0.95121951 0.84782609]
mean value: 0.8953216123748997
key: train_precision
value: [0.98583569 0.96629213 0.98305085 0.98863636 0.97486034 0.98587571
0.97486034 0.98028169 0.96666667 0.99145299]
mean value: 0.9797812764842697
key: test_recall
value: [1. 0.92307692 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9923076923076923
key: train_recall
value: [1. 0.98850575 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9988505747126437
key: test_roc_auc
value: [0.93589744 0.92307692 0.88461538 0.93589744 0.92307692 0.96153846
0.96153846 0.96052632 0.97368421 0.90789474]
mean value: 0.936774628879892
key: train_roc_auc
value: [0.99281609 0.97701149 0.99137931 0.99425287 0.98706897 0.99281609
0.98706897 0.98997135 0.98280802 0.99570201]
mean value: 0.9890895168461614
key: test_jcc
value: [0.88636364 0.85714286 0.8125 0.88636364 0.86363636 0.92682927
0.92682927 0.92857143 0.95121951 0.84782609]
mean value: 0.8887282057814931
key: train_jcc
value: [0.98583569 0.95555556 0.98305085 0.98863636 0.97486034 0.98587571
0.97486034 0.98028169 0.96666667 0.99145299]
mean value: 0.9787076185566792
MCC on Blind test: 0.04
Accuracy on Blind test: 0.86
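The fit_time/score_time and paired test_*/train_* arrays above have the shape of output from sklearn's cross_validate run with a dictionary of scorers and return_train_score=True, and the two 'Blind test' lines are scores on a separate held-out set. A rough sketch of how numbers of this form can be produced, using generated stand-in data and an arbitrary classifier rather than the script's own loop:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import cross_validate

# Stand-in imbalanced data; the real script uses its own train and blind-test sets.
X, y = make_classification(n_samples=300, weights=[0.9], random_state=42)
X_blind, y_blind = make_classification(n_samples=100, weights=[0.9], random_state=7)

scoring = {'mcc': make_scorer(matthews_corrcoef),
           'accuracy': 'accuracy',
           'fscore': 'f1',
           'roc_auc': 'roc_auc'}

clf = RandomForestClassifier(random_state=42)
cv_out = cross_validate(clf, X, y, cv=10, scoring=scoring, return_train_score=True)
print(sorted(cv_out.keys()))        # fit_time, score_time, test_mcc, train_mcc, ...
print(cv_out['test_mcc'].mean())    # per-fold array -> mean value, as logged above

# Scores on the separate blind-test set.
clf.fit(X, y)
print('MCC on Blind test:',      round(matthews_corrcoef(y_blind, clf.predict(X_blind)), 2))
print('Accuracy on Blind test:', round(accuracy_score(y_blind, clf.predict(X_blind)), 2))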
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [identical to the model list shown above; duplicate omitted]
Running model pipeline: [same 'prep' ColumnTransformer as above], ('model', GaussianNB())
key: fit_time
value: [0.01600218 0.01189256 0.01803446 0.0116744 0.01118493 0.01114821
0.01104283 0.01225209 0.01085114 0.01087952]
mean value: 0.012496232986450195
key: score_time
value: [0.01254511 0.01536846 0.01351309 0.00984836 0.00962949 0.00952077
0.00944638 0.00934315 0.00925326 0.00964808]
mean value: 0.010811614990234374
key: test_mcc
value: [0.18257419 0.38100038 0.14451833 0.11111111 0.42246381 0.08294514
0.37749121 0.32063015 0.3388424 0.15376882]
mean value: 0.251534553168588
key: train_mcc
value: [0.27273302 0.28980628 0.25196702 0.29596822 0.28701158 0.25820959
0.31875549 0.29834828 0.29987112 0.25917834]
mean value: 0.2831848946185855
key: test_accuracy
value: [0.57692308 0.65384615 0.56410256 0.55128205 0.67532468 0.53246753
0.64935065 0.64935065 0.66233766 0.57142857]
mean value: 0.6086413586413586
key: train_accuracy
value: [0.62068966 0.62643678 0.61350575 0.63074713 0.6241033 0.6169297
0.63414634 0.62697274 0.62840746 0.6169297 ]
mean value: 0.6238868549943106
key: test_fscore
value: [0.66666667 0.73267327 0.64583333 0.62365591 0.74226804 0.625
0.72727273 0.70967742 0.71111111 0.65263158]
mean value: 0.6836790059228386
key: train_fscore
value: [0.69230769 0.69976905 0.6824085 0.7008149 0.70022883 0.68476978
0.71251409 0.70387244 0.704 0.68402367]
mean value: 0.6964708954137468
key: test_precision
value: [0.55 0.59677419 0.54385965 0.53703704 0.61016949 0.51724138
0.59016393 0.61111111 0.62745098 0.55357143]
mean value: 0.5737379205044926
key: train_precision
value: [0.58235294 0.58494208 0.57915832 0.5890411 0.58285714 0.58232932
0.58736059 0.58301887 0.58444023 0.58148893]
mean value: 0.5836989522794114
key: test_recall
value: [0.84615385 0.94871795 0.79487179 0.74358974 0.94736842 0.78947368
0.94736842 0.84615385 0.82051282 0.79487179]
mean value: 0.8479082321187584
key: train_recall
value: [0.85344828 0.87068966 0.83045977 0.86494253 0.87679083 0.83094556
0.90544413 0.88793103 0.88505747 0.83045977]
mean value: 0.8636169021506439
key: test_roc_auc
value: [0.57692308 0.65384615 0.56410256 0.55128205 0.67881242 0.53576248
0.65317139 0.64676113 0.66025641 0.56848853]
mean value: 0.608940620782726
key: train_roc_auc
value: [0.62068966 0.62643678 0.61350575 0.63074713 0.62374024 0.6166222
0.63375655 0.62734661 0.62877515 0.61723562]
mean value: 0.6238855679610051
key: test_jcc
value: [0.5 0.578125 0.47692308 0.453125 0.59016393 0.45454545
0.57142857 0.55 0.55172414 0.484375 ]
mean value: 0.5210410175254367
key: train_jcc
value: [0.52941176 0.53818828 0.51792115 0.53942652 0.53873239 0.52064632
0.55341506 0.543058 0.54320988 0.51978417]
mean value: 0.5343793532965243
MCC on Blind test: 0.13
Accuracy on Blind test: 0.48
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [identical to the model list shown above; duplicate omitted]
Running model pipeline: [same 'prep' ColumnTransformer as above], ('model', BernoulliNB())
key: fit_time
value: [0.01112032 0.01114225 0.01133752 0.01122379 0.01127672 0.01164079
0.01131344 0.01204729 0.01213574 0.01128817]
mean value: 0.011452603340148925
key: score_time
value: [0.00929403 0.00936055 0.00939679 0.00924015 0.00932431 0.00934219
0.00925708 0.00979233 0.00924635 0.00935888]
mean value: 0.00936126708984375
key: test_mcc
value: [0.36490022 0.30779351 0.26064302 0.33377251 0.66463964 0.43492389
0.48781832 0.6148924 0.45442279 0.48977837]
mean value: 0.44135846672835566
key: train_mcc
value: [0.48087064 0.50789897 0.47352983 0.47930458 0.47608902 0.48663171
0.52620857 0.42823335 0.52061948 0.52122959]
mean value: 0.49006157361636393
key: test_accuracy
value: [0.67948718 0.65384615 0.62820513 0.66666667 0.83116883 0.71428571
0.71428571 0.80519481 0.72727273 0.74025974]
mean value: 0.7160672660672661
key: train_accuracy
value: [0.73850575 0.75287356 0.73563218 0.73850575 0.73601148 0.74175036
0.758967 0.71305595 0.75609756 0.75753228]
mean value: 0.7428931875525652
key: test_fscore
value: [0.70588235 0.64935065 0.65882353 0.65789474 0.83544304 0.73170732
0.76595745 0.81927711 0.73417722 0.76744186]
mean value: 0.7325955254490786
key: train_fscore
value: [0.75405405 0.76373626 0.74794521 0.75068493 0.75268817 0.75609756
0.77894737 0.7260274 0.77572559 0.77436582]
mean value: 0.7580272368238905
key: test_precision
value: [0.65217391 0.65789474 0.60869565 0.67567568 0.80487805 0.68181818
0.64285714 0.77272727 0.725 0.70212766]
mean value: 0.6923848283492726
key: train_precision
value: [0.71173469 0.73157895 0.71465969 0.71727749 0.70886076 0.71722365
0.72019465 0.69371728 0.71707317 0.72319202]
mean value: 0.7155512339270805
key: test_recall
value: [0.76923077 0.64102564 0.71794872 0.64102564 0.86842105 0.78947368
0.94736842 0.87179487 0.74358974 0.84615385]
mean value: 0.7836032388663968
key: train_recall
value: [0.80172414 0.79885057 0.78448276 0.78735632 0.80229226 0.79942693
0.84813754 0.76149425 0.84482759 0.83333333]
mean value: 0.8061925699041597
key: test_roc_auc
value: [0.67948718 0.65384615 0.62820513 0.66666667 0.83164642 0.71524966
0.71727395 0.80431849 0.72705803 0.7388664 ]
mean value: 0.7162618083670715
key: train_roc_auc
value: [0.73850575 0.75287356 0.73563218 0.73850575 0.73591625 0.74166749
0.75883888 0.71312535 0.75622468 0.75764088]
mean value: 0.7428930771004183
key: test_jcc
value: [0.54545455 0.48076923 0.49122807 0.49019608 0.7173913 0.57692308
0.62068966 0.69387755 0.58 0.62264151]
mean value: 0.5819171021728274
key: train_jcc
value: [0.60520607 0.61777778 0.59737418 0.60087719 0.60344828 0.60784314
0.63793103 0.56989247 0.63362069 0.63180828]
mean value: 0.6105779113184302
MCC on Blind test: -0.05
Accuracy on Blind test: 0.66
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [identical to the model list shown above; duplicate omitted]
Running model pipeline: [same 'prep' ColumnTransformer as above], ('model', KNeighborsClassifier())
key: fit_time
value: [0.01438379 0.01190686 0.01167393 0.01092696 0.01165771 0.01160526
0.01157403 0.01168942 0.01145244 0.01180005]
mean value: 0.011867046356201172
key: score_time
value: [0.03903127 0.0135932 0.01455879 0.01390433 0.01458597 0.01428199
0.0195663 0.01395369 0.01408124 0.01455212]
mean value: 0.017210888862609863
key: test_mcc
value: [0.8121419 0.64666979 0.72760688 0.83405766 0.76725173 0.68442809
0.81032908 0.78744256 0.84516739 0.78744256]
mean value: 0.7702537643145709
key: train_mcc
value: [0.82778622 0.84765472 0.83520116 0.82286599 0.84532728 0.84532728
0.82060485 0.82560052 0.82628251 0.82314703]
mean value: 0.8319797554739864
key: test_accuracy
value: [0.8974359 0.79487179 0.84615385 0.91025641 0.87012987 0.81818182
0.8961039 0.88311688 0.92207792 0.88311688]
mean value: 0.8721445221445221
key: train_accuracy
value: [0.9066092 0.91810345 0.91091954 0.90373563 0.91678623 0.91678623
0.90243902 0.90530846 0.90674319 0.90387374]
mean value: 0.9091304688401853
key: test_fscore
value: [0.90697674 0.82978723 0.86666667 0.91764706 0.88372093 0.84444444
0.9047619 0.89655172 0.925 0.89655172]
mean value: 0.8872108431433565
key: train_fscore
value: [0.91458607 0.92430279 0.9182058 0.91218873 0.92328042 0.92328042
0.91122715 0.91338583 0.91413474 0.91218873]
mean value: 0.9166780691741931
key: test_precision
value: [0.82978723 0.70909091 0.76470588 0.84782609 0.79166667 0.73076923
0.82608696 0.8125 0.90243902 0.8125 ]
mean value: 0.8027371990790806
key: train_precision
value: [0.84261501 0.85925926 0.84878049 0.83855422 0.85749386 0.85749386
0.83693046 0.84057971 0.84596577 0.83855422]
mean value: 0.8466226843844898
key: test_recall
value: [1. 1. 1. 1. 1. 1.
1. 1. 0.94871795 1. ]
mean value: 0.9948717948717949
key: train_recall
value: [1. 1. 1. 1. 1. 1.
1. 1. 0.99425287 1. ]
mean value: 0.9994252873563219
key: test_roc_auc
value: [0.8974359 0.79487179 0.84615385 0.91025641 0.87179487 0.82051282
0.8974359 0.88157895 0.9217274 0.88157895]
mean value: 0.8723346828609987
key: train_roc_auc
value: [0.9066092 0.91810345 0.91091954 0.90373563 0.91666667 0.91666667
0.90229885 0.90544413 0.90686856 0.90401146]
mean value: 0.9091324144517999
key: test_jcc
value: [0.82978723 0.70909091 0.76470588 0.84782609 0.79166667 0.73076923
0.82608696 0.8125 0.86046512 0.8125 ]
mean value: 0.7985398082679631
key: train_jcc
value: [0.84261501 0.85925926 0.84878049 0.83855422 0.85749386 0.85749386
0.83693046 0.84057971 0.84184915 0.83855422]
mean value: 0.8462110222092241
MCC on Blind test: -0.02
Accuracy on Blind test: 0.8
Model_name: SVM
Model func: SVC(random_state=42)
List of models: [identical to the model list shown above; duplicate omitted]
Running model pipeline: [same 'prep' ColumnTransformer as above], ('model', SVC(random_state=42))
key: fit_time
value: [0.03378057 0.03207874 0.03187418 0.03264308 0.03216743 0.03562951
0.03375864 0.03270745 0.037462 0.03120208]
mean value: 0.03333036899566651
key: score_time
value: [0.01566958 0.01609969 0.01588726 0.01613855 0.01561236 0.01528788
0.01540995 0.01555967 0.016819 0.0151825 ]
mean value: 0.01576664447784424
key: test_mcc
value: [0.85634884 0.72392277 0.74753193 0.87904907 0.83239263 0.8023596
0.85485041 0.87734648 0.8542977 0.8972297 ]
mean value: 0.8325329133780196
key: train_mcc
value: [0.89356008 0.89413323 0.89330128 0.89877261 0.89889638 0.88274599
0.88592188 0.8911284 0.89892579 0.87746695]
mean value: 0.8914852587934687
key: test_accuracy
value: [0.92307692 0.85897436 0.87179487 0.93589744 0.90909091 0.8961039
0.92207792 0.93506494 0.92207792 0.94805195]
mean value: 0.9122211122211122
key: train_accuracy
value: [0.94396552 0.9454023 0.9454023 0.94683908 0.94691535 0.93974175
0.93974175 0.94261119 0.94691535 0.93687231]
mean value: 0.9434406899849932
key: test_fscore
value: [0.92857143 0.86746988 0.87804878 0.93975904 0.91566265 0.90243902
0.92682927 0.93975904 0.92857143 0.95 ]
mean value: 0.9177110532723227
key: train_fscore
value: [0.94693878 0.9476584 0.94736842 0.94952251 0.94965986 0.94230769
0.94324324 0.94565217 0.94952251 0.93956044]
mean value: 0.9461434032200536
key: test_precision
value: [0.86666667 0.81818182 0.8372093 0.88636364 0.84444444 0.84090909
0.86363636 0.88636364 0.86666667 0.92682927]
mean value: 0.8637270893850588
key: train_precision
value: [0.89922481 0.91005291 0.9144385 0.9038961 0.90414508 0.90501319
0.89258312 0.89690722 0.9038961 0.9 ]
mean value: 0.9030157033752259
key: test_recall
value: [1. 0.92307692 0.92307692 1. 1. 0.97368421
1. 1. 1. 0.97435897]
mean value: 0.9794197031039137
key: train_recall
value: [1. 0.98850575 0.98275862 1. 1. 0.98280802
1. 1. 1. 0.98275862]
mean value: 0.9936831011428383
key: test_roc_auc
value: [0.92307692 0.85897436 0.87179487 0.93589744 0.91025641 0.89709852
0.92307692 0.93421053 0.92105263 0.9477058 ]
mean value: 0.9123144399460189
key: train_roc_auc
value: [0.94396552 0.9454023 0.9454023 0.94683908 0.94683908 0.93967987
0.93965517 0.94269341 0.9469914 0.93693805]
mean value: 0.943440618515957
key: test_jcc
value: [0.86666667 0.76595745 0.7826087 0.88636364 0.84444444 0.82222222
0.86363636 0.88636364 0.86666667 0.9047619 ]
mean value: 0.8489691683586226
key: train_jcc
value: [0.89922481 0.90052356 0.9 0.9038961 0.90414508 0.89090909
0.89258312 0.89690722 0.9038961 0.88601036]
mean value: 0.897809544222623
MCC on Blind test: 0.09
Accuracy on Blind test: 0.83
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: [identical to the model list shown above; duplicate omitted]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
[ConvergenceWarning emitted 9 times in total during cross-validation; the identical repeats are omitted here.]
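As with the lbfgs warnings earlier, this one means the MLP's optimiser stopped at its iteration cap (max_iter=500 here). Two common ways to address it, sketched on stand-in data; whether either is appropriate for this analysis is a judgement call, not something the log establishes:

from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, random_state=42)

# Option 1: allow more iterations than the 500 used above.
mlp_long = MLPClassifier(max_iter=2000, random_state=42).fit(X, y)

# Option 2: stop when the validation score plateaus instead of at a hard cap.
mlp_es = MLPClassifier(max_iter=500, early_stopping=True, random_state=42).fit(X, y)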
Running model pipeline: [same 'prep' ColumnTransformer as above], ('model', MLPClassifier(max_iter=500, random_state=42))
key: fit_time
value: [2.84709525 2.89001012 2.87484479 3.82822442 3.92568469 3.72552824
3.82616949 6.01553011 4.58798838 3.79543161]
mean value: 3.831650710105896
key: score_time
value: [0.01310539 0.01274061 0.01537585 0.02309084 0.0263958 0.0156002
0.01583242 0.01692939 0.01640368 0.0129261 ]
mean value: 0.016840028762817382
key: test_mcc
value: [0.87904907 0.90219371 0.87904907 0.97467943 0.92495119 0.87773765
0.94935876 0.97434188 0.94929201 0.90083601]
mean value: 0.9211488788014931
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.93589744 0.94871795 0.93589744 0.98717949 0.96103896 0.93506494
0.97402597 0.98701299 0.97402597 0.94805195]
mean value: 0.9586913086913087
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.93975904 0.95121951 0.93975904 0.98734177 0.96202532 0.9382716
0.97435897 0.98734177 0.975 0.95121951]
mean value: 0.9606296536736141
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.88636364 0.90697674 0.88636364 0.975 0.92682927 0.88372093
0.95 0.975 0.95121951 0.90697674]
mean value: 0.9248450471819729
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.93589744 0.94871795 0.93589744 0.98717949 0.96153846 0.93589744
0.97435897 0.98684211 0.97368421 0.94736842]
mean value: 0.9587381916329285
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.88636364 0.90697674 0.88636364 0.975 0.92682927 0.88372093
0.95 0.975 0.95121951 0.90697674]
mean value: 0.9248450471819729
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.21
Accuracy on Blind test: 0.92
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [identical to the model list shown above; duplicate omitted]
Running model pipeline: [same 'prep' ColumnTransformer as above], ('model', DecisionTreeClassifier(random_state=42))
key: fit_time
value: [0.04279375 0.02776575 0.02715778 0.02914882 0.02963543 0.02914047
0.02569294 0.02686524 0.03158975 0.02859664]
mean value: 0.02983865737915039
key: score_time
value: [0.01193166 0.00984049 0.00906992 0.00910592 0.00927043 0.0093317
0.00913215 0.00913596 0.00915956 0.00914335]
mean value: 0.009512114524841308
key: test_mcc
value: [0.90219371 0.94996791 0.94996791 0.94996791 0.97435897 0.92495119
0.90109146 0.90083601 0.94929201 0.97434188]
mean value: 0.9376968955076089
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94871795 0.97435897 0.97435897 0.97435897 0.98701299 0.96103896
0.94805195 0.94805195 0.97402597 0.98701299]
mean value: 0.9676989676989677
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.95121951 0.975 0.975 0.975 0.98701299 0.96202532
0.95 0.95121951 0.975 0.98734177]
mean value: 0.9688819100010826
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.90697674 0.95121951 0.95121951 0.95121951 0.97435897 0.92682927
0.9047619 0.90697674 0.95121951 0.975 ]
mean value: 0.9399781684566143
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.94871795 0.97435897 0.97435897 0.97435897 0.98717949 0.96153846
0.94871795 0.94736842 0.97368421 0.98684211]
mean value: 0.9677125506072874
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.90697674 0.95121951 0.95121951 0.95121951 0.97435897 0.92682927
0.9047619 0.90697674 0.95121951 0.975 ]
mean value: 0.9399781684566143
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
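The UndefinedMetricWarning above is raised when a fold contains no predicted positive samples, so precision has a 0/0 denominator and sklearn substitutes 0.0. A small, self-contained illustration of the warning and of the zero_division parameter it mentions (unrelated to this script's data):

from sklearn.metrics import precision_score

y_true = [1, 0, 1, 0]
y_pred = [0, 0, 0, 0]          # no predicted positives -> precision is 0/0

print(precision_score(y_true, y_pred))                   # 0.0, plus the warning
print(precision_score(y_true, y_pred, zero_division=0))  # 0.0, warning silenced
print(precision_score(y_true, y_pred, zero_division=1))  # treat 0/0 as 1.0 instead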
MCC on Blind test: 0.51
Accuracy on Blind test: 0.93
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [identical to the model list shown above; duplicate omitted]
Running model pipeline: [same 'prep' ColumnTransformer as above], ('model', ExtraTreesClassifier(random_state=42))
key: fit_time
value: [0.13569999 0.1304841 0.14602518 0.13374615 0.13699889 0.12979698
0.14601016 0.13892627 0.13776207 0.1331687 ]
mean value: 0.13686184883117675
key: score_time
value: [0.02058697 0.01867318 0.01988053 0.02024674 0.02007627 0.02043152
0.01954746 0.02025414 0.01957679 0.01884198]
mean value: 0.019811558723449706
key: test_mcc
value: [0.97467943 0.97467943 0.97467943 1. 1. 0.97435897
1. 1. 1. 1. ]
mean value: 0.9898397277801664
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98717949 0.98717949 0.98717949 1. 1. 0.98701299
1. 1. 1. 1. ]
mean value: 0.9948551448551448
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98734177 0.98734177 0.98734177 1. 1. 0.98701299
1. 1. 1. 1. ]
mean value: 0.9949038303468682
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.975 0.975 0.975 1. 1. 0.97435897
1. 1. 1. 1. ]
mean value: 0.9899358974358974
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98717949 0.98717949 0.98717949 1. 1. 0.98717949
1. 1. 1. 1. ]
mean value: 0.9948717948717949
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.975 0.975 0.975 1. 1. 0.97435897
1. 1. 1. 1. ]
mean value: 0.9899358974358974
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.0
Accuracy on Blind test: 0.92
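A blind-test MCC of 0.0 next to a high blind-test accuracy is consistent with a classifier that predicts only the majority class of an imbalanced set: accuracy then equals the majority-class fraction, while matthews_corrcoef has a zero denominator and sklearn returns 0.0. A toy illustration with made-up labels (not the actual blind-test data):

from sklearn.metrics import accuracy_score, matthews_corrcoef

y_true = [0] * 92 + [1] * 8      # imbalanced, made-up blind-test labels
y_pred = [0] * 100               # model that only ever predicts the majority class

print(accuracy_score(y_true, y_pred))      # 0.92 -- looks good on its own
print(matthews_corrcoef(y_true, y_pred))   # 0.0  -- no real discriminative power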
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [identical to the model list shown above; duplicate omitted]
Running model pipeline: [same 'prep' ColumnTransformer as above], ('model', ExtraTreeClassifier(random_state=42))
key: fit_time
value: [0.01173282 0.012146 0.01239681 0.01195526 0.01194382 0.01196694
0.01245904 0.01131797 0.01180458 0.01225233]
mean value: 0.011997556686401368
key: score_time
value: [0.00938344 0.00997138 0.00914097 0.00916886 0.00958729 0.00944328
0.00931644 0.00916767 0.00959706 0.00947905]
mean value: 0.009425544738769531
key: test_mcc
value: [0.85634884 0.83405766 0.90219371 0.94996791 0.92495119 0.83239263
0.92495119 0.94929201 0.90083601 0.92480439]
mean value: 0.8999795535671181
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.92307692 0.91025641 0.94871795 0.97435897 0.96103896 0.90909091
0.96103896 0.97402597 0.94805195 0.96103896]
mean value: 0.947069597069597
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.92857143 0.91764706 0.95121951 0.975 0.96202532 0.91566265
0.96202532 0.975 0.95121951 0.96296296]
mean value: 0.9501333758261967
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.86666667 0.84782609 0.90697674 0.95121951 0.92682927 0.84444444
0.92682927 0.95121951 0.90697674 0.92857143]
mean value: 0.9057559675986764
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.92307692 0.91025641 0.94871795 0.97435897 0.96153846 0.91025641
0.96153846 0.97368421 0.94736842 0.96052632]
mean value: 0.9471322537112011
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.86666667 0.84782609 0.90697674 0.95121951 0.92682927 0.84444444
0.92682927 0.95121951 0.90697674 0.92857143]
mean value: 0.9057559675986764
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.22
Accuracy on Blind test: 0.9
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [identical to the model list shown above; duplicate omitted]
Running model pipeline: [same 'prep' ColumnTransformer as above], ('model', RandomForestClassifier(n_estimators=1000, random_state=42))
key: fit_time
value: [1.88903451 1.84501958 1.8204875 1.84435105 1.91502929 1.95740795
1.92991972 1.87540865 1.91591406 1.82895994]
mean value: 1.8821532249450683
key: score_time
value: [0.09699202 0.09632206 0.09773874 0.09719157 0.10516429 0.10355401
0.10434246 0.09704232 0.09645724 0.09650421]
mean value: 0.09913089275360107
key: test_mcc
value: [0.97467943 0.97467943 0.97467943 1. 1. 0.94935876
0.97435897 1. 0.94929201 1. ]
mean value: 0.9797048049480017
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98717949 0.98717949 0.98717949 1. 1. 0.97402597
0.98701299 1. 0.97402597 1. ]
mean value: 0.9896603396603396
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98734177 0.98734177 0.98734177 1. 1. 0.97435897
0.98701299 1. 0.975 1. ]
mean value: 0.9898397277827657
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.975 0.975 0.975 1. 1. 0.95
0.97435897 1. 0.95121951 1. ]
mean value: 0.9800578486554096
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98717949 0.98717949 0.98717949 1. 1. 0.97435897
0.98717949 1. 0.97368421 1. ]
mean value: 0.9896761133603239
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.975 0.975 0.975 1. 1. 0.95
0.97435897 1. 0.95121951 1. ]
mean value: 0.9800578486554096
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
MCC on Blind test: 0.32
Accuracy on Blind test: 0.93
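The key / value / mean value blocks printed for each model have the shape of sklearn.model_selection.cross_validate output with several scorers and return_train_score=True. The sketch below is a minimal, self-contained reconstruction of that pattern, not the project's actual driver code: the toy DataFrame, the 10-fold splitter and the scorer names (including 'jaccard' for the jcc keys) are assumptions standing in for the 167 numeric and 7 categorical columns listed in the pipeline above.

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Toy stand-ins for the real feature table (2 numeric + 1 categorical column).
rng = np.random.default_rng(42)
X = pd.DataFrame({'ligand_distance': rng.normal(size=200),
                  'deepddg': rng.normal(size=200),
                  'ss_class': rng.choice(['helix', 'sheet', 'coil'], size=200)})
y = rng.integers(0, 2, size=200)

pipe = Pipeline(steps=[
    ('prep', ColumnTransformer(remainder='passthrough',
                               transformers=[('num', MinMaxScaler(), ['ligand_distance', 'deepddg']),
                                             ('cat', OneHotEncoder(), ['ss_class'])])),
    ('model', RandomForestClassifier(n_estimators=1000, random_state=42))])

scoring = {'mcc': make_scorer(matthews_corrcoef), 'accuracy': 'accuracy',
           'fscore': 'f1', 'precision': 'precision', 'recall': 'recall',
           'roc_auc': 'roc_auc', 'jcc': 'jaccard'}

cv_out = cross_validate(pipe, X, y, scoring=scoring, return_train_score=True,
                        cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=42))

for key, value in cv_out.items():        # reproduces the key / value / mean value layout above
    print('key:', key)
    print('value:', value)
    print('mean value:', value.mean())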
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000...05', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.99514437 1.0225997 0.97147822 1.09300447 1.00245118 1.01744437
1.07910943 1.08882952 1.0241704 1.02668524]
mean value: 1.0320916891098022
key: score_time
value: [0.21192527 0.28798079 0.28110647 0.27433634 0.23993206 0.25457263
0.18695974 0.16408992 0.2514255 0.27029419]
mean value: 0.24226229190826415
key: test_mcc
value: [0.9258201 0.97467943 0.97467943 1. 0.97435897 0.92495119
0.97435897 0.94929201 0.90083601 1. ]
mean value: 0.9598976129087683
key: train_mcc
value: [0.98290472 0.98008335 0.98573393 0.98008335 0.98575394 0.98292859
0.98575394 0.98011218 0.98575452 0.9773027 ]
mean value: 0.9826411228856712
key: test_accuracy
value: [0.96153846 0.98717949 0.98717949 1. 0.98701299 0.96103896
0.98701299 0.97402597 0.94805195 1. ]
mean value: 0.9793040293040293
key: train_accuracy
value: [0.99137931 0.98994253 0.99281609 0.98994253 0.9928264 0.99139168
0.9928264 0.98995696 0.9928264 0.98852224]
mean value: 0.9912430531506127
key: test_fscore
value: [0.96296296 0.98734177 0.98734177 1. 0.98701299 0.96202532
0.98701299 0.975 0.95121951 1. ]
mean value: 0.9799917309943552
key: train_fscore
value: [0.99145299 0.99004267 0.99286733 0.99004267 0.99288762 0.99147727
0.99288762 0.99004267 0.99286733 0.98863636]
mean value: 0.9913204564273995
key: test_precision
value: [0.92857143 0.975 0.975 1. 0.97435897 0.92682927
0.97435897 0.95121951 0.90697674 1. ]
mean value: 0.9612314901963228
key: train_precision
value: [0.98305085 0.98028169 0.98583569 0.98028169 0.98587571 0.98309859
0.98587571 0.98028169 0.98583569 0.97752809]
mean value: 0.982794539984846
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96153846 0.98717949 0.98717949 1. 0.98717949 0.96153846
0.98717949 0.97368421 0.94736842 1. ]
mean value: 0.9792847503373819
key: train_roc_auc
value: [0.99137931 0.98994253 0.99281609 0.98994253 0.99281609 0.99137931
0.99281609 0.98997135 0.99283668 0.98853868]
mean value: 0.9912438658894048
key: test_jcc
value: [0.92857143 0.975 0.975 1. 0.97435897 0.92682927
0.97435897 0.95121951 0.90697674 1. ]
mean value: 0.9612314901963228
key: train_jcc
value: [0.98305085 0.98028169 0.98583569 0.98028169 0.98587571 0.98309859
0.98587571 0.98028169 0.98583569 0.97752809]
mean value: 0.982794539984846
MCC on Blind test: 0.46
Accuracy on Blind test: 0.93
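The FutureWarning retained under the Random Forest results above is triggered by max_features='auto' in this 'Random Forest2' configuration: scikit-learn 1.1 deprecates 'auto' and 1.3 removes it. For classifiers 'auto' already resolved to 'sqrt', so an equivalent, warning-free definition would look like the sketch below (this is not a change made in this run):

from sklearn.ensemble import RandomForestClassifier

# 'sqrt' is what 'auto' meant for classifiers, and it is also the default,
# so max_features could equally be omitted altogether.
rf2 = RandomForestClassifier(max_features='sqrt', min_samples_leaf=5,
                             n_estimators=1000, n_jobs=10,
                             oob_score=True, random_state=42)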
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.0287087 0.01250434 0.01220155 0.01223969 0.01255131 0.01242614
0.01146221 0.01273441 0.01292753 0.01179338]
mean value: 0.013954925537109374
key: score_time
value: [0.01117444 0.01006937 0.01023412 0.00928569 0.00978541 0.01010013
0.00978398 0.0102222 0.01030898 0.00993943]
mean value: 0.010090374946594238
key: test_mcc
value: [0.36490022 0.30779351 0.26064302 0.33377251 0.66463964 0.43492389
0.48781832 0.6148924 0.45442279 0.48977837]
mean value: 0.44135846672835566
key: train_mcc
value: [0.48087064 0.50789897 0.47352983 0.47930458 0.47608902 0.48663171
0.52620857 0.42823335 0.52061948 0.52122959]
mean value: 0.49006157361636393
key: test_accuracy
value: [0.67948718 0.65384615 0.62820513 0.66666667 0.83116883 0.71428571
0.71428571 0.80519481 0.72727273 0.74025974]
mean value: 0.7160672660672661
key: train_accuracy
value: [0.73850575 0.75287356 0.73563218 0.73850575 0.73601148 0.74175036
0.758967 0.71305595 0.75609756 0.75753228]
mean value: 0.7428931875525652
key: test_fscore
value: [0.70588235 0.64935065 0.65882353 0.65789474 0.83544304 0.73170732
0.76595745 0.81927711 0.73417722 0.76744186]
mean value: 0.7325955254490786
key: train_fscore
value: [0.75405405 0.76373626 0.74794521 0.75068493 0.75268817 0.75609756
0.77894737 0.7260274 0.77572559 0.77436582]
mean value: 0.7580272368238905
key: test_precision
value: [0.65217391 0.65789474 0.60869565 0.67567568 0.80487805 0.68181818
0.64285714 0.77272727 0.725 0.70212766]
mean value: 0.6923848283492726
key: train_precision
value: [0.71173469 0.73157895 0.71465969 0.71727749 0.70886076 0.71722365
0.72019465 0.69371728 0.71707317 0.72319202]
mean value: 0.7155512339270805
key: test_recall
value: [0.76923077 0.64102564 0.71794872 0.64102564 0.86842105 0.78947368
0.94736842 0.87179487 0.74358974 0.84615385]
mean value: 0.7836032388663968
key: train_recall
value: [0.80172414 0.79885057 0.78448276 0.78735632 0.80229226 0.79942693
0.84813754 0.76149425 0.84482759 0.83333333]
mean value: 0.8061925699041597
key: test_roc_auc
value: [0.67948718 0.65384615 0.62820513 0.66666667 0.83164642 0.71524966
0.71727395 0.80431849 0.72705803 0.7388664 ]
mean value: 0.7162618083670715
key: train_roc_auc
value: [0.73850575 0.75287356 0.73563218 0.73850575 0.73591625 0.74166749
0.75883888 0.71312535 0.75622468 0.75764088]
mean value: 0.7428930771004183
key: test_jcc
value: [0.54545455 0.48076923 0.49122807 0.49019608 0.7173913 0.57692308
0.62068966 0.69387755 0.58 0.62264151]
mean value: 0.5819171021728274
key: train_jcc
value: [0.60520607 0.61777778 0.59737418 0.60087719 0.60344828 0.60784314
0.63793103 0.56989247 0.63362069 0.63180828]
mean value: 0.6105779113184302
MCC on Blind test: -0.05
Accuracy on Blind test: 0.66
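One thing to keep in mind when reading the Naive Bayes (BernoulliNB) numbers above: BernoulliNB's default binarize=0.0 thresholds every incoming feature at zero before counting, and the MinMaxScaler branch maps each numeric column into [0, 1]. After scaling, a numeric feature is therefore 0 only for the row holding the column minimum, so the numeric block is effectively reduced to "above the column minimum or not", while the one-hot columns stay genuine 0/1 indicators. A small illustration on toy data (the array and names below are placeholders, not values from this run):

import numpy as np
from sklearn.preprocessing import MinMaxScaler, binarize

rng = np.random.default_rng(0)
X_num = rng.normal(size=(100, 5))                  # toy stand-in for the numeric block
Xs = MinMaxScaler().fit_transform(X_num)           # every value now lies in [0, 1]

# This is the thresholding BernoulliNB(binarize=0.0) applies internally:
print(binarize(Xs, threshold=0.0).mean(axis=0))    # ~0.99 per column: almost everything maps to 1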
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.12332344 0.09664536 0.1114316 0.11105561 0.11129999 0.11255002
0.11294699 0.10015678 0.10224271 0.10989952]
mean value: 0.10915520191192626
key: score_time
value: [0.0127213 0.01172376 0.01155186 0.01144075 0.01183057 0.01129174
0.01135087 0.01148176 0.01120925 0.01261234]
mean value: 0.011721420288085937
key: test_mcc
value: [0.94996791 0.97467943 0.97467943 0.9258201 0.90109146 0.90109146
0.90109146 0.92480439 0.92480439 0.97434188]
mean value: 0.9352371904713324
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.97435897 0.98717949 0.98717949 0.96153846 0.94805195 0.94805195
0.94805195 0.96103896 0.96103896 0.98701299]
mean value: 0.9663503163503163
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.975 0.98734177 0.98734177 0.96296296 0.95 0.95
0.95 0.96296296 0.96296296 0.98734177]
mean value: 0.9675914205344586
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.95121951 0.975 0.975 0.92857143 0.9047619 0.9047619
0.9047619 0.92857143 0.92857143 0.975 ]
mean value: 0.9376219512195122
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97435897 0.98717949 0.98717949 0.96153846 0.94871795 0.94871795
0.94871795 0.96052632 0.96052632 0.98684211]
mean value: 0.9664304993252362
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.95121951 0.975 0.975 0.92857143 0.9047619 0.9047619
0.9047619 0.92857143 0.92857143 0.975 ]
mean value: 0.9376219512195122
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.43
Accuracy on Blind test: 0.92
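The 'MCC on Blind test' / 'Accuracy on Blind test' pair that closes each block looks like a single fit on the training split followed by one evaluation on held-out data. The sketch below reuses the toy pipe/X/y from the cross_validate sketch earlier; whether the blind set in this log is the 20% hold-out split or a separate dataset is an assumption, not something the log states here.

from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split

X_tr, X_bts, y_tr, y_bts = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

pipe.fit(X_tr, y_tr)
y_pred = pipe.predict(X_bts)
print('MCC on Blind test:', round(matthews_corrcoef(y_bts, y_pred), 2))
print('Accuracy on Blind test:', round(accuracy_score(y_bts, y_pred), 2))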
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.0764122 0.08891892 0.06918049 0.09518313 0.07914066 0.0705924
0.0638895 0.08288693 0.07475996 0.07497978]
mean value: 0.07759439945220947
key: score_time
value: [0.03024125 0.02319503 0.0127213 0.01924682 0.0221169 0.01234627
0.02289748 0.0191679 0.01239538 0.01237106]
mean value: 0.018669939041137694
key: test_mcc
value: [0.85634884 0.82078268 0.73786479 0.74833148 0.76725173 0.76725173
0.74617462 0.87734648 0.80937951 0.8542977 ]
mean value: 0.7985029540379236
key: train_mcc
value: [0.89096264 0.87565306 0.87993502 0.90138782 0.89889638 0.89889638
0.88334423 0.90153681 0.88854099 0.89632075]
mean value: 0.8915474095702867
key: test_accuracy
value: [0.92307692 0.91025641 0.85897436 0.85897436 0.87012987 0.87012987
0.85714286 0.93506494 0.8961039 0.92207792]
mean value: 0.8901931401931402
key: train_accuracy
value: [0.94252874 0.93534483 0.93821839 0.94827586 0.94691535 0.94691535
0.93830703 0.94835007 0.94117647 0.94548063]
mean value: 0.9431512722835139
key: test_fscore
value: [0.92857143 0.91139241 0.87356322 0.87640449 0.88372093 0.88372093
0.87356322 0.93975904 0.90697674 0.92857143]
mean value: 0.9006243834165522
key: train_fscore
value: [0.94565217 0.93860846 0.94085282 0.95081967 0.94965986 0.94965986
0.94197031 0.95081967 0.94436906 0.94822888]
mean value: 0.9460640781261299
key: test_precision
value: [0.86666667 0.9 0.79166667 0.78 0.79166667 0.79166667
0.7755102 0.88636364 0.82978723 0.86666667]
mean value: 0.8279994407821155
key: train_precision
value: [0.89690722 0.89350649 0.90237467 0.90625 0.90414508 0.90414508
0.89030612 0.90625 0.89460154 0.9015544 ]
mean value: 0.900004060463696
key: test_recall
value: [1. 0.92307692 0.97435897 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9897435897435898
key: train_recall
value: [1. 0.98850575 0.98275862 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9971264367816092
key: test_roc_auc
value: [0.92307692 0.91025641 0.85897436 0.85897436 0.87179487 0.87179487
0.85897436 0.93421053 0.89473684 0.92105263]
mean value: 0.8903846153846154
key: train_roc_auc
value: [0.94252874 0.93534483 0.93821839 0.94827586 0.94683908 0.94683908
0.93821839 0.94842407 0.94126074 0.94555874]
mean value: 0.9431507920824688
key: test_jcc
value: [0.86666667 0.8372093 0.7755102 0.78 0.79166667 0.79166667
0.7755102 0.88636364 0.82978723 0.86666667]
mean value: 0.8201047247561702
key: train_jcc
value: [0.89690722 0.88431877 0.88831169 0.90625 0.90414508 0.90414508
0.89030612 0.90625 0.89460154 0.9015544 ]
mean value: 0.8976789895324295
MCC on Blind test: 0.2
Accuracy on Blind test: 0.8
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01472425 0.01341558 0.01087713 0.01060128 0.01069045 0.01054621
0.0105772 0.01125479 0.01069784 0.01084042]
mean value: 0.011422514915466309
key: score_time
value: [0.01215672 0.00941801 0.00900674 0.00898647 0.00899959 0.00894737
0.00896239 0.00891042 0.00900507 0.00895166]
mean value: 0.009334444999694824
key: test_mcc
value: [0.20519567 0.15430335 0.26726124 0.07733089 0.22064777 0.11856467
0.30922448 0.37651822 0.35043897 0.11748821]
mean value: 0.21969734696944593
key: train_mcc
value: [0.24771624 0.30775729 0.29057681 0.27210394 0.23385702 0.22599289
0.26890881 0.25426116 0.32485292 0.21973292]
mean value: 0.2645760001521891
key: test_accuracy
value: [0.6025641 0.57692308 0.62820513 0.53846154 0.61038961 0.55844156
0.64935065 0.68831169 0.67532468 0.55844156]
mean value: 0.6086413586413586
key: train_accuracy
value: [0.62356322 0.65373563 0.64511494 0.63505747 0.6169297 0.61262554
0.63414634 0.62697274 0.66140603 0.6097561 ]
mean value: 0.6319307706261647
key: test_fscore
value: [0.60759494 0.56 0.6741573 0.56097561 0.60526316 0.575
0.68235294 0.69230769 0.6835443 0.55263158]
mean value: 0.6193827523959481
key: train_fscore
value: [0.63611111 0.66104079 0.65357644 0.65582656 0.61911555 0.62912088
0.6473029 0.63483146 0.67847411 0.61690141]
mean value: 0.6432301211054297
key: test_precision
value: [0.6 0.58333333 0.6 0.53488372 0.60526316 0.54761905
0.61702128 0.69230769 0.675 0.56756757]
mean value: 0.6022995796248355
key: train_precision
value: [0.6155914 0.64738292 0.63835616 0.62051282 0.61647727 0.60422164
0.62566845 0.62087912 0.64507772 0.60497238]
mean value: 0.6239139877442058
key: test_recall
value: [0.61538462 0.53846154 0.76923077 0.58974359 0.60526316 0.60526316
0.76315789 0.69230769 0.69230769 0.53846154]
mean value: 0.6409581646423752
key: train_recall
value: [0.65804598 0.67528736 0.66954023 0.6954023 0.6217765 0.65616046
0.67048711 0.64942529 0.71551724 0.62931034]
mean value: 0.6640952804400092
key: test_roc_auc
value: [0.6025641 0.57692308 0.62820513 0.53846154 0.61032389 0.55904184
0.65080972 0.68825911 0.67510121 0.55870445]
mean value: 0.6088394062078273
key: train_roc_auc
value: [0.62356322 0.65373563 0.64511494 0.63505747 0.61692273 0.61256299
0.63409413 0.62700491 0.66148355 0.60978411]
mean value: 0.6319323683430491
key: test_jcc
value: [0.43636364 0.38888889 0.50847458 0.38983051 0.43396226 0.40350877
0.51785714 0.52941176 0.51923077 0.38181818]
mean value: 0.45093465046910325
key: train_jcc
value: [0.46639511 0.49369748 0.48541667 0.48790323 0.44834711 0.45891784
0.47852761 0.46502058 0.51340206 0.44602851]
mean value: 0.47436561851779774
MCC on Blind test: 0.12
Accuracy on Blind test: 0.72
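A note on why this Multinomial block runs at all with these features: MultinomialNB rejects negative inputs, so the MinMaxScaler in the 'num' branch (output in [0, 1]) and the 0/1 one-hot columns are what keep the pipeline valid for this model; a standard-scaled variant would raise an error. A toy illustration (the data below is made up):

import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X_toy = np.array([[-2.0, 1.0], [0.5, 3.0], [1.5, -0.5], [2.0, 2.0]])
y_toy = np.array([0, 1, 0, 1])

MultinomialNB().fit(MinMaxScaler().fit_transform(X_toy), y_toy)       # fine: all features in [0, 1]

try:
    MultinomialNB().fit(StandardScaler().fit_transform(X_toy), y_toy) # centred data has negatives
except ValueError as err:
    print('MultinomialNB rejects negative features:', err)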
Model_name: Passive Aggressive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.02453542 0.02623272 0.02625823 0.02281928 0.02866602 0.02868891
0.02171922 0.03020597 0.02553296 0.02535391]
mean value: 0.02600126266479492
key: score_time
value: [0.01155138 0.01207638 0.01214218 0.01211596 0.01210213 0.01208043
0.01219916 0.01220393 0.0121119 0.01224017]
mean value: 0.012082362174987793
key: test_mcc
value: [0.76930926 0.7200823 0.71625345 0.60697698 0.64957894 0.16543404
0.8023596 0.87734648 0.84537494 0.76581079]
mean value: 0.6918526765387455
key: train_mcc
value: [0.83025321 0.79692069 0.84328699 0.57956235 0.70007606 0.19369001
0.83146284 0.84121955 0.88020064 0.72824732]
mean value: 0.7224919656447272
key: test_accuracy
value: [0.87179487 0.85897436 0.84615385 0.76923077 0.81818182 0.53246753
0.8961039 0.93506494 0.92207792 0.87012987]
mean value: 0.832017982017982
key: train_accuracy
value: [0.90804598 0.89511494 0.91810345 0.75143678 0.83644189 0.53802009
0.91535151 0.91822095 0.93974175 0.84648494]
mean value: 0.8466962268507067
key: test_fscore
value: [0.88636364 0.86419753 0.86363636 0.8125 0.79411765 0.1
0.90243902 0.93975904 0.92105263 0.88636364]
mean value: 0.7970429506400427
key: train_fscore
value: [0.91578947 0.90148448 0.92307692 0.8009206 0.81063123 0.14814815
0.91725105 0.92223738 0.94084507 0.86674969]
mean value: 0.8147134044576962
key: test_precision
value: [0.79591837 0.83333333 0.7755102 0.68421053 0.9 1.
0.84090909 0.88636364 0.94594595 0.79591837]
mean value: 0.8458109471643306
key: train_precision
value: [0.84466019 0.84987277 0.87022901 0.66794626 0.96442688 0.96551724
0.89835165 0.87792208 0.92265193 0.76483516]
mean value: 0.8626413176203152
key: test_recall
value: [1. 0.8974359 0.97435897 1. 0.71052632 0.05263158
0.97368421 1. 0.8974359 1. ]
mean value: 0.8506072874493927
key: train_recall
value: [1. 0.95977011 0.98275862 1. 0.6991404 0.08022923
0.93696275 0.97126437 0.95977011 1. ]
mean value: 0.85898955966143
key: test_roc_auc
value: [0.87179487 0.85897436 0.84615385 0.76923077 0.81680162 0.52631579
0.89709852 0.93421053 0.92240216 0.86842105]
mean value: 0.831140350877193
key: train_roc_auc
value: [0.90804598 0.89511494 0.91810345 0.75143678 0.83663917 0.53867783
0.91532046 0.91829694 0.93977044 0.84670487]
mean value: 0.8468110858610809
key: test_jcc
value: [0.79591837 0.76086957 0.76 0.68421053 0.65853659 0.05263158
0.82222222 0.88636364 0.85365854 0.79591837]
mean value: 0.7070329385711505
key: train_jcc
value: [0.84466019 0.82063882 0.85714286 0.66794626 0.68156425 0.08
0.84715026 0.8556962 0.88829787 0.76483516]
mean value: 0.7307931873738781
MCC on Blind test: 0.12
Accuracy on Blind test: 0.9
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.02789497 0.0347898 0.03035545 0.03025532 0.02563238 0.05030847
0.02515292 0.02643228 0.03052449 0.02471113]
mean value: 0.030605721473693847
key: score_time
value: [0.01405096 0.01233768 0.0121851 0.01455688 0.01313639 0.01473331
0.01224232 0.01234841 0.01217604 0.01206303]
mean value: 0.012983012199401855
key: test_mcc
value: [0.84726867 0.72760688 0.66666667 0.53420548 0.7145749 0.82542666
0.83239263 0.77968356 0.8542977 0.72333935]
mean value: 0.7505462489841164
key: train_mcc
value: [0.84798478 0.81018235 0.74869094 0.7222497 0.8247806 0.91239742
0.85890651 0.73253588 0.86115455 0.71904992]
mean value: 0.8037932657740003
key: test_accuracy
value: [0.92307692 0.84615385 0.80769231 0.74358974 0.85714286 0.90909091
0.90909091 0.88311688 0.92207792 0.84415584]
mean value: 0.8645188145188145
key: train_accuracy
value: [0.92385057 0.89942529 0.8591954 0.84770115 0.91104735 0.95552367
0.92826399 0.85652798 0.92826399 0.84074605]
mean value: 0.8950545441052787
key: test_fscore
value: [0.925 0.81818182 0.83870968 0.67741935 0.85714286 0.91358025
0.91566265 0.87323944 0.92857143 0.86666667]
mean value: 0.8614174136956543
key: train_fscore
value: [0.92285298 0.89028213 0.87657431 0.82392027 0.90746269 0.9567643
0.93093923 0.83766234 0.93169399 0.86245353]
mean value: 0.8940605755830133
key: test_precision
value: [0.90243902 1. 0.72222222 0.91304348 0.84615385 0.86046512
0.84444444 0.96875 0.86666667 0.76470588]
mean value: 0.8688890680770304
key: train_precision
value: [0.93510324 0.97931034 0.78026906 0.97637795 0.9470405 0.93206522
0.89866667 0.96268657 0.88802083 0.75816993]
mean value: 0.9057710318355587
key: test_recall
value: [0.94871795 0.69230769 1. 0.53846154 0.86842105 0.97368421
1. 0.79487179 1. 1. ]
mean value: 0.8816464237516869
key: train_recall
value: [0.91091954 0.81609195 1. 0.71264368 0.87106017 0.98280802
0.96561605 0.74137931 0.97988506 1. ]
mean value: 0.8980403780917564
key: test_roc_auc
value: [0.92307692 0.84615385 0.80769231 0.74358974 0.85728745 0.90991903
0.91025641 0.884278 0.92105263 0.84210526]
mean value: 0.8645411605937922
key: train_roc_auc
value: [0.92385057 0.89942529 0.8591954 0.84770115 0.9111048 0.95548447
0.92821032 0.85636301 0.92833794 0.84097421]
mean value: 0.8950647169252051
key: test_jcc
value: [0.86046512 0.69230769 0.72222222 0.51219512 0.75 0.84090909
0.84444444 0.775 0.86666667 0.76470588]
mean value: 0.7628916237133347
key: train_jcc
value: [0.85675676 0.80225989 0.78026906 0.70056497 0.83060109 0.9171123
0.87080103 0.72067039 0.87212276 0.75816993]
mean value: 0.8109328187613243
MCC on Blind test: 0.15
Accuracy on Blind test: 0.91
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
  oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.22017574 0.20179915 0.21202517 0.19547796 0.19455671 0.19919658
0.19616938 0.20111418 0.19684935 0.20956016]
mean value: 0.20269243717193602
key: score_time
value: [0.01577497 0.01719022 0.01790261 0.01592374 0.01673222 0.01601768
0.01648021 0.01736426 0.01662302 0.0169723 ]
mean value: 0.016698122024536133
key: test_mcc
value: [0.90219371 0.94996791 0.94996791 0.87904907 0.97435897 0.94935876
0.94935876 0.90083601 0.92480439 0.94929201]
mean value: 0.9329187493949491
key: train_mcc
value: [0.99141615 0.98573393 0.98573393 0.98857105 0.98858715 0.99713465
0.99427743 0.9914285 0.98858752 0.99427753]
mean value: 0.9905747849620279
key: test_accuracy
value: [0.94871795 0.97435897 0.97435897 0.93589744 0.98701299 0.97402597
0.97402597 0.94805195 0.96103896 0.97402597]
mean value: 0.9651515151515151
key: train_accuracy
value: [0.99568966 0.99281609 0.99281609 0.99425287 0.99426112 0.99856528
0.99713056 0.99569584 0.99426112 0.99713056]
mean value: 0.9952619188970794
key: test_fscore
value: [0.95121951 0.975 0.975 0.93975904 0.98701299 0.97435897
0.97435897 0.95121951 0.96296296 0.975 ]
mean value: 0.9665891959228721
key: train_fscore
value: [0.99570815 0.99286733 0.99286733 0.99428571 0.99430199 0.99856938
0.99714286 0.99570815 0.99428571 0.99713467]
mean value: 0.9952871309116362
key: test_precision
value: [0.90697674 0.95121951 0.95121951 0.88636364 0.97435897 0.95
0.95 0.90697674 0.92857143 0.95121951]
mean value: 0.9356906064251498
key: train_precision
value: [0.99145299 0.98583569 0.98583569 0.98863636 0.98866856 0.99714286
0.99430199 0.99145299 0.98863636 0.99428571]
mean value: 0.9906249219252052
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.94871795 0.97435897 0.97435897 0.93589744 0.98717949 0.97435897
0.97435897 0.94736842 0.96052632 0.97368421]
mean value: 0.965080971659919
key: train_roc_auc
value: [0.99568966 0.99281609 0.99281609 0.99425287 0.99425287 0.99856322
0.99712644 0.99570201 0.99426934 0.99713467]
mean value: 0.9952623258571287
key: test_jcc
value: [0.90697674 0.95121951 0.95121951 0.88636364 0.97435897 0.95
0.95 0.90697674 0.92857143 0.95121951]
mean value: 0.9356906064251498
key: train_jcc
value: [0.99145299 0.98583569 0.98583569 0.98863636 0.98866856 0.99714286
0.99430199 0.99145299 0.98863636 0.99428571]
mean value: 0.9906249219252052
MCC on Blind test: 0.56
Accuracy on Blind test: 0.93
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
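Note: the pipeline printed above pairs a ColumnTransformer (MinMaxScaler over the numerical stability/AAindex columns, OneHotEncoder over the categorical ones, remainder passed through) with the classifier. A minimal sketch of how such a pipeline is typically assembled in scikit-learn follows; the column lists below are illustrative placeholders, not the full 167 numerical + 7 categorical features used in this run.

# Sketch only: assemble preprocessing and model into one scikit-learn Pipeline.
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.ensemble import BaggingClassifier

numerical_cols = ['ligand_distance', 'ligand_affinity_change', 'deepddg']  # placeholder subset
categorical_cols = ['ss_class', 'aa_prop_change', 'active_site']           # placeholder subset

prep = ColumnTransformer(transformers=[('num', MinMaxScaler(), numerical_cols),
                                       ('cat', OneHotEncoder(), categorical_cols)],
                         remainder='passthrough')

pipe = Pipeline(steps=[('prep', prep),
                       ('model', BaggingClassifier(n_jobs=10, oob_score=True,
                                                   random_state=42))])
# pipe.fit(X_train, y_train) would then fit scaling, encoding and the model together.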
key: fit_time
value: [0.08092546 0.10974646 0.10242176 0.08341765 0.11342335 0.1117959
0.08383393 0.10895348 0.06967211 0.10660434]
mean value: 0.09707944393157959
key: score_time
value: [0.03384542 0.04069924 0.03315902 0.02925777 0.03667188 0.02650285
0.03642082 0.02489972 0.02062917 0.04203343]
mean value: 0.032411932945251465
key: test_mcc
value: [0.90219371 0.97467943 0.97467943 0.9258201 0.97435897 0.92495119
0.90109146 0.90083601 0.94929201 1. ]
mean value: 0.9427902319563773
key: train_mcc
value: [1. 0.99426929 1. 0.98573393 0.99713465 0.99713465
1. 0.99713467 1. 1. ]
mean value: 0.9971407191860393
key: test_accuracy
value: [0.94871795 0.98717949 0.98717949 0.96153846 0.98701299 0.96103896
0.94805195 0.94805195 0.97402597 1. ]
mean value: 0.9702797202797203
key: train_accuracy
value: [1. 0.99712644 1. 0.99281609 0.99856528 0.99856528
1. 0.99856528 1. 1. ]
mean value: 0.9985638368046966
key: test_fscore
value: [0.95121951 0.98734177 0.98734177 0.96296296 0.98701299 0.96202532
0.95 0.95121951 0.975 1. ]
mean value: 0.9714123835125688
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
  oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
key: train_fscore
value: [1. 0.99713467 1. 0.99286733 0.99856938 0.99856938
 1. 0.99856528 1. 1. ]
mean value: 0.998570605231082
key: test_precision
value: [0.90697674 0.975 0.975 0.92857143 0.97435897 0.92682927
0.9047619 0.90697674 0.95121951 1. ]
mean value: 0.9449694576552206
key: train_precision
value: [1. 0.99428571 1. 0.98583569 0.99714286 0.99714286
1. 0.99713467 1. 1. ]
mean value: 0.9971541793109526
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.94871795 0.98717949 0.98717949 0.96153846 0.98717949 0.96153846
0.94871795 0.94736842 0.97368421 1. ]
mean value: 0.970310391363023
key: train_roc_auc
value: [1. 0.99712644 1. 0.99281609 0.99856322 0.99856322
1. 0.99856734 1. 1. ]
mean value: 0.9985636300760794
key: test_jcc
value: [0.90697674 0.975 0.975 0.92857143 0.97435897 0.92682927
0.9047619 0.90697674 0.95121951 1. ]
mean value: 0.9449694576552206
key: train_jcc
value: [1. 0.99428571 1. 0.98583569 0.99714286 0.99714286
1. 0.99713467 1. 1. ]
mean value: 0.9971541793109526
MCC on Blind test: 0.51
Accuracy on Blind test: 0.93
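Note: the UserWarning and RuntimeWarning recorded in the Bagging Classifier block above come from out-of-bag scoring; with BaggingClassifier's default of 10 base estimators, some training rows are never left out of any bootstrap sample, so no OOB prediction exists for them. A hedged sketch of the usual remedy, raising n_estimators (illustrative settings and synthetic data, not the configuration used in this run):

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

# Synthetic, imbalanced toy data with roughly a 10:1 class ratio.
X, y = make_classification(n_samples=400, weights=[0.9, 0.1], random_state=42)

# With more estimators every sample is very likely out-of-bag at least once,
# so oob_score_ and oob_decision_function_ are computed without the warnings above.
bag = BaggingClassifier(n_estimators=100, oob_score=True, random_state=42)
bag.fit(X, y)
print(round(bag.oob_score_, 3))  # out-of-bag accuracy estimate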
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.31867123 0.34451103 0.31839418 0.36686802 0.33077478 0.31834817
0.33788633 0.32998705 0.36612749 0.33039927]
mean value: 0.336196756362915
key: score_time
value: [0.03126788 0.03241563 0.03144383 0.03145695 0.03186917 0.01802111
0.03072286 0.03075838 0.03771853 0.0336225 ]
mean value: 0.03092968463897705
key: test_mcc
value: [0.9258201 0.79056942 0.83405766 0.90219371 0.85485041 0.78862619
0.92495119 0.90083601 0.92480439 0.92480439]
mean value: 0.877151345543939
key: train_mcc
value: [0.98008335 0.98290472 0.97726976 0.98008335 0.97449907 0.97730122
0.97730122 0.97450093 0.9773027 0.9773027 ]
mean value: 0.977854903371284
key: test_accuracy
value: [0.96153846 0.88461538 0.91025641 0.94871795 0.92207792 0.88311688
0.96103896 0.94805195 0.96103896 0.96103896]
mean value: 0.9341491841491841
key: train_accuracy
value: [0.98994253 0.99137931 0.98850575 0.98994253 0.98708752 0.98852224
0.98852224 0.98708752 0.98852224 0.98852224]
mean value: 0.9888034103464767
key: test_fscore
value: [0.96296296 0.89655172 0.91764706 0.95121951 0.92682927 0.89411765
0.96202532 0.95121951 0.96296296 0.96296296]
mean value: 0.9388498928047796
key: train_fscore
value: [0.99004267 0.99145299 0.98863636 0.99004267 0.98727016 0.98866856
0.98866856 0.98723404 0.98863636 0.98863636]
mean value: 0.9889288739490248
key: test_precision
value: [0.92857143 0.8125 0.84782609 0.90697674 0.86363636 0.80851064
0.92682927 0.90697674 0.92857143 0.92857143]
mean value: 0.8858970131269819
key: train_precision
value: [0.98028169 0.98305085 0.97752809 0.98028169 0.97486034 0.97759104
0.97759104 0.97478992 0.97752809 0.97752809]
mean value: 0.9781030821393287
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96153846 0.88461538 0.91025641 0.94871795 0.92307692 0.88461538
0.96153846 0.94736842 0.96052632 0.96052632]
mean value: 0.9342780026990554
key: train_roc_auc
value: [0.98994253 0.99137931 0.98850575 0.98994253 0.98706897 0.98850575
0.98850575 0.98710602 0.98853868 0.98853868]
mean value: 0.988803395580147
key: test_jcc
value: [0.92857143 0.8125 0.84782609 0.90697674 0.86363636 0.80851064
0.92682927 0.90697674 0.92857143 0.92857143]
mean value: 0.8858970131269819
key: train_jcc
value: [0.98028169 0.98305085 0.97752809 0.98028169 0.97486034 0.97759104
0.97759104 0.97478992 0.97752809 0.97752809]
mean value: 0.9781030821393287
MCC on Blind test: 0.07
Accuracy on Blind test: 0.88
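Note: in the blocks above, test_jcc is identical to test_precision fold by fold. That is expected whenever recall is 1.0: with no false negatives the Jaccard index TP/(TP+FP+FN) reduces to TP/(TP+FP), which is precision. A small sketch with synthetic labels (not project data) showing the identity:

from sklearn.metrics import jaccard_score, precision_score, recall_score

# Every true positive is recovered (recall = 1), but one false positive exists.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0]

print(recall_score(y_true, y_pred))     # 1.0
print(precision_score(y_true, y_pred))  # 0.75
print(jaccard_score(y_true, y_pred))    # 0.75 -> equals precision when recall is 1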
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.79870081 0.79113436 0.799546 0.77011347 0.7785759 0.78570151
0.79611468 0.79451585 0.79098868 0.79267001]
mean value: 0.7898061275482178
key: score_time
value: [0.00966573 0.00960588 0.01036739 0.00957131 0.00974607 0.01048708
0.01015496 0.01037955 0.01056004 0.00958776]
mean value: 0.010012578964233399
key: test_mcc
value: [0.9258201 0.94996791 0.97467943 0.90219371 0.92495119 0.92495119
0.90109146 0.92480439 0.8542977 0.94929201]
mean value: 0.9232049095277226
key: train_mcc
value: [1. 1. 1. 1. 1. 1.
1. 0.99713467 1. 1. ]
mean value: 0.9997134670487106
key: test_accuracy
value: [0.96153846 0.97435897 0.98717949 0.94871795 0.96103896 0.96103896
0.94805195 0.96103896 0.92207792 0.97402597]
mean value: 0.9599067599067599
key: train_accuracy
value: [1. 1. 1. 1. 1. 1.
1. 0.99856528 1. 1. ]
mean value: 0.9998565279770445
key: test_fscore
value: [0.96296296 0.975 0.98734177 0.95121951 0.96202532 0.96202532
0.95 0.96296296 0.92857143 0.975 ]
mean value: 0.9617109271755768
key: train_fscore
value: [1. 1. 1. 1. 1. 1.
1. 0.99856528 1. 1. ]
mean value: 0.9998565279770445
key: test_precision
value: [0.92857143 0.95121951 0.975 0.90697674 0.92682927 0.92682927
0.9047619 0.92857143 0.86666667 0.95121951]
mean value: 0.9266645733733085
key: train_precision
value: [1. 1. 1. 1. 1. 1.
1. 0.99713467 1. 1. ]
mean value: 0.9997134670487106
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96153846 0.97435897 0.98717949 0.94871795 0.96153846 0.96153846
0.94871795 0.96052632 0.92105263 0.97368421]
mean value: 0.959885290148448
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1.
1. 0.99856734 1. 1. ]
mean value: 0.9998567335243553
key: test_jcc
value: [0.92857143 0.95121951 0.975 0.90697674 0.92682927 0.92682927
0.9047619 0.92857143 0.86666667 0.95121951]
mean value: 0.9266645733733085
key: train_jcc
value: [1. 1. 1. 1. 1. 1.
1. 0.99713467 1. 1. ]
mean value: 0.9997134670487106
MCC on Blind test: 0.68
Accuracy on Blind test: 0.95
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
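Note: the UndefinedMetricWarning logged just before this pipeline ("Precision is ill-defined ... no predicted samples") is raised when a fold contains no predicted positives, so the precision denominator is zero. A hedged sketch of the zero_division control the warning itself points to (synthetic labels, illustrative only):

from sklearn.metrics import precision_score

y_true = [0, 0, 1, 1]
y_pred = [0, 0, 0, 0]  # no positive predictions at all

# Default behaviour warns and returns 0.0; zero_division sets the value
# explicitly and suppresses the warning.
print(precision_score(y_true, y_pred, zero_division=0))  # 0.0, no warning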
key: fit_time
value: [0.03706169 0.03513145 0.04601455 0.03898883 0.03874493 0.03866482
0.03895307 0.03902817 0.03822541 0.04266095]
mean value: 0.039347386360168456
key: score_time
value: [0.01304889 0.01296854 0.01305175 0.01460671 0.01450467 0.01472878
0.01455522 0.01462984 0.01954913 0.01462722]
mean value: 0.0146270751953125
key: test_mcc
value: [0.9258201 1. 1. 1. 1. 1.
1. 1. 1. 0.94935876]
mean value: 0.9875178857715012
key: train_mcc
value: [0.98290472 1. 1. 1. 1. 1.
1. 1. 1. 0.97170451]
mean value: 0.9954609230792566
key: test_accuracy
value: [0.96153846 1. 1. 1. 1. 1.
1. 1. 1. 0.97402597]
mean value: 0.9935564435564436
key: train_accuracy
value: [0.99137931 1. 1. 1. 1. 1.
1. 1. 1. 0.9856528 ]
mean value: 0.9977032108049275
key: test_fscore
value: [0.96 1. 1. 1. 1. 1.
1. 1. 1. 0.97368421]
mean value: 0.9933684210526316
key: train_fscore
value: [0.99130435 1. 1. 1. 1. 1.
1. 1. 1. 0.98542274]
mean value: 0.9976727088350869
key: test_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.92307692 1. 1. 1. 1. 1.
1. 1. 1. 0.94871795]
mean value: 0.9871794871794872
key: train_recall
value: [0.98275862 1. 1. 1. 1. 1.
1. 1. 1. 0.97126437]
mean value: 0.9954022988505747
key: test_roc_auc
value: [0.96153846 1. 1. 1. 1. 1.
1. 1. 1. 0.97435897]
mean value: 0.9935897435897436
key: train_roc_auc
value: [0.99137931 1. 1. 1. 1. 1.
1. 1. 1. 0.98563218]
mean value: 0.9977011494252873
key: test_jcc
value: [0.92307692 1. 1. 1. 1. 1.
1. 1. 1. 0.94871795]
mean value: 0.9871794871794872
key: train_jcc
value: [0.98275862 1. 1. 1. 1. 1.
1. 1. 1. 0.97126437]
mean value: 0.9954022988505747
MCC on Blind test: 0.0
Accuracy on Blind test: 0.92
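Note: an MCC of 0.0 alongside 0.92 accuracy on the blind test is the expected signature of a predictor that effectively ignores the minority class on data with roughly a 10:1 class ratio; matthews_corrcoef returns 0.0 in that degenerate case. A synthetic sketch of the pattern (not the study's blind set):

from sklearn.metrics import accuracy_score, matthews_corrcoef

# ~10:1 imbalance; the model predicts the majority class for every sample.
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100

print(accuracy_score(y_true, y_pred))     # 0.9 -- looks high
print(matthews_corrcoef(y_true, y_pred))  # 0.0 -- no skill beyond the majority class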
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.05267787 0.05644894 0.03530025 0.04143929 0.04184628 0.04226303
0.0421288 0.04190731 0.04176641 0.04356456]
mean value: 0.0439342737197876
key: score_time
value: [0.02663398 0.02642751 0.02349806 0.01925349 0.0191915 0.01909399
0.01909709 0.01896191 0.0190537 0.01912689]
mean value: 0.021033811569213866
key: test_mcc
value: [0.80444891 0.74753193 0.71625345 0.78192905 0.85485041 0.77311567
0.85485041 0.87734648 0.74104277 0.82485566]
mean value: 0.7976224743394481
key: train_mcc
value: [0.86796153 0.8626274 0.87063883 0.85996529 0.89109427 0.87389113
0.87896711 0.86378336 0.85282785 0.86692344]
mean value: 0.8688680215807241
key: test_accuracy
value: [0.8974359 0.87179487 0.84615385 0.88461538 0.92207792 0.88311688
0.92207792 0.93506494 0.87012987 0.90909091]
mean value: 0.8941558441558441
key: train_accuracy
value: [0.93247126 0.92816092 0.93390805 0.92816092 0.94261119 0.93543759
0.93687231 0.92969871 0.92539455 0.93113343]
mean value: 0.9323848925608932
key: test_fscore
value: [0.9047619 0.87804878 0.86363636 0.89411765 0.92682927 0.88888889
0.92682927 0.93975904 0.875 0.91566265]
mean value: 0.901353380816614
key: train_fscore
value: [0.93517241 0.93224932 0.93646409 0.93131868 0.94579946 0.93810179
0.94021739 0.93296854 0.92777778 0.93442623]
mean value: 0.9354495687009976
/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_cd_8020.py:196: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_cd_8020.py:199: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: test_precision
value: [0.84444444 0.8372093  0.7755102  0.82608696 0.86363636 0.8372093
 0.86363636 0.88636364 0.85365854 0.86363636]
mean value: 0.8451391473557073
key: train_precision
value: [0.89920424 0.88205128 0.90159574 0.89210526 0.89717224 0.9021164
0.89405685 0.89033943 0.89784946 0.890625 ]
mean value: 0.8947115908040395
key: test_recall
value: [0.97435897 0.92307692 0.97435897 0.97435897 1. 0.94736842
1. 1. 0.8974359 0.97435897]
mean value: 0.9665317139001349
key: train_recall
value: [0.97413793 0.98850575 0.97413793 0.97413793 1. 0.97707736
0.99140401 0.97988506 0.95977011 0.98275862]
mean value: 0.98018147086915
key: test_roc_auc
value: [0.8974359 0.87179487 0.84615385 0.88461538 0.92307692 0.88394062
0.92307692 0.93421053 0.86977058 0.90823212]
mean value: 0.8942307692307693
key: train_roc_auc
value: [0.93247126 0.92816092 0.93390805 0.92816092 0.94252874 0.93537776
0.93679396 0.92977061 0.9254438 0.93120739]
mean value: 0.9323823403484504
key: test_jcc
value: [0.82608696 0.7826087 0.76 0.80851064 0.86363636 0.8
0.86363636 0.88636364 0.77777778 0.84444444]
mean value: 0.8213064876330372
key: train_jcc
value: [0.87823834 0.87309645 0.88051948 0.8714653 0.89717224 0.88341969
0.88717949 0.87435897 0.86528497 0.87692308]
mean value: 0.878765800299655
MCC on Blind test: 0.18
Accuracy on Blind test: 0.84
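Note: the SettingWithCopyWarning captured above (for rouC_CT.sort_values(..., inplace = True) and rouC_BT.sort_values(..., inplace = True) in gid_cd_8020.py) is pandas flagging an in-place operation on what may be a view of another DataFrame. A hedged sketch of the usual fixes; the variable and column names below only mirror the log and are not the project's actual code:

import pandas as pd

scores_df = pd.DataFrame({'model': ['A', 'B'],
                          'test_mcc': [0.5, 0.9],
                          'bts_mcc': [0.4, 0.7]})

# Option 1: take an explicit copy before mutating in place.
rouC_CT = scores_df[['model', 'test_mcc']].copy()
rouC_CT.sort_values(by=['test_mcc'], ascending=False, inplace=True)

# Option 2: avoid inplace entirely and reassign the sorted result.
rouC_BT = scores_df[['model', 'bts_mcc']].sort_values(by=['bts_mcc'], ascending=False)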
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=167)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.31355023 0.32501268 0.31309772 0.39799476 0.40076375 0.35185313
0.4216733 0.37915516 0.40246606 0.49356413]
mean value: 0.3799130916595459
key: score_time
value: [0.01916027 0.01919985 0.01915812 0.0290122 0.0192492 0.02470922
0.01920152 0.01917338 0.02637959 0.01917243]
mean value: 0.02144157886505127
key: test_mcc
value: [0.8273403 0.82078268 0.71625345 0.79056942 0.81032908 0.74986878
0.83239263 0.8542977 0.83165353 0.80158863]
mean value: 0.8035076206761982
key: train_mcc
value: [0.874631 0.85745608 0.87728003 0.89616338 0.89109427 0.871224
0.87564489 0.86905827 0.86378336 0.85907626]
mean value: 0.873541152508048
key: test_accuracy
value: [0.91025641 0.91025641 0.84615385 0.88461538 0.8961039 0.87012987
0.90909091 0.92207792 0.90909091 0.8961039 ]
mean value: 0.8953879453879454
key: train_accuracy
value: [0.93534483 0.92528736 0.93678161 0.9454023 0.94261119 0.93400287
0.93400287 0.93256815 0.92969871 0.92682927]
mean value: 0.9342529147908112
key: test_fscore
value: [0.91566265 0.91139241 0.86363636 0.89655172 0.9047619 0.87804878
0.91566265 0.92857143 0.91764706 0.9047619 ]
mean value: 0.9036696871448977
key: train_fscore
value: [0.9382716 0.92972973 0.93956044 0.94822888 0.94579946 0.93681319
0.93817204 0.93552812 0.93296854 0.93061224]
mean value: 0.9375684246743723
key: test_precision
value: [0.86363636 0.9 0.7755102 0.8125 0.82608696 0.81818182
0.84444444 0.86666667 0.84782609 0.84444444]
mean value: 0.8399296984933631
key: train_precision
value: [0.8976378 0.87755102 0.9 0.9015544 0.89717224 0.89973615
0.8835443 0.89501312 0.89033943 0.88372093]
mean value: 0.8926269387067017
key: test_recall
value: [0.97435897 0.92307692 0.97435897 1. 1. 0.94736842
1. 1. 1. 0.97435897]
mean value: 0.9793522267206478
key: train_recall
value: [0.98275862 0.98850575 0.98275862 1. 1. 0.97707736
1. 0.97988506 0.97988506 0.98275862]
mean value: 0.9873629088034779
key: test_roc_auc
value: [0.91025641 0.91025641 0.84615385 0.88461538 0.8974359 0.87112011
0.91025641 0.92105263 0.90789474 0.89507422]
mean value: 0.8954116059379218
key: train_roc_auc
value: [0.93534483 0.92528736 0.93678161 0.9454023 0.94252874 0.93394098
0.93390805 0.93263594 0.92977061 0.9269094 ]
mean value: 0.9342509798109542
key: test_jcc
value: [0.84444444 0.8372093 0.76 0.8125 0.82608696 0.7826087
0.84444444 0.86666667 0.84782609 0.82608696]
mean value: 0.8247873553533311
key: train_jcc
value: [0.88372093 0.86868687 0.88601036 0.9015544 0.89717224 0.88113695
0.8835443 0.87886598 0.87435897 0.87022901]
mean value: 0.8825280018338528
MCC on Blind test: 0.22
Accuracy on Blind test: 0.82
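Note: the per-model blocks above (fit_time, score_time and the test_*/train_* arrays of 10 values each) resemble the dictionary returned by scikit-learn's cross_validate with 10-fold CV, named scorers and return_train_score=True. A minimal hedged sketch of how such a dictionary is produced (toy data and a reduced scorer set, not this project's setup):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import cross_validate

X, y = make_classification(n_samples=400, weights=[0.9, 0.1], random_state=42)

scorers = {'mcc': make_scorer(matthews_corrcoef), 'accuracy': 'accuracy', 'fscore': 'f1'}
cv_out = cross_validate(LogisticRegression(max_iter=1000), X, y, cv=10,
                        scoring=scorers, return_train_score=True)

# Keys come back as fit_time, score_time, test_mcc, train_mcc, test_accuracy, ...
for key, value in cv_out.items():
    print('key:', key)
    print('mean value:', value.mean())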