LSHTM_analysis/scripts/ml/log_pnca_8020.txt


/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data_8020.py:549: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
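The SettingWithCopyWarning above is raised because `sort_values(..., inplace=True)` is called on a frame that pandas treats as a slice of another DataFrame. Below is a minimal sketch of one common fix. It assumes `mask_check` is built by boolean-filtering a larger frame. The frame `df`, the `mask` condition and the sample values are hypothetical, because the log does not show the surrounding code. The fix is to take an explicit `.copy()` of the slice before editing it in place.

```python
import pandas as pd

# Hypothetical stand-in for the merged feature table used in ml_data_8020.py
df = pd.DataFrame({
    "mutationinformation": ["A1B", "C2D", "E3F", "G4H"],
    "ligand_distance": [12.5, 3.1, 8.7, 20.0],
})

# Hypothetical filter condition; the real mask is not shown in the log
mask = df["ligand_distance"] < 15

# .copy() makes mask_check an independent frame, so the in-place sort
# below no longer triggers SettingWithCopyWarning
mask_check = df[mask].copy()
mask_check.sort_values(by=["ligand_distance"], ascending=True, inplace=True)

print(mask_check["ligand_distance"].tolist())
```

The alternative is to assign the result (`mask_check = df[mask].sort_values(by=[...])`). That avoids the warning for the same reason.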
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
from pandas import MultiIndex, Int64Index
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
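The lbfgs ConvergenceWarning means the solver reached its iteration cap (the LogisticRegression default is `max_iter=100`) before it converged. The sketch below tries the two remedies that the message suggests: scale the features and raise `max_iter`. It runs on synthetic data from `make_classification`. The data, the choice of StandardScaler and the value 1000 are illustrative assumptions, not settings taken from ml_data_8020.py.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 148-row training matrix
X, y = make_classification(n_samples=148, n_features=20, random_state=42)

# Scale the features, then allow lbfgs more iterations than the default 100
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000, random_state=42))

# Record any ConvergenceWarning raised during fitting instead of printing it
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", ConvergenceWarning)
    clf.fit(X, y)

n_conv = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
print("iterations used:", clf[-1].n_iter_[0])
print("convergence warnings:", n_conv)
```

The model pipeline logged further down already applies MinMaxScaler, so raising `max_iter` is probably the first thing to try there. The warning text also suggests switching to a different solver.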
1.22.4
1.4.1
aaindex_df contains non-numerical data
Total no. of non-numerical columns: 2
Selecting numerical data only
PASS: successfully selected numerical columns only for aaindex_df
Now checking for NA in the remaining aaindex_cols
Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127
Revised df ncols: 123
Checking NA in revised df...
PASS: cols with NA successfully dropped from aaindex_df
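The aaindex cleaning steps logged above keep only the numerical columns and then drop any column that contains NA. They can be sketched with pandas as follows. The toy frame and its column names are hypothetical, and the real script may implement these checks differently.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature aaindex_df: two non-numeric columns,
# three numeric columns, one of which contains an NA
aaindex_df = pd.DataFrame({
    "mutation": ["A1B", "C2D", "E3F"],
    "wild_type": ["A", "C", "E"],
    "ANDN920101": [0.1, 0.2, 0.3],
    "ARGP820101": [1.0, np.nan, 0.5],
    "BEGF750101": [0.7, 0.8, 0.9],
})

# Step 1: keep numerical columns only
aa_num = aaindex_df.select_dtypes(include="number")
n_non_num = aaindex_df.shape[1] - aa_num.shape[1]
print("Total no. of non-numerical columns:", n_non_num)

# Step 2: drop every column that contains at least one NA
na_cols = aa_num.columns[aa_num.isna().any()]
print("ncols with NA:", len(na_cols))
aa_clean = aa_num.drop(columns=na_cols)
print("Original ncols:", aa_num.shape[1], "Revised df ncols:", aa_clean.shape[1])

# Final check, mirroring the PASS message in the log
assert not aa_clean.isna().any().any()
```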
Proceeding with combining aa_df with other features_df
PASS: ncols match
Expected ncols: 123
Got: 123
Total no. of columns in clean aa_df: 123
Proceeding to merge, expected nrows in merged_df: 424
PASS: my_features_df and aa_df successfully combined
nrows: 424
ncols: 265
count of NULL values before imputation
or_mychisq 102
log10_or_mychisq 102
dtype: int64
count of NULL values AFTER imputation
mutationinformation 0
or_rawI 0
logorI 0
dtype: int64
PASS: OR values imputed, data ready for ML
Total no. of features for aaindex: 123
No. of numerical features: 166
No. of categorical features: 7
PASS: x_features has no target variable
No. of columns for x_features: 173
-------------------------------------------------------------
Successfully split data with stratification: 80/20
Train data size: (148, 173)
Test data size: (37, 173)
y_train numbers: Counter({1: 91, 0: 57})
y_train ratio: 0.6263736263736264
y_test_numbers: Counter({1: 23, 0: 14})
y_test ratio: 0.6086956521739131
-------------------------------------------------------------
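The class counts of the stratified 80/20 split above can be reproduced with scikit-learn's `train_test_split`. This sketch assumes `test_size=0.2` and `stratify=y`. The synthetic labels use the class totals implied by the log: 57 + 14 = 71 zeros and 91 + 23 = 114 ones, 185 rows in all. The random feature matrix is only a placeholder.

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Labels with the class totals implied by the log: 71 zeros, 114 ones
y = np.array([0] * 71 + [1] * 114)
# Placeholder feature matrix with the logged width of 173 columns
X = rng.normal(size=(y.size, 173))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print("Train data size:", X_train.shape)  # (148, 173)
print("Test data size:", X_test.shape)    # (37, 173)
print("y_train numbers:", Counter(y_train.tolist()))
print("y_test numbers:", Counter(y_test.tolist()))
# The logged "ratio" is the minority count divided by the majority count
print("y_train ratio:", Counter(y_train.tolist())[0] / Counter(y_train.tolist())[1])
```

Stratification allocates each class to train and test in proportion to its size. This is why both subsets keep roughly the same 0.61 to 0.63 minority-to-majority ratio.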
Simple Random OverSampling
Counter({0: 91, 1: 91})
(182, 173)
Simple Random UnderSampling
Counter({0: 57, 1: 57})
(114, 173)
Simple Combined Over and UnderSampling
Counter({0: 91, 1: 91})
(182, 173)
SMOTE_NC OverSampling
Counter({0: 91, 1: 91})
(182, 173)
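Random oversampling duplicates minority-class rows until both classes have the same count. Random undersampling discards majority-class rows instead. The log does not show which implementation was used; imbalanced-learn's `RandomOverSampler`, `RandomUnderSampler` and `SMOTENC` are common choices. The sketch below implements only the two random strategies in plain NumPy. It uses placeholder data with the logged training counts (91 ones, 57 zeros). It does not reproduce SMOTE-NC, which generates synthetic minority rows rather than copying existing ones.

```python
from collections import Counter

import numpy as np


def random_resample(X, y, mode, seed=42):
    """Balance a dataset by random over- or undersampling.

    mode="over":  sample smaller classes with replacement up to the largest count.
    mode="under": sample larger classes without replacement down to the smallest count.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max() if mode == "over" else counts.min()
    keep = []
    for cls, n in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        if n == target:
            keep.append(idx)  # this class already has the target size
        else:
            # Oversampling needs replacement; undersampling must not repeat rows
            keep.append(rng.choice(idx, size=target, replace=(mode == "over")))
    keep = np.concatenate(keep)
    return X[keep], y[keep]


# Placeholder training data with the logged class counts: 91 ones, 57 zeros
y_train = np.array([1] * 91 + [0] * 57)
X_train = np.arange(y_train.size * 3, dtype=float).reshape(-1, 3)

X_ros, y_ros = random_resample(X_train, y_train, mode="over")
X_rus, y_rus = random_resample(X_train, y_train, mode="under")
print(Counter(y_ros.tolist()), X_ros.shape)  # 91 per class, (182, 3)
print(Counter(y_rus.tolist()), X_rus.shape)  # 57 per class, (114, 3)
```

Resampling should be applied to the training split only, as in the log. The 37-row test set then keeps its original class balance.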
#####################################################################
Running ML analysis: 80/20 split
Gene name: pncA
Drug name: pyrazinamide
Output directory: /home/tanu/git/Data/pyrazinamide/output/ml/tts_8020/
Sanity checks:
ML source data size: (185, 173)
Total input features: (148, 173)
Target feature numbers: Counter({1: 91, 0: 57})
Target features ratio: 0.6263736263736264
#####################################################################
================================================================
Structural features (n): 34
These are:
Common stability features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================
AAindex features (n): 123
================================================================
Evolutionary features (n): 3
These are:
['consurf_score', 'snap2_score', 'provean_score']
================================================================
Genomic features (n): 6
These are:
['maf', 'logorI']
['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================
Categorical features (n): 7
These are:
['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================
Pass: No. of features match
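The "Pass: No. of features match" line amounts to confirming that the feature groups add up to the column count of x_features. The arithmetic below uses the group sizes printed above. The structural group is the sum of its three sub-lists. This is a minimal count-based check; the script itself may compare column lists rather than counts.

```python
# Group sizes as printed in the log
structural = 8 + 23 + 3   # common stability + FoldX + other structural columns
aaindex = 123
evolutionary = 3
genomic = 2 + 4           # ['maf', 'logorI'] + four lineage columns
categorical = 7

numerical = structural + aaindex + evolutionary + genomic
total = numerical + categorical
print(structural, numerical, total)  # 34 166 173
```

These totals agree with the earlier lines "No. of numerical features: 166", "No. of categorical features: 7" and "No. of columns for x_features: 173".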
#####################################################################
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
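The pipeline printed above works in two stages. First, a ColumnTransformer scales the 166 numerical columns with MinMaxScaler, one-hot encodes the 7 categorical columns and passes any other columns through unchanged (`remainder='passthrough'`). Second, the classifier is fitted on the transformed matrix. Below is a reduced sketch of the same structure on a hypothetical four-column frame; the column subset and values are placeholders. Setting `handle_unknown="ignore"` is an added assumption that is not in the logged pipeline. It prevents errors when a cross-validation fold contains a category that was not seen during fitting.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical miniature feature table
X = pd.DataFrame({
    "ligand_distance": [3.1, 12.5, 8.7, 20.0, 5.5, 15.2],
    "rsa": [0.10, 0.80, 0.45, 0.90, 0.20, 0.60],
    "ss_class": ["helix", "sheet", "loop", "loop", "helix", "sheet"],
    "active_site": ["yes", "no", "no", "no", "yes", "no"],
})
y = [1, 0, 1, 0, 1, 0]

num_cols = X.select_dtypes(include="number").columns
cat_cols = X.select_dtypes(exclude="number").columns

prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), num_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
    ],
    remainder="passthrough",
)
pipe = Pipeline(steps=[("prep", prep), ("model", LogisticRegression(random_state=42))])
pipe.fit(X, y)

# 2 scaled columns + 3 ss_class levels + 2 active_site levels = 7 model inputs
print(pipe.named_steps["prep"].transform(X).shape)  # (6, 7)
```

Keeping the scaler inside the pipeline means it is re-fitted on each training fold during cross-validation. This prevents information from the held-out fold leaking into the scaling.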
key: fit_time
value: [0.03000379 0.02730155 0.02838016 0.02717137 0.03072381 0.03129315
0.03094292 0.03128314 0.03002071 0.0396781 ]
mean value: 0.030679869651794433
key: score_time
value: [0.01214504 0.01164198 0.01170969 0.01179552 0.01176286 0.01186895
0.01166534 0.01296544 0.01171398 0.01177335]
mean value: 0.011904215812683106
key: test_mcc
value: [0.43082022 0.27216553 0.27216553 0.38888889 0.43082022 0.43082022
0.28867513 0. 0.70064905 0.54772256]
mean value: 0.376272733996905
key: train_mcc
value: [0.84034551 0.87406606 0.74333704 0.79198044 0.82449074 0.84138381
0.85700105 0.79479796 0.77889634 0.84234132]
mean value: 0.8188640257068358
key: test_accuracy
value: [0.73333333 0.66666667 0.66666667 0.66666667 0.73333333 0.73333333
0.66666667 0.53333333 0.85714286 0.78571429]
mean value: 0.7042857142857143
key: train_accuracy
value: [0.92481203 0.93984962 0.87969925 0.90225564 0.91729323 0.92481203
0.93233083 0.90225564 0.89552239 0.92537313]
mean value: 0.9144203793064751
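Each key/value block above resembles the dictionary returned by scikit-learn's `cross_validate` when it is called with a dictionary of scorers and `return_train_score=True`. That dictionary holds `fit_time` and `score_time`, then `test_<name>` and `train_<name>` arrays with one entry per fold. Each block in the log also prints the mean of its array. There are ten values per key. The test accuracies are multiples of 1/15 or 1/14, and the train accuracies are multiples of 1/133 or 1/134. This pattern is consistent with 10-fold CV on the 148 training rows. The sketch below follows that reading. StratifiedKFold with shuffling, the scorer names `mcc`, `accuracy` and `fscore`, and the synthetic data are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in for the 148-row, class-imbalanced training set
X, y = make_classification(n_samples=148, n_features=20, weights=[0.385], random_state=42)

# Scorer names become the suffixes of the result keys (test_mcc, train_mcc, ...)
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(
    LogisticRegression(max_iter=1000, random_state=42),
    X, y, cv=cv, scoring=scoring, return_train_score=True,
)

# Print in the same key / value / mean value layout as the log
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```

In the logged run, the gap between mean train MCC (about 0.82) and mean test MCC (about 0.38) suggests that the logistic regression overfits the 148 training rows.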
key: test_fscore
value:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[0.8 0.76190476 0.76190476 0.66666667 0.8 0.8
0.73684211 0.63157895 0.9 0.85714286]
mean value: 0.7716040100250626
key: train_fscore
value: [0.94047619 0.95294118 0.90588235 0.92307692 0.93491124 0.94117647
0.94674556 0.92307692 0.91764706 0.94047619]
mean value: 0.9326410090663484
key: test_precision
value: [0.72727273 0.66666667 0.66666667 0.83333333 0.72727273 0.72727273
0.7 0.66666667 0.81818182 0.75 ]
mean value: 0.7283333333333333
key: train_precision
value: [0.91860465 0.92045455 0.875 0.89655172 0.90804598 0.90909091
0.91954023 0.88636364 0.88636364 0.91860465]
mean value: 0.9038619960632791
key: test_recall
value: [0.88888889 0.88888889 0.88888889 0.55555556 0.88888889 0.88888889
0.77777778 0.6 1. 1. ]
mean value: 0.8377777777777777
key: train_recall
value: [0.96341463 0.98780488 0.93902439 0.95121951 0.96341463 0.97560976
0.97560976 0.96296296 0.95121951 0.96341463]
mean value: 0.9633694670280035
key: test_roc_auc
value: [0.69444444 0.61111111 0.61111111 0.69444444 0.69444444 0.69444444
0.63888889 0.5 0.8 0.7 ]
mean value: 0.6638888888888889
key: train_roc_auc
value: [0.91307987 0.92527499 0.86166906 0.88737446 0.90327594 0.90937351
0.91917743 0.88532764 0.87945591 0.91439962]
mean value: 0.8998408421112869
key: test_jcc
value: [0.66666667 0.61538462 0.61538462 0.5 0.66666667 0.66666667
0.58333333 0.46153846 0.81818182 0.75 ]
mean value: 0.6343822843822844
key: train_jcc
value: [0.88764045 0.91011236 0.82795699 0.85714286 0.87777778 0.88888889
0.8988764 0.85714286 0.84782609 0.88764045]
mean value: 0.8741005120077563
MCC on Blind test: 0.54
Accuracy on Blind test: 0.78
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.6358583 0.63472962 0.77536225 0.86090446 0.62058043 0.62573814
0.72684979 1.00338984 0.77752471 1.01283073]
mean value: 0.7673768281936646
key: score_time
value: [0.01333785 0.01501322 0.01294947 0.01333427 0.01240039 0.01354885
0.01353598 0.01332974 0.01322675 0.01212215]
mean value: 0.013279867172241212
key: test_mcc
value: [0.28867513 0.28867513 0.16666667 0.49099025 0.73854895 0.6000992
0.44444444 0.28867513 0.86066297 0.54772256]
mean value: 0.4715160435280545
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.66666667 0.66666667 0.6 0.73333333 0.86666667 0.8
0.73333333 0.6 0.92857143 0.78571429]
mean value: 0.7380952380952381
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.73684211 0.73684211 0.66666667 0.75 0.9 0.82352941
0.77777778 0.625 0.94117647 0.85714286]
mean value: 0.7814977394466558
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.7 0.7 0.66666667 0.85714286 0.81818182 0.875
0.77777778 0.83333333 1. 0.75 ]
mean value: 0.7978102453102454
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.77777778 0.77777778 0.66666667 0.66666667 1. 0.77777778
0.77777778 0.5 0.88888889 1. ]
mean value: 0.7833333333333333
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.63888889 0.63888889 0.58333333 0.75 0.83333333 0.80555556
0.72222222 0.65 0.94444444 0.7 ]
mean value: 0.7266666666666667
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.58333333 0.58333333 0.5 0.6 0.81818182 0.7
0.63636364 0.45454545 0.88888889 0.75 ]
mean value: 0.6514646464646464
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.65
Accuracy on Blind test: 0.84
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01237035 0.01113296 0.00974202 0.00966811 0.0096302 0.00947905
0.00960612 0.00956416 0.0093956 0.00943828]
mean value: 0.010002684593200684
key: score_time
value: [0.01165819 0.00971031 0.00957322 0.00939584 0.00926685 0.00922227
0.0092895 0.00912857 0.00913 0.00887775]
mean value: 0.009525251388549805
key: test_mcc
value: [ 0.43082022 0. 0.27216553 0.12309149 0.05455447 0.61237244
0.08006408 -0.18898224 0.33734954 0. ]
mean value: 0.1721435527505622
key: train_mcc
value: [0.54501213 0.42534 0.56649197 0.40644472 0.42721465 0.36528121
0.44773865 0.43164105 0.38378759 0.4226252 ]
mean value: 0.4421577185051267
key: test_accuracy
value: [0.73333333 0.6 0.66666667 0.6 0.53333333 0.8
0.6 0.6 0.71428571 0.64285714]
mean value: 0.6490476190476191
key: train_accuracy
value: [0.78947368 0.73684211 0.79699248 0.72932331 0.73684211 0.69924812
0.7443609 0.72180451 0.71641791 0.7238806 ]
mean value: 0.7395185725507799
key: test_fscore
value: [0.8 0.75 0.76190476 0.7 0.58823529 0.85714286
0.72727273 0.75 0.8 0.7826087 ]
mean value: 0.7517164336090167
key: train_fscore
value: [0.84090909 0.81081081 0.83832335 0.80434783 0.81283422 0.8019802
0.81914894 0.81218274 0.8 0.81218274]
mean value: 0.8152719922122719
key: test_precision
value: [0.72727273 0.6 0.66666667 0.63636364 0.625 0.75
0.61538462 0.64285714 0.72727273 0.64285714]
mean value: 0.6633674658674659
key: train_precision
value: [0.78723404 0.72815534 0.82352941 0.7254902 0.72380952 0.675
0.72641509 0.68965517 0.7037037 0.69565217]
mean value: 0.7278644658381841
key: test_recall
value: [0.88888889 1. 0.88888889 0.77777778 0.55555556 1.
0.88888889 0.9 0.88888889 1. ]
mean value: 0.8788888888888888
key: train_recall
value: [0.90243902 0.91463415 0.85365854 0.90243902 0.92682927 0.98780488
0.93902439 0.98765432 0.92682927 0.97560976]
mean value: 0.9316922613670581
key: test_roc_auc
value: [0.69444444 0.5 0.61111111 0.55555556 0.52777778 0.75
0.52777778 0.45 0.64444444 0.5 ]
mean value: 0.5761111111111111
key: train_roc_auc
value: [0.75514108 0.68280727 0.77977044 0.67670971 0.67910091 0.6115495
0.68519847 0.64767331 0.65572233 0.65126642]
mean value: 0.6824939436548714
key: test_jcc
value: [0.66666667 0.6 0.61538462 0.53846154 0.41666667 0.75
0.57142857 0.6 0.66666667 0.64285714]
mean value: 0.6068131868131869
key: train_jcc
value: [0.7254902 0.68181818 0.72164948 0.67272727 0.68468468 0.66942149
0.69369369 0.68376068 0.66666667 0.68376068]
mean value: 0.6883673035329687
MCC on Blind test: 0.26
Accuracy on Blind test: 0.68
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.00984406 0.00968909 0.00974703 0.00958514 0.00977182 0.00971794
0.0096283 0.00964093 0.0096209 0.00963807]
mean value: 0.009688329696655274
key: score_time
value: [0.00931573 0.00920796 0.00928688 0.00928712 0.0091939 0.00921798
0.00923729 0.00919628 0.00921893 0.00846362]
mean value: 0.009162569046020507
key: test_mcc
value: [ 0.44444444 0.32732684 0. 0. 0. 0.57735027
-0.28867513 0. 0.06666667 0.33734954]
mean value: 0.14644626235299057
key: train_mcc
value: [0.43769978 0.45554586 0.45215696 0.49115256 0.4455592 0.44919673
0.49718111 0.42789983 0.44816116 0.45628689]
mean value: 0.4560840076371738
key: test_accuracy
value: [0.73333333 0.66666667 0.53333333 0.46666667 0.53333333 0.8
0.4 0.53333333 0.57142857 0.71428571]
mean value: 0.5952380952380952
key: train_accuracy
value: [0.73684211 0.7443609 0.7443609 0.7593985 0.73684211 0.7443609
0.76691729 0.72932331 0.73880597 0.73880597]
mean value: 0.7440017955336101
key: test_fscore
value: [0.77777778 0.70588235 0.63157895 0.42857143 0.63157895 0.84210526
0.52631579 0.63157895 0.66666667 0.8 ]
mean value: 0.6642056120693891
key: train_fscore
value: [0.79041916 0.79518072 0.79761905 0.80487805 0.78527607 0.8
0.81871345 0.7804878 0.78787879 0.7826087 ]
mean value: 0.7943061793288788
key: test_precision
value: [0.77777778 0.75 0.6 0.6 0.6 0.8
0.5 0.66666667 0.66666667 0.72727273]
mean value: 0.6688383838383838
key: train_precision
value: [0.77647059 0.78571429 0.77906977 0.80487805 0.79012346 0.77272727
0.78651685 0.77108434 0.78313253 0.79746835]
mean value: 0.7847185495522168
key: test_recall
value: [0.77777778 0.66666667 0.66666667 0.33333333 0.66666667 0.88888889
0.55555556 0.6 0.66666667 0.88888889]
mean value: 0.6711111111111111
key: train_recall
value: [0.80487805 0.80487805 0.81707317 0.80487805 0.7804878 0.82926829
0.85365854 0.79012346 0.79268293 0.76829268]
mean value: 0.8046221017765733
key: test_roc_auc
value: [0.72222222 0.66666667 0.5 0.5 0.5 0.77777778
0.36111111 0.5 0.53333333 0.64444444]
mean value: 0.5705555555555556
key: train_roc_auc
value: [0.71616451 0.72596844 0.72226208 0.74557628 0.72357724 0.71855571
0.74055476 0.71236942 0.72326454 0.73030019]
mean value: 0.7258593163483168
key: test_jcc
value: [0.63636364 0.54545455 0.46153846 0.27272727 0.46153846 0.72727273
0.35714286 0.46153846 0.5 0.66666667]
mean value: 0.5090243090243091
key: train_jcc
value: [0.65346535 0.66 0.66336634 0.67346939 0.64646465 0.66666667
0.69306931 0.64 0.65 0.64285714]
mean value: 0.6589358833842568
MCC on Blind test: 0.31
Accuracy on Blind test: 0.68
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00855732 0.02237892 0.00854349 0.00832772 0.00829554 0.00820947
0.0084157 0.00815558 0.0094521 0.00837159]
mean value: 0.00987074375152588
key: score_time
value: [0.04929829 0.02021766 0.01471043 0.00962043 0.0139358 0.0139904
0.00954151 0.01637411 0.01522803 0.01381803]
mean value: 0.017673468589782713
key: test_mcc
value: [-0.06804138 -0.28867513 -0.66666667 -0.06804138 0. 0.43082022
-0.21821789 -0.5 0.3721042 0.3721042 ]
mean value: -0.06346138290225109
key: train_mcc
value: [0.17136979 0.36243575 0.31069717 0.35315618 0.28850942 0.2741202
0.28523142 0.38382318 0.30552803 0.2083236 ]
mean value: 0.29431947475155223
key: test_accuracy
value: [0.53333333 0.4 0.2 0.53333333 0.53333333 0.73333333
0.4 0.33333333 0.71428571 0.71428571]
mean value: 0.5095238095238095
key: train_accuracy
value: [0.62406015 0.70676692 0.68421053 0.70676692 0.67669173 0.67669173
0.67669173 0.71428571 0.67910448 0.64179104]
mean value: 0.6787060935921894
key: test_fscore
value: [0.66666667 0.52631579 0.33333333 0.66666667 0.63157895 0.8
0.47058824 0.5 0.81818182 0.81818182]
mean value: 0.6231513275166526
key: train_fscore
value: [0.71590909 0.77456647 0.75862069 0.78453039 0.75706215 0.77248677
0.75977654 0.77906977 0.75144509 0.73333333]
mean value: 0.7586800284465707
key: test_precision
value: [0.58333333 0.5 0.33333333 0.58333333 0.6 0.72727273
0.5 0.5 0.69230769 0.69230769]
mean value: 0.5711888111888112
key: train_precision
value: [0.67021277 0.73626374 0.7173913 0.71717172 0.70526316 0.68224299
0.70103093 0.73626374 0.71428571 0.67346939]
mean value: 0.7053595438429273
key: test_recall
value: [0.77777778 0.55555556 0.33333333 0.77777778 0.66666667 0.88888889
0.44444444 0.5 1. 1. ]
mean value: 0.6944444444444444
key: train_recall
value: [0.76829268 0.81707317 0.80487805 0.86585366 0.81707317 0.8902439
0.82926829 0.82716049 0.79268293 0.80487805]
mean value: 0.8217404396266185
key: test_roc_auc
value: [0.47222222 0.36111111 0.16666667 0.47222222 0.5 0.69444444
0.38888889 0.25 0.6 0.6 ]
mean value: 0.45055555555555554
key: train_roc_auc
value: [0.58022477 0.67324247 0.64753706 0.65841703 0.63402678 0.61178862
0.63032042 0.68281102 0.64634146 0.59474672]
mean value: 0.6359456345946064
key: test_jcc
value: [0.5 0.35714286 0.2 0.5 0.46153846 0.66666667
0.30769231 0.33333333 0.69230769 0.69230769]
mean value: 0.47109890109890107
key: train_jcc
value: [0.55752212 0.63207547 0.61111111 0.64545455 0.60909091 0.62931034
0.61261261 0.63809524 0.60185185 0.57894737]
mean value: 0.6116071577056825
MCC on Blind test: -0.02
Accuracy on Blind test: 0.54
Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01080036 0.01022005 0.01002049 0.01102018 0.01053524 0.00990343
0.01091599 0.00984478 0.01048589 0.01073265]
mean value: 0.01044790744781494
key: score_time
value: [0.00896406 0.00891638 0.00948572 0.00896621 0.00873971 0.00927854
0.00958657 0.00880337 0.00961518 0.00933743]
mean value: 0.009169316291809082
key: test_mcc
value: [ 0.48038446 -0.21821789 0.27216553 -0.18463724 -0.06804138 0.61237244
-0.32025631 0.13867505 0.3721042 0. ]
mean value: 0.10845488608517544
key: train_mcc
value: [0.61007042 0.6473291 0.58606018 0.63995699 0.57981496 0.51766191
0.63995699 0.5556364 0.58656282 0.57162035]
mean value: 0.5934670121246945
key: test_accuracy
value: [0.73333333 0.53333333 0.66666667 0.46666667 0.53333333 0.8
0.46666667 0.66666667 0.71428571 0.64285714]
mean value: 0.6223809523809524
key: train_accuracy
value: [0.80451128 0.82706767 0.79699248 0.81954887 0.78947368 0.7593985
0.81954887 0.77443609 0.79104478 0.78358209]
mean value: 0.7965604309280664
key: test_fscore
value: [0.81818182 0.69565217 0.76190476 0.6 0.66666667 0.85714286
0.63636364 0.7826087 0.81818182 0.7826087 ]
mean value: 0.7419311123658949
key: train_fscore
value: [0.86315789 0.87567568 0.85714286 0.87234043 0.85416667 0.83673469
0.87234043 0.84375 0.85416667 0.84974093]
mean value: 0.8579216238472576
key: test_precision
value: [0.69230769 0.57142857 0.66666667 0.54545455 0.58333333 0.75
0.53846154 0.69230769 0.69230769 0.64285714]
mean value: 0.6375124875124875
key: train_precision
value: [0.75925926 0.78640777 0.75700935 0.77358491 0.74545455 0.71929825
0.77358491 0.72972973 0.74545455 0.73873874]
mean value: 0.7528521988356293
key: test_recall
value: [1. 0.88888889 0.88888889 0.66666667 0.77777778 1.
0.77777778 0.9 1. 1. ]
mean value: 0.89
key: train_recall
value: [1. 0.98780488 0.98780488 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9975609756097561
key: test_roc_auc
value: [0.66666667 0.44444444 0.61111111 0.41666667 0.47222222 0.75
0.38888889 0.55 0.6 0.5 ]
mean value: 0.54
key: train_roc_auc
value: [0.74509804 0.77821616 0.73900048 0.76470588 0.7254902 0.68627451
0.76470588 0.71153846 0.73076923 0.72115385]
mean value: 0.7366952691020123
key: test_jcc
value: [0.69230769 0.53333333 0.61538462 0.42857143 0.5 0.75
0.46666667 0.64285714 0.69230769 0.64285714]
mean value: 0.5964285714285714
key: train_jcc
value: [0.75925926 0.77884615 0.75 0.77358491 0.74545455 0.71929825
0.77358491 0.72972973 0.74545455 0.73873874]
mean value: 0.7513951029417762
MCC on Blind test: 0.34
Accuracy on Blind test: 0.7
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [0.67545986 0.81942654 0.68312645 0.5726397 0.66584706 0.72752714
0.56442165 1.12021947 0.92654943 0.62951541]
mean value: 0.7384732723236084
key: score_time
value: [0.01198483 0.01239991 0.01204967 0.01192904 0.01209116 0.01514149
0.01191807 0.01218581 0.01198363 0.01201415]
mean value: 0.012369775772094726
key: test_mcc
value: [ 0.44444444 -0.32025631 0. 0.38888889 0.57735027 0.32732684
-0.06804138 0.09449112 0.55943093 0.54772256]
mean value: 0.2551357352065785
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.73333333 0.46666667 0.53333333 0.66666667 0.8 0.66666667
0.53333333 0.53333333 0.78571429 0.78571429]
mean value: 0.6504761904761904
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.77777778 0.63636364 0.63157895 0.66666667 0.84210526 0.70588235
0.66666667 0.58823529 0.82352941 0.85714286]
mean value: 0.719594887396745
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.77777778 0.53846154 0.6 0.83333333 0.8 0.75
0.58333333 0.71428571 0.875 0.75 ]
mean value: 0.7222191697191698
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.77777778 0.77777778 0.66666667 0.55555556 0.88888889 0.66666667
0.77777778 0.5 0.77777778 1. ]
mean value: 0.7388888888888889
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.72222222 0.38888889 0.5 0.69444444 0.77777778 0.66666667
0.47222222 0.55 0.78888889 0.7 ]
mean value: 0.6261111111111112
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.63636364 0.46666667 0.46153846 0.5 0.72727273 0.54545455
0.5 0.41666667 0.7 0.75 ]
mean value: 0.5703962703962704
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.34
Accuracy on Blind test: 0.7
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.01971579 0.01306844 0.01349854 0.0138483 0.01513457 0.01593947
0.01379418 0.01406312 0.01403928 0.01425767]
mean value: 0.014735937118530273
key: score_time
value: [0.01813078 0.00960183 0.00989127 0.01094699 0.01026487 0.00997877
0.01040626 0.01031375 0.01008248 0.01022696]
mean value: 0.010984396934509278
key: test_mcc
value: [0.73854895 0.8660254 0.87287156 0.8660254 0.72222222 0.57735027
0.6000992 0.85280287 0.86066297 0.84852814]
mean value: 0.7805136972619839
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.86666667 0.93333333 0.93333333 0.93333333 0.86666667 0.8
0.8 0.93333333 0.92857143 0.92857143]
mean value: 0.8923809523809524
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.9 0.94736842 0.94117647 0.94736842 0.88888889 0.84210526
0.82352941 0.95238095 0.94117647 0.94736842]
mean value: 0.9131362720526808
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.81818182 0.9 1. 0.9 0.88888889 0.8
0.875 0.90909091 1. 0.9 ]
mean value: 0.8991161616161616
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 0.88888889 1. 0.88888889 0.88888889
0.77777778 1. 0.88888889 1. ]
mean value: 0.9333333333333333
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.83333333 0.91666667 0.94444444 0.91666667 0.86111111 0.77777778
0.80555556 0.9 0.94444444 0.9 ]
mean value: 0.88
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.81818182 0.9 0.88888889 0.9 0.8 0.72727273
0.7 0.90909091 0.88888889 0.9 ]
mean value: 0.8432323232323232
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.72
Accuracy on Blind test: 0.86
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.11488128 0.09547067 0.09398174 0.09794021 0.09405684 0.09143519
0.08864164 0.08879304 0.089674 0.09006691]
mean value: 0.0944941520690918
key: score_time
value: [0.01881909 0.01698303 0.01879811 0.01835728 0.01962328 0.01830673
0.01720667 0.01846004 0.0170567 0.01702929]
mean value: 0.018064022064208984
key: test_mcc
value: [0.73854895 0.12309149 0.12309149 0.49099025 0.16666667 0.57735027
0.28867513 0.21320072 0.68888889 0.3721042 ]
mean value: 0.3782608060328875
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.86666667 0.6 0.6 0.73333333 0.6 0.8
0.66666667 0.66666667 0.85714286 0.71428571]
mean value: 0.7104761904761905
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.9 0.7 0.7 0.75 0.66666667 0.84210526
0.73684211 0.76190476 0.88888889 0.81818182]
mean value: 0.7764589504063188
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.81818182 0.63636364 0.63636364 0.85714286 0.66666667 0.8
0.7 0.72727273 0.88888889 0.69230769]
mean value: 0.7423187923187923
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.77777778 0.77777778 0.66666667 0.66666667 0.88888889
0.77777778 0.8 0.88888889 1. ]
mean value: 0.8244444444444444
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.83333333 0.55555556 0.55555556 0.75 0.58333333 0.77777778
0.63888889 0.6 0.84444444 0.6 ]
mean value: 0.6738888888888889
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.81818182 0.53846154 0.53846154 0.6 0.5 0.72727273
0.58333333 0.61538462 0.8 0.69230769]
mean value: 0.6413403263403263
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.35
Accuracy on Blind test: 0.7
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.00981903 0.00927949 0.00883126 0.00921917 0.00866628 0.00878716
0.00877333 0.00881672 0.00892115 0.00923777]
mean value: 0.009035134315490722
key: score_time
value: [0.0092175 0.00844312 0.00844026 0.00871563 0.00834346 0.00851798
0.0086832 0.00851035 0.00845814 0.00913239]
mean value: 0.008646202087402344
key: test_mcc
value: [ 0.57735027 0.16666667 -0.21821789 0.16666667 0.16666667 0.
0. 0.1 0.25819889 0.3721042 ]
mean value: 0.15894354724684198
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.8 0.6 0.4 0.6 0.6 0.53333333
0.53333333 0.6 0.64285714 0.71428571]
mean value: 0.6023809523809524
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.84210526 0.66666667 0.47058824 0.66666667 0.66666667 0.63157895
0.63157895 0.7 0.70588235 0.81818182]
mean value: 0.6799915564311849
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.8 0.66666667 0.5 0.66666667 0.66666667 0.6
0.6 0.7 0.75 0.69230769]
mean value: 0.6642307692307692
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.88888889 0.66666667 0.44444444 0.66666667 0.66666667 0.66666667
0.66666667 0.7 0.66666667 1. ]
mean value: 0.7033333333333333
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.77777778 0.58333333 0.38888889 0.58333333 0.58333333 0.5
0.5 0.55 0.63333333 0.6 ]
mean value: 0.57
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.72727273 0.5 0.30769231 0.5 0.5 0.46153846
0.46153846 0.53846154 0.54545455 0.69230769]
mean value: 0.5234265734265734
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.48
Accuracy on Blind test: 0.76
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.14825535 1.1420064 1.24736357 1.28721571 1.46088004 1.41407871
1.45253873 1.21245909 1.15079379 1.13232279]
mean value: 1.2647914171218873
key: score_time
value: [0.08690786 0.08760619 0.0870502 0.1696291 0.11017585 0.11227798
0.11885667 0.09209251 0.09239531 0.08728147]
mean value: 0.10442731380462647
key: test_mcc
value: [0.73854895 0.44444444 0.12309149 0.49099025 0.57735027 0.57735027
0.44444444 0.53300179 0.86066297 0.54772256]
mean value: 0.5337607431372515
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.86666667 0.73333333 0.6 0.73333333 0.8 0.8
0.73333333 0.8 0.92857143 0.78571429]
mean value: 0.7780952380952381
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.9 0.77777778 0.7 0.75 0.84210526 0.84210526
0.77777778 0.85714286 0.94117647 0.85714286]
mean value: 0.8245228266745295
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.81818182 0.77777778 0.63636364 0.85714286 0.8 0.8
0.77777778 0.81818182 1. 0.75 ]
mean value: 0.8035425685425686
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.77777778 0.77777778 0.66666667 0.88888889 0.88888889
0.77777778 0.9 0.88888889 1. ]
mean value: 0.8566666666666667
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.83333333 0.72222222 0.55555556 0.75 0.77777778 0.77777778
0.72222222 0.75 0.94444444 0.7 ]
mean value: 0.7533333333333334
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.81818182 0.63636364 0.53846154 0.6 0.72727273 0.72727273
0.63636364 0.75 0.88888889 0.75 ]
mean value: 0.7072804972804972
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.53
Accuracy on Blind test: 0.78
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
key: fit_time
value: [1.70673203 0.90127635 0.91046214 0.90426564 0.95201159 0.90982032
0.84366179 0.90904737 0.85923839 0.94528151]
mean value: 0.9841797113418579
key: score_time
value: [0.19794631 0.20943165 0.21002579 0.20278788 0.13683844 0.20249867
0.14593673 0.13761425 0.12478852 0.16785192]
mean value: 0.1735720157623291
key: test_mcc
value: [0.73854895 0.43082022 0.27216553 0.49099025 0.43082022 0.57735027
0.43082022 0.7 0.84852814 0.3721042 ]
mean value: 0.5292147991546989
key: train_mcc
value: [0.85869998 0.87269455 0.88951136 0.88951136 0.87406606 0.88951136
0.92202167 0.89249493 0.87565664 0.90622006]
mean value: 0.8870387975267908
key: test_accuracy
value: [0.86666667 0.73333333 0.66666667 0.73333333 0.73333333 0.8
0.73333333 0.86666667 0.92857143 0.71428571]
mean value: 0.7776190476190477
key: train_accuracy
value: [0.93233083 0.93984962 0.94736842 0.94736842 0.93984962 0.94736842
0.96240602 0.94736842 0.94029851 0.95522388]
mean value: 0.9459432162495791
key: test_fscore
value: [0.9 0.8 0.76190476 0.75 0.8 0.84210526
0.8 0.9 0.94736842 0.81818182]
mean value: 0.8319560264297107
key: train_fscore
value: [0.94736842 0.95238095 0.95857988 0.95857988 0.95294118 0.95857988
0.9704142 0.95857988 0.95294118 0.96428571]
mean value: 0.9574651168471126
key: test_precision
value: [0.81818182 0.72727273 0.66666667 0.85714286 0.72727273 0.8
0.72727273 0.9 0.9 0.69230769]
mean value: 0.7816117216117217
key: train_precision
value: [0.91011236 0.93023256 0.93103448 0.93103448 0.92045455 0.93103448
0.94252874 0.92045455 0.92045455 0.94186047]
mean value: 0.9279201203078058
key: test_recall
value: [1. 0.88888889 0.88888889 0.66666667 0.88888889 0.88888889
0.88888889 0.9 1. 1. ]
mean value: 0.9011111111111111
key: train_recall
value: [0.98780488 0.97560976 0.98780488 0.98780488 0.98780488 0.98780488
1. 1. 0.98780488 0.98780488]
mean value: 0.9890243902439024
key: test_roc_auc
value: [0.83333333 0.69444444 0.61111111 0.75 0.69444444 0.77777778
0.69444444 0.85 0.9 0.6 ]
mean value: 0.7405555555555555
key: train_roc_auc
value: [0.91547107 0.92898135 0.93507891 0.93507891 0.92527499 0.93507891
0.95098039 0.93269231 0.92659475 0.94582552]
mean value: 0.9331057094507597
key: test_jcc
value: [0.81818182 0.66666667 0.61538462 0.6 0.66666667 0.72727273
0.66666667 0.81818182 0.9 0.69230769]
mean value: 0.7171328671328672
key: train_jcc
value: [0.9 0.90909091 0.92045455 0.92045455 0.91011236 0.92045455
0.94252874 0.92045455 0.91011236 0.93103448]
mean value: 0.9184697028401019
MCC on Blind test: 0.71
Accuracy on Blind test: 0.86
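Random Forest2 raises a FutureWarning during cross-validation because it is constructed with max_features='auto', which scikit-learn deprecated in 1.1. For classifiers 'auto' meant 'sqrt', so building the model as RandomForestClassifier(max_features='sqrt', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) should keep the same behaviour without the warning. The stdlib sketch below shows what 'sqrt' means at each split. The helper name features_per_split is hypothetical, and the example width of 166 counts only the scaled numeric columns, because the log does not show how many one-hot columns the encoder adds.

```python
import math


def features_per_split(n_features: int) -> int:
    """Number of candidate features drawn at each split when max_features='sqrt'.

    Follows the rule max(1, int(sqrt(n_features))), so at least one
    feature is always considered.
    """
    return max(1, int(math.sqrt(n_features)))


# Hypothetical width: only the 166 MinMax-scaled numeric columns are counted;
# the one-hot encoded columns would make the real number somewhat larger.
print(features_per_split(166))
```

Setting the value explicitly rather than relying on the default also keeps the run reproducible across scikit-learn upgrades.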
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01027036 0.01250505 0.01518893 0.01096773 0.00941038 0.01338744
0.00863194 0.01007581 0.01536703 0.01049685]
mean value: 0.01163015365600586
key: score_time
value: [0.01019716 0.0142858 0.01393628 0.01013446 0.00880861 0.00876451
0.00855732 0.01425838 0.01521611 0.01059222]
mean value: 0.011475086212158203
key: test_mcc
value: [ 0.44444444 0.32732684 0. 0. 0. 0.57735027
-0.28867513 0. 0.06666667 0.33734954]
mean value: 0.14644626235299057
key: train_mcc
value: [0.43769978 0.45554586 0.45215696 0.49115256 0.4455592 0.44919673
0.49718111 0.42789983 0.44816116 0.45628689]
mean value: 0.4560840076371738
key: test_accuracy
value: [0.73333333 0.66666667 0.53333333 0.46666667 0.53333333 0.8
0.4 0.53333333 0.57142857 0.71428571]
mean value: 0.5952380952380952
key: train_accuracy
value: [0.73684211 0.7443609 0.7443609 0.7593985 0.73684211 0.7443609
0.76691729 0.72932331 0.73880597 0.73880597]
mean value: 0.7440017955336101
key: test_fscore
value: [0.77777778 0.70588235 0.63157895 0.42857143 0.63157895 0.84210526
0.52631579 0.63157895 0.66666667 0.8 ]
mean value: 0.6642056120693891
key: train_fscore
value: [0.79041916 0.79518072 0.79761905 0.80487805 0.78527607 0.8
0.81871345 0.7804878 0.78787879 0.7826087 ]
mean value: 0.7943061793288788
key: test_precision
value: [0.77777778 0.75 0.6 0.6 0.6 0.8
0.5 0.66666667 0.66666667 0.72727273]
mean value: 0.6688383838383838
key: train_precision
value: [0.77647059 0.78571429 0.77906977 0.80487805 0.79012346 0.77272727
0.78651685 0.77108434 0.78313253 0.79746835]
mean value: 0.7847185495522168
key: test_recall
value: [0.77777778 0.66666667 0.66666667 0.33333333 0.66666667 0.88888889
0.55555556 0.6 0.66666667 0.88888889]
mean value: 0.6711111111111111
key: train_recall
value: [0.80487805 0.80487805 0.81707317 0.80487805 0.7804878 0.82926829
0.85365854 0.79012346 0.79268293 0.76829268]
mean value: 0.8046221017765733
key: test_roc_auc
value: [0.72222222 0.66666667 0.5 0.5 0.5 0.77777778
0.36111111 0.5 0.53333333 0.64444444]
mean value: 0.5705555555555556
key: train_roc_auc
value: [0.71616451 0.72596844 0.72226208 0.74557628 0.72357724 0.71855571
0.74055476 0.71236942 0.72326454 0.73030019]
mean value: 0.7258593163483168
key: test_jcc
value: [0.63636364 0.54545455 0.46153846 0.27272727 0.46153846 0.72727273
0.35714286 0.46153846 0.5 0.66666667]
mean value: 0.5090243090243091
key: train_jcc
value: [0.65346535 0.66 0.66336634 0.67346939 0.64646465 0.66666667
0.69306931 0.64 0.65 0.64285714]
mean value: 0.6589358833842568
MCC on Blind test: 0.31
Accuracy on Blind test: 0.68
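Every pipeline in this log shares the same 'prep' step: a ColumnTransformer that applies MinMaxScaler to the 166 numeric columns, applies OneHotEncoder to the 7 categorical columns (ss_class through active_site) and passes any remaining columns through unchanged. A minimal pure-Python sketch of the two transforms follows, assuming scikit-learn's default settings; the helper names and toy values are hypothetical. Unlike this sketch, the real pipeline learns the minima, maxima and categories on each training fold and reuses them on the matching test fold.

```python
def min_max_scale(column):
    """Rescale one numeric column to [0, 1], like MinMaxScaler's default range.

    A constant column has zero range; scikit-learn treats that range as 1,
    so every value maps to 0.0. This sketch does the same.
    """
    lo, hi = min(column), max(column)
    span = (hi - lo) or 1.0  # avoid dividing by zero for constant columns
    return [(x - lo) / span for x in column]


def one_hot(column):
    """Expand one categorical column into 0/1 indicator columns.

    Categories are sorted, as OneHotEncoder does with categories='auto'.
    Returns (categories, rows).
    """
    categories = sorted(set(column))
    rows = [[1 if value == cat else 0 for cat in categories] for value in column]
    return categories, rows


# Toy inputs for illustration only (not taken from the dataset).
numeric = [2.0, 10.0, 6.0]
categorical = ["a", "b", "a"]

print(min_max_scale(numeric))
print(one_hot(categorical))
```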
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.16157722 0.43932223 0.04518628 0.13460135 0.04375291 0.04577804
0.04375458 0.04748416 0.05378747 0.05692935]
mean value: 0.10721735954284668
key: score_time
value: [0.01405382 0.01124883 0.01167321 0.01080108 0.01058769 0.01020837
0.01086092 0.01022172 0.01022148 0.01110244]
mean value: 0.011097955703735351
key: test_mcc
value: [0.73854895 0.57735027 0.87287156 0.8660254 0.8660254 0.57735027
0.72222222 1. 1. 0.51854497]
mean value: 0.7738939047860451
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.86666667 0.8 0.93333333 0.93333333 0.93333333 0.8
0.86666667 1. 1. 0.78571429]
mean value: 0.891904761904762
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.9 0.84210526 0.94117647 0.94736842 0.94736842 0.84210526
0.88888889 1. 1. 0.84210526]
mean value: 0.9151117991056071
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.81818182 0.8 1. 0.9 0.9 0.8
0.88888889 1. 1. 0.8 ]
mean value: 0.8907070707070708
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.88888889 0.88888889 1. 1. 0.88888889
0.88888889 1. 1. 0.88888889]
mean value: 0.9444444444444444
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.83333333 0.77777778 0.94444444 0.91666667 0.91666667 0.77777778
0.86111111 1. 1. 0.74444444]
mean value: 0.8772222222222222
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.81818182 0.72727273 0.88888889 0.9 0.9 0.72727273
0.8 1. 1. 0.72727273]
mean value: 0.8488888888888889
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.83
Accuracy on Blind test: 0.92
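Each fold's scores follow from that fold's confusion counts. The sketch below recomputes them in plain Python. The counts used (9 TP, 2 FP, 0 FN, 4 TN) are inferred from XGBoost's first test fold (15 samples, recall 1.0, precision 0.818), not read from the script, and the function name metrics_from_counts is hypothetical. The logged test_roc_auc values are consistent with ROC AUC computed from hard 0/1 predictions, which reduces to the mean of recall and specificity; that is an assumption about how the script scores, so the key is named roc_auc_hard here.

```python
import math


def metrics_from_counts(tp, fp, fn, tn):
    """Binary-classification scores from one fold's confusion counts.

    Undefined ratios (zero denominators) are reported as 0.0, matching
    scikit-learn's default behaviour for these metrics.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)          # sensitivity / true positive rate
    specificity = ratio(tn, tn + fp)     # true negative rate
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "fscore": ratio(2 * precision * recall, precision + recall),
        "jcc": ratio(tp, tp + fp + fn),  # Jaccard index of the positive class
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
        # With hard 0/1 predictions, ROC AUC equals (recall + specificity) / 2.
        "roc_auc_hard": (recall + specificity) / 2,
    }


# Counts inferred from XGBoost's first test fold.
for name, value in metrics_from_counts(tp=9, fp=2, fn=0, tn=4).items():
    print(f"{name}: {value:.8f}")
```

The printed scores can be compared directly with the first entry of each test_* list above.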
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.04252434 0.05116844 0.05966687 0.05816221 0.04887724 0.05299878
0.0588336 0.0550344 0.06454182 0.05411196]
mean value: 0.054591965675354
key: score_time
value: [0.01209831 0.0214119 0.02118158 0.02093101 0.02505398 0.02150726
0.02001572 0.02202773 0.02196813 0.02339149]
mean value: 0.020958709716796874
key: test_mcc
value: [ 0.05455447 0.16666667 0.32732684 0.66666667 0.6000992 0.72222222
0.32732684 -0.53300179 0.74535599 0.84852814]
mean value: 0.39257452360062706
key: train_mcc
value: [0.98416472 1. 1. 1. 0.98416472 1.
1. 0.98428077 0.98435397 1. ]
mean value: 0.9936964183102901
key: test_accuracy
value: [0.53333333 0.6 0.66666667 0.8 0.8 0.86666667
0.66666667 0.2 0.85714286 0.92857143]
mean value: 0.6919047619047619
key: train_accuracy
value: [0.9924812 1. 1. 1. 0.9924812 1.
1. 0.9924812 0.99253731 1. ]
mean value: 0.9969980922455393
key: test_fscore
value: [0.58823529 0.66666667 0.70588235 0.8 0.82352941 0.88888889
0.70588235 0.14285714 0.875 0.94736842]
mean value: 0.7144310531230036
key: train_fscore
value: [0.99393939 1. 1. 1. 0.99393939 1.
1. 0.99386503 0.99393939 1. ]
mean value: 0.9975683212493028
key: test_precision
value: [0.625 0.66666667 0.75 1. 0.875 0.88888889
0.75 0.25 1. 0.9 ]
mean value: 0.7705555555555555
key: train_precision
value: [0.98795181 1. 1. 1. 0.98795181 1.
1. 0.98780488 0.98795181 1. ]
mean value: 0.9951660299735527
key: test_recall
value: [0.55555556 0.66666667 0.66666667 0.66666667 0.77777778 0.88888889
0.66666667 0.1 0.77777778 1. ]
mean value: 0.6766666666666666
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.52777778 0.58333333 0.66666667 0.83333333 0.80555556 0.86111111
0.66666667 0.25 0.88888889 0.9 ]
mean value: 0.6983333333333334
key: train_roc_auc
value: [0.99019608 1. 1. 1. 0.99019608 1.
1. 0.99038462 0.99038462 1. ]
mean value: 0.9961161387631976
key: test_jcc
value: [0.41666667 0.5 0.54545455 0.66666667 0.7 0.8
0.54545455 0.07692308 0.77777778 0.9 ]
mean value: 0.5928943278943279
key: train_jcc
value: [0.98795181 1. 1. 1. 0.98795181 1.
1. 0.98780488 0.98795181 1. ]
mean value: 0.9951660299735527
MCC on Blind test: 0.27
Accuracy on Blind test: 0.62
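Each "mean value" line is the plain average of the ten fold scores, which can hide a wide spread. LDA's test_mcc, for instance, runs from about -0.53 to 0.85 around a mean of 0.39, while its train_mcc averages 0.99. A small stdlib sketch that reports the spread next to the mean; the helper name summarise_folds is hypothetical.

```python
import statistics


def summarise_folds(scores):
    """Mean, sample standard deviation, min and max of per-fold CV scores."""
    return {
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),  # n - 1 denominator
        "min": min(scores),
        "max": max(scores),
    }


# LDA test_mcc per fold, copied from the log above.
lda_test_mcc = [0.05455447, 0.16666667, 0.32732684, 0.66666667, 0.6000992,
                0.72222222, 0.32732684, -0.53300179, 0.74535599, 0.84852814]

summary = summarise_folds(lda_test_mcc)
print({key: round(value, 3) for key, value in summary.items()})
```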
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01957345 0.00936222 0.00870514 0.00975537 0.00946307 0.00951624
0.00876665 0.01013446 0.00984979 0.01234937]
mean value: 0.010747575759887695
key: score_time
value: [0.00978994 0.00869393 0.00832486 0.00907922 0.00906873 0.00860548
0.00923014 0.01038742 0.00961089 0.01400924]
mean value: 0.00967998504638672
key: test_mcc
value: [ 0.28867513 -0.06804138 0.48038446 0. 0.28867513 0.61237244
0.27216553 0.18898224 0.33734954 0.3721042 ]
mean value: 0.27726672942748454
key: train_mcc
value: [0.42695156 0.40914219 0.41362409 0.40914219 0.44310968 0.46233819
0.40761269 0.51648972 0.41757429 0.36234681]
mean value: 0.4268331407675872
key: test_accuracy
value: [0.66666667 0.53333333 0.73333333 0.53333333 0.6 0.8
0.66666667 0.6 0.71428571 0.71428571]
mean value: 0.6561904761904762
key: train_accuracy
value: [0.73684211 0.72932331 0.72932331 0.72932331 0.7443609 0.7518797
0.72932331 0.77443609 0.73134328 0.70895522]
mean value: 0.7365110537537874
key: test_fscore
value: [0.73684211 0.66666667 0.81818182 0.63157895 0.57142857 0.85714286
0.76190476 0.66666667 0.8 0.81818182]
mean value: 0.7328594212804739
key: train_fscore
value: [0.8 0.79545455 0.79069767 0.79545455 0.80681818 0.82162162
0.79775281 0.82954545 0.79545455 0.78688525]
mean value: 0.8019684623657902
key: test_precision
value: [0.7 0.58333333 0.69230769 0.6 0.8 0.75
0.66666667 0.75 0.72727273 0.69230769]
mean value: 0.6961888111888112
key: train_precision
value: [0.75268817 0.74468085 0.75555556 0.74468085 0.75531915 0.73786408
0.73958333 0.76842105 0.74468085 0.71287129]
mean value: 0.7456345180489754
key: test_recall
value: [0.77777778 0.77777778 1. 0.66666667 0.44444444 1.
0.88888889 0.6 0.88888889 1. ]
mean value: 0.8044444444444444
key: train_recall
value: [0.85365854 0.85365854 0.82926829 0.85365854 0.86585366 0.92682927
0.86585366 0.90123457 0.85365854 0.87804878]
mean value: 0.8681722372779284
key: test_roc_auc
value: [0.63888889 0.47222222 0.66666667 0.5 0.63888889 0.75
0.61111111 0.6 0.64444444 0.6 ]
mean value: 0.6122222222222222
key: train_roc_auc
value: [0.70133907 0.69153515 0.69894787 0.69153515 0.70743663 0.69870875
0.68782879 0.73907882 0.69606004 0.66017824]
mean value: 0.6972648516706383
key: test_jcc
value: [0.58333333 0.5 0.69230769 0.46153846 0.4 0.75
0.61538462 0.5 0.66666667 0.69230769]
mean value: 0.5861538461538461
key: train_jcc
value: [0.66666667 0.66037736 0.65384615 0.66037736 0.67619048 0.69724771
0.6635514 0.70873786 0.66037736 0.64864865]
mean value: 0.6696020993192491
MCC on Blind test: 0.22
Accuracy on Blind test: 0.65
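The UndefinedMetricWarning logged in the Multinomial block is raised when a fold's classifier predicts no positive samples, so precision tp / (tp + fp) is 0 / 0. Because stderr and stdout are interleaved, the warning's position need not match the model that raised it; the Passive Aggressive fold with test precision and recall of 0. is one fold consistent with it. Passing zero_division to the metric controls this (for example make_scorer(precision_score, zero_division=0), assuming the script builds its scorers with make_scorer). A minimal stdlib sketch of those semantics; the function below is hypothetical and only mirrors the "warn" and numeric options.

```python
import warnings


def precision(tp, fp, zero_division="warn"):
    """Precision tp / (tp + fp) with scikit-learn-style zero_division handling.

    If nothing is predicted positive (tp + fp == 0) precision is undefined:
    "warn" returns 0.0 and emits a warning (scikit-learn's default), while a
    number such as 0 or 1 is returned silently instead.
    """
    if tp + fp:
        return tp / (tp + fp)
    if zero_division == "warn":
        warnings.warn("Precision is ill-defined: no predicted positives; using 0.0")
        return 0.0
    return float(zero_division)


# A fold in which nothing was predicted positive, handled without a warning.
print(precision(0, 0, zero_division=0))
```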
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01204324 0.01354265 0.01453185 0.01378798 0.01325369 0.01510215
0.01632333 0.01447392 0.01603985 0.01390338]
mean value: 0.014300203323364258
key: score_time
value: [0.00963211 0.01172948 0.01188231 0.01177025 0.01262784 0.01211071
0.01205111 0.01195526 0.01205778 0.02415824]
mean value: 0.01299750804901123
key: test_mcc
value: [ 0.49099025 -0.21821789 0.28867513 0.38888889 0. 0.38888889
0.44444444 0.35355339 0.64549722 0.54772256]
mean value: 0.33304428920783685
key: train_mcc
value: [0.7481685 0.53345478 0.92239408 0.68894951 0.41644772 0.83129833
0.96819703 0.57295971 0.83282505 0.76522585]
mean value: 0.7279920567315256
key: test_accuracy
value: [0.73333333 0.53333333 0.66666667 0.66666667 0.4 0.66666667
0.73333333 0.53333333 0.78571429 0.78571429]
mean value: 0.6504761904761904
key: train_accuracy
value: [0.85714286 0.76691729 0.96240602 0.84210526 0.60150376 0.90977444
0.98496241 0.72932331 0.91044776 0.8880597 ]
mean value: 0.8452642801032432
key: test_fscore
value: [0.75 0.69565217 0.73684211 0.66666667 0. 0.66666667
0.77777778 0.46153846 0.8 0.85714286]
mean value: 0.6412286708968631
key: train_fscore
value: [0.86896552 0.84102564 0.9689441 0.8627451 0.52252252 0.92105263
0.98780488 0.71428571 0.92105263 0.90797546]
mean value: 0.851637419382273
key: test_precision
value: [0.85714286 0.57142857 0.7 0.83333333 0. 0.83333333
0.77777778 1. 1. 0.75 ]
mean value: 0.7323015873015873
key: train_precision
value: [1. 0.72566372 0.98734177 0.92957746 1. 1.
0.98780488 1. 1. 0.91358025]
mean value: 0.9543968078717151
key: test_recall
value: [0.66666667 0.88888889 0.77777778 0.55555556 0. 0.55555556
0.77777778 0.3 0.66666667 1. ]
mean value: 0.6188888888888889
key: train_recall
value: [0.76829268 1. 0.95121951 0.80487805 0.35365854 0.85365854
0.98780488 0.55555556 0.85365854 0.90243902]
mean value: 0.8031165311653117
key: test_roc_auc
value: [0.75 0.44444444 0.63888889 0.69444444 0.5 0.69444444
0.72222222 0.65 0.83333333 0.7 ]
mean value: 0.6627777777777778
key: train_roc_auc
value: [0.88414634 0.69607843 0.96580583 0.85341942 0.67682927 0.92682927
0.98409852 0.77777778 0.92682927 0.88391182]
mean value: 0.8575725943911022
key: test_jcc
value: [0.6 0.53333333 0.58333333 0.5 0. 0.5
0.63636364 0.3 0.66666667 0.75 ]
mean value: 0.506969696969697
key: train_jcc
value: [0.76829268 0.72566372 0.93975904 0.75862069 0.35365854 0.85365854
0.97590361 0.55555556 0.85365854 0.83146067]
mean value: 0.7616231579467527
MCC on Blind test: 0.49
Accuracy on Blind test: 0.76
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01271224 0.0140152 0.01353335 0.01386046 0.0133822 0.01415133
0.01415229 0.01478529 0.01401305 0.01370788]
mean value: 0.013831329345703126
key: score_time
value: [0.00936437 0.01232767 0.01146793 0.01134229 0.01132178 0.0113194
0.01169276 0.01195049 0.01243901 0.01177311]
mean value: 0.011499881744384766
key: test_mcc
value: [ 0.32732684 0.27216553 0.16666667 -0.08006408 0.61237244 0.72222222
0.28867513 0.5547002 1. 0.54772256]
mean value: 0.4411787498337245
key: train_mcc
value: [0.84393984 0.96845676 0.83443276 0.42561819 0.73349852 0.81675202
0.98428077 0.73141304 0.92326075 0.82277852]
mean value: 0.808443116758431
key: test_accuracy
value: [0.66666667 0.66666667 0.6 0.4 0.8 0.86666667
0.66666667 0.8 1. 0.78571429]
mean value: 0.7252380952380952
key: train_accuracy
value: [0.91729323 0.98496241 0.91729323 0.60902256 0.87218045 0.90977444
0.9924812 0.86466165 0.96268657 0.91044776]
mean value: 0.894080350129054
key: test_fscore
value: [0.70588235 0.76190476 0.66666667 0.18181818 0.85714286 0.88888889
0.73684211 0.86956522 1. 0.85714286]
mean value: 0.7525853889159853
key: train_fscore
value: [0.92810458 0.98795181 0.92993631 0.53571429 0.9039548 0.93181818
0.99386503 0.9 0.9689441 0.92307692]
mean value: 0.9003366011047804
key: test_precision
value: [0.75 0.66666667 0.66666667 0.5 0.75 0.88888889
0.7 0.76923077 1. 0.75 ]
mean value: 0.7441452991452991
key: train_precision
value: [1. 0.97619048 0.97333333 1. 0.84210526 0.87234043
1. 0.81818182 0.98734177 0.97297297]
mean value: 0.9442466061520309
key: test_recall
value: [0.66666667 0.88888889 0.66666667 0.11111111 1. 0.88888889
0.77777778 1. 1. 1. ]
mean value: 0.7999999999999999
key: train_recall
value: [0.86585366 1. 0.8902439 0.36585366 0.97560976 1.
0.98780488 1. 0.95121951 0.87804878]
mean value: 0.8914634146341464
key: test_roc_auc
value: [0.66666667 0.61111111 0.58333333 0.47222222 0.75 0.86111111
0.63888889 0.7 1. 0.7 ]
mean value: 0.6983333333333334
key: train_roc_auc
value: [0.93292683 0.98039216 0.92551411 0.68292683 0.84074605 0.88235294
0.99390244 0.82692308 0.96599437 0.91979362]
mean value: 0.8951472427620204
key: test_jcc
value: [0.54545455 0.61538462 0.5 0.1 0.75 0.8
0.58333333 0.76923077 1. 0.75 ]
mean value: 0.6413403263403263
key: train_jcc
value: [0.86585366 0.97619048 0.86904762 0.36585366 0.82474227 0.87234043
0.98780488 0.81818182 0.93975904 0.85714286]
mean value: 0.8376916695402452
MCC on Blind test: 0.46
Accuracy on Blind test: 0.7
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.10654593 0.12945461 0.13125443 0.12465715 0.10133481 0.15334868
0.11432934 0.11598229 0.12642503 0.13426089]
mean value: 0.12375931739807129
key: score_time
value: [0.01605654 0.01604772 0.01675987 0.01631927 0.02109814 0.02507401
0.01946521 0.0151999 0.01829481 0.01699734]
mean value: 0.018131279945373537
key: test_mcc
value: [0.73854895 0.57735027 1. 0.8660254 0.6000992 0.57735027
0.72222222 0.70710678 1. 0.84852814]
mean value: 0.7637231227021293
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.86666667 0.8 1. 0.93333333 0.8 0.8
0.86666667 0.86666667 1. 0.92857143]
mean value: 0.8861904761904762
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.9 0.84210526 1. 0.94736842 0.82352941 0.84210526
0.88888889 0.90909091 1. 0.94736842]
mean value: 0.9100456578165557
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.81818182 0.8 1. 0.9 0.875 0.8
0.88888889 0.83333333 1. 0.9 ]
mean value: 0.881540404040404
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.88888889 1. 1. 0.77777778 0.88888889
0.88888889 1. 1. 1. ]
mean value: 0.9444444444444444
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.83333333 0.77777778 1. 0.91666667 0.80555556 0.77777778
0.86111111 0.8 1. 0.9 ]
mean value: 0.8672222222222222
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.81818182 0.72727273 1. 0.9 0.7 0.72727273
0.8 0.83333333 1. 0.9 ]
mean value: 0.8406060606060606
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.77
Accuracy on Blind test: 0.89
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.0503726 0.02812481 0.04068756 0.03008389 0.02571535 0.06941533
0.03493547 0.04549479 0.02969337 0.03361511]
mean value: 0.03881382942199707
key: score_time
value: [0.01990795 0.01710773 0.02853751 0.02316236 0.02038765 0.018929
0.02127457 0.02027416 0.0221498 0.02187991]
mean value: 0.021361064910888673
key: test_mcc
value: [0.73854895 0.8660254 0.87287156 0.72222222 0.8660254 0.73854895
0.72222222 0.7 0.74535599 0.84852814]
mean value: 0.7820348834633071
key: train_mcc
value: [1. 0.98428077 0.96891398 1. 0.98416472 1.
1. 0.96869441 1. 0.96857411]
mean value: 0.9874627986897856
key: test_accuracy
value: [0.86666667 0.93333333 0.93333333 0.86666667 0.93333333 0.86666667
0.86666667 0.86666667 0.85714286 0.92857143]
mean value: 0.891904761904762
key: train_accuracy
value: [1. 0.9924812 0.98496241 1. 0.9924812 1.
1. 0.98496241 1. 0.98507463]
mean value: 0.9939961844910784
key: test_fscore
value: [0.9 0.94736842 0.94117647 0.88888889 0.94736842 0.9
0.88888889 0.9 0.875 0.94736842]
mean value: 0.9136059511523908
key: train_fscore
value: [1. 0.99386503 0.98765432 1. 0.99393939 1.
1. 0.98780488 1. 0.98780488]
mean value: 0.9951068501699456
key: test_precision
value: [0.81818182 0.9 1. 0.88888889 0.9 0.81818182
0.88888889 0.9 1. 0.9 ]
mean value: 0.9014141414141414
key: train_precision
value: [1. 1. 1. 1. 0.98795181 1.
1. 0.97590361 1. 0.98780488]
mean value: 0.9951660299735527
key: test_recall
value: [1. 1. 0.88888889 0.88888889 1. 1.
0.88888889 0.9 0.77777778 1. ]
mean value: 0.9344444444444444
key: train_recall
value: [1. 0.98780488 0.97560976 1. 1. 1.
1. 1. 1. 0.98780488]
mean value: 0.9951219512195122
key: test_roc_auc
value: [0.83333333 0.91666667 0.94444444 0.86111111 0.91666667 0.83333333
0.86111111 0.85 0.88888889 0.9 ]
mean value: 0.8805555555555555
key: train_roc_auc
value: [1. 0.99390244 0.98780488 1. 0.99019608 1.
1. 0.98076923 1. 0.98428705]
mean value: 0.993695968068278
key: test_jcc
value: [0.81818182 0.9 0.88888889 0.8 0.9 0.81818182
0.8 0.81818182 0.77777778 0.9 ]
mean value: 0.8421212121212122
key: train_jcc
value: [1. 0.98780488 0.97560976 1. 0.98795181 1.
1. 0.97590361 1. 0.97590361]
mean value: 0.990317367029092
MCC on Blind test: 0.72
Accuracy on Blind test: 0.86
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.11155105 0.13867617 0.09597683 0.06546187 0.05309463 0.06110954
0.06886148 0.10508013 0.0533936 0.06201959]
mean value: 0.08152248859405517
key: score_time
value: [0.02223778 0.02060914 0.0354867 0.03397918 0.03515315 0.0243063
0.02288532 0.02245426 0.02287269 0.0232141 ]
mean value: 0.02631986141204834
key: test_mcc
value: [ 0.43082022 -0.06804138 -0.21821789 -0.28867513 0. 0.72222222
-0.28867513 0.13867505 0.06666667 0.70064905]
mean value: 0.1195423664948636
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.73333333 0.53333333 0.4 0.4 0.53333333 0.86666667
0.4 0.66666667 0.57142857 0.85714286]
mean value: 0.5961904761904762
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.8 0.66666667 0.47058824 0.52631579 0.63157895 0.88888889
0.52631579 0.7826087 0.66666667 0.9 ]
mean value: 0.6859629679484303
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.72727273 0.58333333 0.5 0.5 0.6 0.88888889
0.5 0.69230769 0.66666667 0.81818182]
mean value: 0.6476651126651126
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.88888889 0.77777778 0.44444444 0.55555556 0.66666667 0.88888889
0.55555556 0.9 0.66666667 1. ]
mean value: 0.7344444444444445
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.69444444 0.47222222 0.38888889 0.36111111 0.5 0.86111111
0.36111111 0.55 0.53333333 0.8 ]
mean value: 0.5522222222222223
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.66666667 0.5 0.30769231 0.35714286 0.46153846 0.8
0.35714286 0.64285714 0.5 0.81818182]
mean value: 0.5411222111222111
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: -0.02
Accuracy on Blind test: 0.54
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.33926439 0.26802969 0.26510382 0.28069091 0.28672743 0.26764488
0.2655201 0.27123427 0.26848125 0.2639358 ]
mean value: 0.277663254737854
key: score_time
value: [0.01455426 0.00921488 0.00932932 0.01044846 0.00904274 0.00909829
0.00927019 0.00904536 0.00913763 0.00903177]
mean value: 0.009817290306091308
key: test_mcc
value: [0.73854895 0.8660254 1. 0.72222222 0.8660254 0.57735027
0.72222222 0.85280287 1. 0.84852814]
mean value: 0.8193725469925243
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.86666667 0.93333333 1. 0.86666667 0.93333333 0.8
0.86666667 0.93333333 1. 0.92857143]
mean value: 0.9128571428571429
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.9 0.94736842 1. 0.88888889 0.94736842 0.84210526
0.88888889 0.95238095 1. 0.94736842]
mean value: 0.931436925647452
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.81818182 0.9 1. 0.88888889 0.9 0.8
0.88888889 0.90909091 1. 0.9 ]
mean value: 0.9005050505050505
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 0.88888889 1. 0.88888889
0.88888889 1. 1. 1. ]
mean value: 0.9666666666666667
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.83333333 0.91666667 1. 0.86111111 0.91666667 0.77777778
0.86111111 0.9 1. 0.9 ]
mean value: 0.8966666666666667
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.81818182 0.9 1. 0.8 0.9 0.72727273
0.8 0.90909091 1. 0.9 ]
mean value: 0.8754545454545455
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.77
Accuracy on Blind test: 0.89
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.01630831 0.01763463 0.03990936 0.01766634 0.0183568 0.01870799
0.01855612 0.01905155 0.02915406 0.01773953]
mean value: 0.021308469772338866
key: score_time
value: [0.01214647 0.01199389 0.01224542 0.01331973 0.013767 0.01297832
0.012532 0.01315141 0.01218915 0.01300168]
mean value: 0.012732505798339844
key: test_mcc
value: [-0.06804138 0.48038446 -0.38888889 -0.18463724 0. -0.06804138
0.43082022 -0.35355339 0.33734954 0.33734954]
mean value: 0.05227414853437966
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.53333333 0.73333333 0.33333333 0.46666667 0.53333333 0.53333333
0.73333333 0.46666667 0.71428571 0.71428571]
mean value: 0.5761904761904761
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.66666667 0.81818182 0.44444444 0.6 0.63157895 0.66666667
0.8 0.63636364 0.8 0.8 ]
mean value: 0.6863902179691653
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.58333333 0.69230769 0.44444444 0.54545455 0.6 0.58333333
0.72727273 0.58333333 0.72727273 0.72727273]
mean value: 0.6214024864024864
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.77777778 1. 0.44444444 0.66666667 0.66666667 0.77777778
0.88888889 0.7 0.88888889 0.88888889]
mean value: 0.77
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.47222222 0.66666667 0.30555556 0.41666667 0.5 0.47222222
0.69444444 0.35 0.64444444 0.64444444]
mean value: 0.5166666666666666
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.5 0.69230769 0.28571429 0.42857143 0.46153846 0.5
0.66666667 0.46666667 0.66666667 0.66666667]
mean value: 0.5334798534798535
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: -0.14
Accuracy on Blind test: 0.51
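Each `key:`/`value:` pair above lists one score per cross-validation fold, followed by its mean. As a reference, the sketch below computes the same binary metrics from a single fold's confusion-matrix counts. The counts are an inference, not something the log prints: for the first QDA test fold (15 samples), precision 0.5833 = 7/12 and recall 0.7778 = 7/9 imply TP=7, FP=5, FN=2, TN=1, and those counts reproduce that fold's logged MCC (-0.06804138), accuracy, F-score, Jaccard (`jcc`) and ROC AUC. The function name and dictionary keys are illustrative only and are not taken from the script.

```python
import math


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Binary metrics for one CV fold from confusion-matrix counts (positive class = 1)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0        # true-positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0   # true-negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        # Matthews correlation coefficient; 0.0 when any row/column total is zero
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "fscore": 2 * tp / (2 * tp + fp + fn) if tp else 0.0,
        "precision": precision,
        "recall": recall,
        # With hard 0/1 predictions, ROC AUC reduces to balanced accuracy
        "roc_auc_hard": (recall + specificity) / 2,
        # Jaccard index of the positive class
        "jcc": tp / (tp + fp + fn) if tp + fp + fn else 0.0,
    }


# Counts inferred (not logged) for the first QDA test fold
for name, value in binary_metrics(tp=7, fp=5, fn=2, tn=1).items():
    print(f"{name}: {value:.8f}")
```

The `roc_auc_hard` entry matches the logged `test_roc_auc` for this fold (0.47222222) only because ROC AUC computed on hard labels equals balanced accuracy; if the script scored ROC AUC on probabilities or decision scores, the two would generally differ.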
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.04897857 0.03619099 0.0377605 0.0363791 0.0273664 0.03334212
0.03329372 0.03301764 0.04070854 0.03767133]
mean value: 0.036470890045166016
key: score_time
value: [0.01737165 0.02049923 0.02360916 0.02222991 0.02205873 0.02062511
0.01997638 0.01992583 0.02362394 0.02236056]
mean value: 0.02122805118560791
key: test_mcc
value: [ 0.44444444 -0.06804138 0.16666667 0.49099025 0.73854895 0.6000992
0.44444444 0.53300179 1. 0.54772256]
mean value: 0.4897876919261729
key: train_mcc
value: [0.96845676 0.95286855 0.95286855 0.93739264 0.88847246 0.96845676
0.95223938 0.93688296 0.95281321 0.95281321]
mean value: 0.9463264497946997
key: test_accuracy
value: [0.73333333 0.53333333 0.6 0.73333333 0.86666667 0.8
0.73333333 0.8 1. 0.78571429]
mean value: 0.7585714285714286
key: train_accuracy
value: [0.98496241 0.97744361 0.97744361 0.96992481 0.94736842 0.98496241
0.97744361 0.96992481 0.97761194 0.97761194]
mean value: 0.9744697564807541
key: test_fscore
value: [0.77777778 0.66666667 0.66666667 0.75 0.9 0.82352941
0.77777778 0.85714286 1. 0.85714286]
mean value: 0.8076704014939309
key: train_fscore
value: [0.98795181 0.98203593 0.98203593 0.97619048 0.95808383 0.98795181
0.98181818 0.97560976 0.98181818 0.98181818]
mean value: 0.9795314080823169
key: test_precision
value: [0.77777778 0.58333333 0.66666667 0.85714286 0.81818182 0.875
0.77777778 0.81818182 1. 0.75 ]
mean value: 0.792406204906205
key: train_precision
value: [0.97619048 0.96470588 0.96470588 0.95348837 0.94117647 0.97619048
0.97590361 0.96385542 0.97590361 0.97590361]
mean value: 0.9668023824828335
key: test_recall
value: [0.77777778 0.77777778 0.66666667 0.66666667 1. 0.77777778
0.77777778 0.9 1. 1. ]
mean value: 0.8344444444444444
key: train_recall
value: [1. 1. 1. 1. 0.97560976 1.
0.98780488 0.98765432 0.98780488 0.98780488]
mean value: 0.9926678711231557
key: test_roc_auc
value: [0.72222222 0.47222222 0.58333333 0.75 0.83333333 0.80555556
0.72222222 0.75 1. 0.7 ]
mean value: 0.7338888888888889
key: train_roc_auc
value: [0.98039216 0.97058824 0.97058824 0.96078431 0.93878527 0.98039216
0.9742946 0.96498101 0.97467167 0.97467167]
mean value: 0.969014931036691
key: test_jcc
value: [0.63636364 0.5 0.5 0.6 0.81818182 0.7
0.63636364 0.75 1. 0.75 ]
mean value: 0.6890909090909091
key: train_jcc
value: [0.97619048 0.96470588 0.96470588 0.95348837 0.91954023 0.97619048
0.96428571 0.95238095 0.96428571 0.96428571]
mean value: 0.960005941430301
MCC on Blind test: 0.59
Accuracy on Blind test: 0.81
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.22938943 0.23086452 0.25409341 0.41145706 0.33719325 0.23104048
0.12042832 0.26716518 0.23124266 0.19360614]
mean value: 0.250648045539856
key: score_time
value: [0.02094197 0.02873898 0.02438402 0.02471972 0.01987958 0.02118778
0.0118742 0.0221138 0.01199412 0.02212453]
mean value: 0.020795869827270507
key: test_mcc
value: [0.44444444 0.27216553 0.16666667 0.66666667 0.73854895 0.6000992
0.44444444 0.09449112 0.74535599 0.54772256]
mean value: 0.4720605561480509
key: train_mcc
value: [0.96845676 0.98416472 0.95286855 0.98416472 0.88847246 0.96845676
0.95223938 0.96842355 0.95281321 0.95281321]
mean value: 0.9572873336128294
key: test_accuracy
value: [0.73333333 0.66666667 0.6 0.8 0.86666667 0.8
0.73333333 0.53333333 0.85714286 0.78571429]
mean value: 0.7376190476190476
key: train_accuracy
value: [0.98496241 0.9924812 0.97744361 0.9924812 0.94736842 0.98496241
0.97744361 0.98496241 0.97761194 0.97761194]
mean value: 0.9797329143754909
key: test_fscore
value: [0.77777778 0.76190476 0.66666667 0.8 0.9 0.82352941
0.77777778 0.58823529 0.875 0.85714286]
mean value: 0.7828034547152194
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_8020.py:107: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_8020.py:110: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: train_fscore
value: [0.98795181 0.99393939 0.98203593 0.99393939 0.95808383 0.98795181
0.98181818 0.98765432 0.98181818 0.98181818]
mean value: 0.983701102925786
key: test_precision
value: [0.77777778 0.66666667 0.66666667 1. 0.81818182 0.875
0.77777778 0.71428571 1. 0.75 ]
mean value: 0.8046356421356421
key: train_precision
value: [0.97619048 0.98795181 0.96470588 0.98795181 0.94117647 0.97619048
0.97590361 0.98765432 0.97590361 0.97590361]
mean value: 0.9749532084141108
key: test_recall
value: [0.77777778 0.88888889 0.66666667 0.66666667 1. 0.77777778
0.77777778 0.5 0.77777778 1. ]
mean value: 0.7833333333333333
key: train_recall
value: [1. 1. 1. 1. 0.97560976 1.
0.98780488 0.98765432 0.98780488 0.98780488]
mean value: 0.9926678711231557
key: test_roc_auc
value: [0.72222222 0.61111111 0.58333333 0.83333333 0.83333333 0.80555556
0.72222222 0.55 0.88888889 0.7 ]
mean value: 0.725
key: train_roc_auc
value: [0.98039216 0.99019608 0.97058824 0.99019608 0.93878527 0.98039216
0.9742946 0.98421178 0.97467167 0.97467167]
mean value: 0.9758399687440816
key: test_jcc
value: [0.63636364 0.61538462 0.5 0.66666667 0.81818182 0.7
0.63636364 0.41666667 0.77777778 0.75 ]
mean value: 0.6517404817404817
key: train_jcc
value: [0.97619048 0.98795181 0.96470588 0.98795181 0.91954023 0.97619048
0.96428571 0.97560976 0.96428571 0.96428571]
mean value: 0.9680997578031486
MCC on Blind test: 0.59
Accuracy on Blind test: 0.81
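The two `SettingWithCopyWarning`s printed inside the Ridge ClassifierCV output point at `pnca_8020.py` lines 107 and 110, where `baseline_CT` and `baseline_BT` are sorted with `inplace=True`. The surrounding script is not shown, so it is an assumption that those frames are subsets of a larger results table. If that is the case, the sketch below shows the two usual fixes: take an explicit `.copy()`, or drop `inplace=True` and rebind the name. The `results` table and its `model` column are hypothetical; the MCC values are rounded means taken from this log.

```python
import pandas as pd

# Hypothetical results table; MCC values are rounded means from this log.
results = pd.DataFrame({
    "model": ["QDA", "Ridge Classifier", "Ridge ClassifierCV"],
    "test_mcc": [0.05, 0.49, 0.47],
    "bts_mcc": [-0.14, 0.59, 0.59],
})

# Fix 1: pandas may flag a column subset as derived from `results`; an explicit
# .copy() makes it an independent DataFrame, so an in-place sort is safe.
baseline_CT = results[["model", "test_mcc"]].copy()
baseline_CT.sort_values(by=["test_mcc"], ascending=False, inplace=True)

# Fix 2: skip inplace=True and rebind the name to the newly sorted DataFrame.
baseline_BT = results[["model", "bts_mcc"]].sort_values(by=["bts_mcc"], ascending=False)

print(baseline_CT)
print(baseline_BT)
```

Either way `results` itself is left unsorted, which avoids the ambiguity the warning is pointing at.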
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.0408802 0.0270524 0.02707195 0.02973795 0.03253603 0.04366565
0.03498888 0.06585813 0.06901002 0.06005859]
mean value: 0.04308598041534424
key: score_time
value: [0.01190782 0.01165295 0.01165676 0.01165891 0.01165366 0.01169753
0.01182747 0.01180315 0.01178432 0.01181316]
mean value: 0.011745572090148926
key: test_mcc
value: [0.68888889 0.58655573 0.70710678 0.67082039 0.70710678 0.47140452
0.67082039 0.11111111 0.67082039 0.67082039]
mean value: 0.5955455381604875
key: train_mcc
value: [0.86510087 0.85275519 0.86591805 0.89031011 0.85365854 0.89031011
0.87804878 0.89031011 0.86591805 0.87909532]
mean value: 0.8731425135908
key: test_accuracy
value: [0.84210526 0.78947368 0.83333333 0.83333333 0.83333333 0.72222222
0.83333333 0.55555556 0.83333333 0.83333333]
mean value: 0.7909356725146199
key: train_accuracy
value: [0.93251534 0.92638037 0.93292683 0.94512195 0.92682927 0.94512195
0.93902439 0.94512195 0.93292683 0.93902439]
mean value: 0.9364993266497081
key: test_fscore
value: [0.84210526 0.81818182 0.8 0.84210526 0.85714286 0.66666667
0.82352941 0.55555556 0.82352941 0.82352941]
mean value: 0.7852345659156805
key: train_fscore
value: [0.93251534 0.92592593 0.93251534 0.94478528 0.92682927 0.94478528
0.93902439 0.94478528 0.93251534 0.9375 ]
mean value: 0.9361181424953309
key: test_precision
value: [0.8 0.75 1. 0.8 0.75 0.83333333
0.875 0.55555556 0.875 0.875 ]
mean value: 0.8113888888888889
key: train_precision
value: [0.9382716 0.92592593 0.9382716 0.95061728 0.92682927 0.95061728
0.93902439 0.95061728 0.9382716 0.96153846]
mean value: 0.941998471266764
key: test_recall
value: [0.88888889 0.9 0.66666667 0.88888889 1. 0.55555556
0.77777778 0.55555556 0.77777778 0.77777778]
mean value: 0.7788888888888889
key: train_recall
value: [0.92682927 0.92592593 0.92682927 0.93902439 0.92682927 0.93902439
0.93902439 0.93902439 0.92682927 0.91463415]
mean value: 0.9303974706413731
key: test_roc_auc
value: [0.84444444 0.78333333 0.83333333 0.83333333 0.83333333 0.72222222
0.83333333 0.55555556 0.83333333 0.83333333]
mean value: 0.7905555555555556
key: train_roc_auc
value: [0.93255044 0.9263776 0.93292683 0.94512195 0.92682927 0.94512195
0.93902439 0.94512195 0.93292683 0.93902439]
mean value: 0.9365025594700391
key: test_jcc
value: [0.72727273 0.69230769 0.66666667 0.72727273 0.75 0.5
0.7 0.38461538 0.7 0.7 ]
mean value: 0.6548135198135198
key: train_jcc
value: [0.87356322 0.86206897 0.87356322 0.89534884 0.86363636 0.89534884
0.88505747 0.89534884 0.87356322 0.88235294]
mean value: 0.8799851908394765
MCC on Blind test: 0.4
Accuracy on Blind test: 0.73
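The `ConvergenceWarning` blocks that follow are scikit-learn's lbfgs solver stopping at its iteration limit (status=1) while fitting Logistic RegressionCV. The warning offers two remedies. The logged pipelines already MinMax-scale the numeric columns, so the sketch below shows the other one: raising `max_iter` in a pipeline shaped like the logged one. The data are synthetic (only the column names come from the log). `max_iter=5000` is an arbitrary illustrative value, and it is untested whether that setting silences the warnings on the script's real data.

```python
import warnings

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic stand-in data: column names borrowed from the log, values random.
rng = np.random.default_rng(42)
n = 80
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 30, n),
    "rsa": rng.uniform(0, 1, n),
    "ss_class": rng.choice(["helix", "strand", "coil"], n),
    "active_site": rng.choice(["yes", "no"], n),
})
# Label loosely tied to ligand_distance, with about 15% of labels flipped
y = ((X["ligand_distance"] < 12) ^ (rng.uniform(size=n) < 0.15)).astype(int)

prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), ["ligand_distance", "rsa"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["ss_class", "active_site"]),
    ],
    remainder="passthrough",
)

# LogisticRegressionCV defaults to max_iter=100; a larger budget gives lbfgs
# more room to converge, especially at the large C values the CV search tries.
model = Pipeline(steps=[
    ("prep", prep),
    ("model", LogisticRegressionCV(max_iter=5000, random_state=42)),
])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", ConvergenceWarning)
    model.fit(X, y)

n_conv = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
print(f"ConvergenceWarnings: {n_conv}; chosen C: {model.named_steps['model'].C_[0]:.4g}")
```

Counting the caught warnings, as at the end of the sketch, is one way to check whether a given `max_iter` is enough before rerunning the full model comparison.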
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
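The model list above contains ('Naive Bayes', BernoulliNB()) twice, as the 4th and 13th entries. If the script loops over this list, BernoulliNB would be evaluated twice under the same label.

A quick stdlib check for repeated labels is sketched below. The label strings are copied from the printed list, including the "Passive Aggresive" spelling. The helper name duplicated_names is hypothetical.

```python
from collections import Counter

# Model labels as printed in the list above (estimator objects omitted).
model_names = [
    "Logistic Regression", "Logistic RegressionCV", "Gaussian NB", "Naive Bayes",
    "K-Nearest Neighbors", "SVM", "MLP", "Decision Tree", "Extra Trees",
    "Extra Tree", "Random Forest", "Random Forest2", "Naive Bayes", "XGBoost",
    "LDA", "Multinomial", "Passive Aggresive", "Stochastic GDescent",
    "AdaBoost Classifier", "Bagging Classifier", "Gaussian Process",
    "Gradient Boosting", "QDA", "Ridge Classifier", "Ridge ClassifierCV",
]


def duplicated_names(names):
    """Return labels that occur more than once, in first-seen order."""
    counts = Counter(names)  # Counter keeps insertion order (Python 3.7+)
    return [name for name in counts if counts[name] > 1]


print(len(model_names), duplicated_names(model_names))
```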
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
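The pipeline above applies two transforms before the model:
- MinMaxScaler rescales the 166 numeric columns ('ligand_distance' through 'ZHAC000106').
- OneHotEncoder expands the seven categorical columns ('ss_class' through 'active_site').

remainder='passthrough' leaves any other columns unchanged.

Below is a minimal pure-Python sketch of the two transforms. The helper names and sample values are hypothetical. It also simplifies one thing: the real transformers learn the min/max and the category list from the data they are fitted on, then reuse them on new data, whereas here they are taken from the column being transformed.

```python
def min_max_scale(values):
    """Rescale a numeric column to [0, 1], like MinMaxScaler with its default feature_range."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Constant column: map every value to 0 instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def one_hot(values):
    """One-hot encode a categorical column; categories are sorted, as with categories='auto'."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows


# Hypothetical values for a 'ligand_distance'-like and an 'ss_class'-like column.
print(min_max_scale([2, 4, 6]))
print(one_hot(["helix", "strand", "helix"]))
```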
key: fit_time
value: [1.8908112 0.95630956 1.52429509 1.28793001 1.42368722 1.88998628
0.99083161 1.84427047 1.74302101 2.14582276]
mean value: 1.5696965217590333
key: score_time
value: [0.01293707 0.01447511 0.01205897 0.01240826 0.01333857 0.01755524
0.01313376 0.0169878 0.02549314 0.01345563]
mean value: 0.015184354782104493
key: test_mcc
value: [0.78888889 0.68543653 0.77777778 0.67082039 0.70710678 0.56980288
0.47140452 0.34188173 0.56980288 0.89442719]
mean value: 0.6477349573707021
key: train_mcc
value: [1. 1. 1. 0.95121951 1. 1.
1. 1. 1. 1. ]
mean value: 0.9951219512195122
key: test_accuracy
value: [0.89473684 0.84210526 0.88888889 0.83333333 0.83333333 0.77777778
0.72222222 0.66666667 0.77777778 0.94444444]
mean value: 0.8181286549707603
key: train_accuracy
value: [1. 1. 1. 0.97560976 1. 1.
1. 1. 1. 1. ]
mean value: 0.9975609756097561
key: test_fscore
value: [0.88888889 0.85714286 0.88888889 0.84210526 0.85714286 0.75
0.66666667 0.625 0.75 0.94736842]
mean value: 0.8073203842940685
key: train_fscore
value: [1. 1. 1. 0.97560976 1. 1.
1. 1. 1. 1. ]
mean value: 0.9975609756097561
key: test_precision
value: [0.88888889 0.81818182 0.88888889 0.8 0.75 0.85714286
0.83333333 0.71428571 0.85714286 0.9 ]
mean value: 0.8307864357864357
key: train_precision
value: [1. 1. 1. 0.97560976 1. 1.
1. 1. 1. 1. ]
mean value: 0.9975609756097561
key: test_recall
value: [0.88888889 0.9 0.88888889 0.88888889 1. 0.66666667
0.55555556 0.55555556 0.66666667 1. ]
mean value: 0.8011111111111111
key: train_recall
value: [1. 1. 1. 0.97560976 1. 1.
1. 1. 1. 1. ]
mean value: 0.9975609756097561
key: test_roc_auc
value: [0.89444444 0.83888889 0.88888889 0.83333333 0.83333333 0.77777778
0.72222222 0.66666667 0.77777778 0.94444444]
mean value: 0.8177777777777777
key: train_roc_auc
value: [1. 1. 1. 0.97560976 1. 1.
1. 1. 1. 1. ]
mean value: 0.9975609756097561
key: test_jcc
value: [0.8 0.75 0.8 0.72727273 0.75 0.6
0.5 0.45454545 0.6 0.9 ]
mean value: 0.6881818181818182
key: train_jcc
value: [1. 1. 1. 0.95238095 1. 1.
1. 1. 1. 1. ]
mean value: 0.9952380952380953
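The fit_time, score_time, test_* and train_* keys look like the output of scikit-learn's cross_validate with return_train_score=True; the script itself is not shown, so this is an inference.

Every fold metric can be recomputed from confusion-matrix counts. Working backwards from the first fold gives:
- 19 test samples, with test_accuracy 0.89473684 = 17/19
- test_precision = test_recall = 8/9
- so TP = 8, FP = 1, FN = 1, TN = 9

These counts are inferred, not printed in the log. They reproduce every first-fold value above.

The test_roc_auc values also equal balanced accuracy, (recall + specificity) / 2. That is what ROC AUC reduces to when the scorer receives hard 0/1 predictions rather than probabilities, which suggests (but does not confirm) how the scorer was configured. The function name fold_metrics is hypothetical.

```python
from math import sqrt


def _safe_div(num, den):
    # Return 0.0 instead of raising when a denominator is zero.
    return num / den if den else 0.0


def fold_metrics(tp, fp, fn, tn):
    """Binary-classification scores from confusion-matrix counts."""
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)        # true positive rate
    specificity = _safe_div(tn, tn + fp)   # true negative rate
    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "mcc": _safe_div(tp * tn - fp * fn, mcc_den),
        "accuracy": _safe_div(tp + tn, tp + fp + fn + tn),
        "fscore": _safe_div(2 * precision * recall, precision + recall),
        "precision": precision,
        "recall": recall,
        # For hard 0/1 predictions, ROC AUC equals balanced accuracy.
        "roc_auc": (recall + specificity) / 2,
        "jcc": _safe_div(tp, tp + fp + fn),  # Jaccard index
    }


# Counts inferred for fold 1 of the LogisticRegressionCV run.
for key, value in fold_metrics(tp=8, fp=1, fn=1, tn=9).items():
    print(f"{key}: {value:.8f}")
```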
MCC on Blind test: 0.66
Accuracy on Blind test: 0.84
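Each "mean value" line appears to be the plain arithmetic mean of the ten per-fold values. The sketch below recomputes the LogisticRegressionCV MCC summary from the printed 8-decimal fold values as a check. It also reports two figures that are not in the log: the fold standard deviation and the train/test gap. The helper name summarise is hypothetical.

```python
from statistics import mean, stdev

# Per-fold MCC values copied from the LogisticRegressionCV block above.
test_mcc = [0.78888889, 0.68543653, 0.77777778, 0.67082039, 0.70710678,
            0.56980288, 0.47140452, 0.34188173, 0.56980288, 0.89442719]
train_mcc = [1.0, 1.0, 1.0, 0.95121951, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]


def summarise(train_scores, test_scores):
    """Mean and spread of fold scores plus the mean train-minus-test gap."""
    return {
        "test_mean": mean(test_scores),
        "test_sd": stdev(test_scores),  # sample standard deviation across folds
        "train_mean": mean(train_scores),
        "gap": mean(train_scores) - mean(test_scores),
    }


print(summarise(train_mcc, test_mcc))
```

A mean train MCC of about 0.995 against a mean test MCC of about 0.648 is consistent with overfitting. The blind-test MCC of 0.66 is close to the cross-validated mean.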
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.0283339 0.01366544 0.01121616 0.00976849 0.01593399 0.01254296
0.01012063 0.0107913 0.01603723 0.01076508]
mean value: 0.01391751766204834
key: score_time
value: [0.01344252 0.01378036 0.00989604 0.00934839 0.01519752 0.01089883
0.00986218 0.01424837 0.0136404 0.00985265]
mean value: 0.012016725540161134
key: test_mcc
value: [0.28752732 0.01807754 0.4472136 0.34188173 0. 0.70710678
0.34188173 0.1490712 0.34188173 0.53452248]
mean value: 0.3169164101692284
key: train_mcc
value: [0.47805638 0.42302501 0.55048188 0.57527066 0.51116565 0.57282438
0.42305348 0.52414242 0.47172818 0.58760938]
mean value: 0.5117357414354574
key: test_accuracy
value: [0.63157895 0.52631579 0.66666667 0.66666667 0.5 0.83333333
0.66666667 0.55555556 0.66666667 0.72222222]
mean value: 0.6435672514619883
key: train_accuracy
value: [0.72392638 0.65030675 0.74390244 0.78658537 0.7195122 0.76219512
0.67682927 0.74390244 0.7195122 0.76219512]
mean value: 0.7288867275175819
key: test_fscore
value: [0.66666667 0.66666667 0.75 0.7 0.64 0.85714286
0.7 0.66666667 0.7 0.7826087 ]
mean value: 0.7129751552795032
key: train_fscore
value: [0.76683938 0.73972603 0.79207921 0.79532164 0.77669903 0.80203046
0.74641148 0.78350515 0.7628866 0.80597015]
mean value: 0.777146912204694
key: test_precision
value: [0.58333333 0.52941176 0.6 0.63636364 0.5 0.75
0.63636364 0.53333333 0.63636364 0.64285714]
mean value: 0.6048026483320601
key: train_precision
value: [0.66666667 0.58695652 0.66666667 0.76404494 0.64516129 0.68695652
0.61417323 0.67857143 0.66071429 0.68067227]
mean value: 0.6650583822494134
key: test_recall
value: [0.77777778 0.9 1. 0.77777778 0.88888889 1.
0.77777778 0.88888889 0.77777778 1. ]
mean value: 0.8788888888888888
key: train_recall
value: [0.90243902 1. 0.97560976 0.82926829 0.97560976 0.96341463
0.95121951 0.92682927 0.90243902 0.98780488]
mean value: 0.9414634146341463
key: test_roc_auc
value: [0.63888889 0.50555556 0.66666667 0.66666667 0.5 0.83333333
0.66666667 0.55555556 0.66666667 0.72222222]
mean value: 0.6422222222222222
key: train_roc_auc
value: [0.72282445 0.65243902 0.74390244 0.78658537 0.7195122 0.76219512
0.67682927 0.74390244 0.7195122 0.76219512]
mean value: 0.7289897621198435
key: test_jcc
value: [0.5 0.5 0.6 0.53846154 0.47058824 0.75
0.53846154 0.5 0.53846154 0.64285714]
mean value: 0.5578829993535875
key: train_jcc
value: [0.62184874 0.58695652 0.6557377 0.66019417 0.63492063 0.66949153
0.59541985 0.6440678 0.61666667 0.675 ]
mean value: 0.6360303611859688
MCC on Blind test: 0.34
Accuracy on Blind test: 0.7
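GaussianNB scores well below LogisticRegressionCV here: MCC 0.32 in cross-validation and 0.34 on the blind test. It models each feature, per class, as an independent normal distribution. With 166 numeric columns that are unlikely to be independent, that assumption may be part of the reason.

A simplified pure-Python sketch of the technique follows; the function names are hypothetical. Like scikit-learn's GaussianNB, it adds var_smoothing (default 1e-9) times the largest feature variance to every variance. Unlike scikit-learn, it falls back to 1.0 when all features are constant.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    """Estimate class log-priors and per-feature means/variances for each class."""
    n_samples, n_features = len(X), len(X[0])
    feature_vars = []
    for j in range(n_features):
        column = [row[j] for row in X]
        mu = sum(column) / n_samples
        feature_vars.append(sum((v - mu) ** 2 for v in column) / n_samples)
    # Smoothing term keeps every variance strictly positive.
    epsilon = var_smoothing * (max(feature_vars) or 1.0)

    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    model = {}
    for label, rows in rows_by_class.items():
        means = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
        variances = [
            sum((r[j] - means[j]) ** 2 for r in rows) / len(rows) + epsilon
            for j in range(n_features)
        ]
        model[label] = (math.log(len(rows) / n_samples), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior, treating features as independent."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
            for x, mu, var in zip(row, means, variances)
        )
    return max(model, key=log_posterior)
```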
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01672673 0.01119757 0.01112366 0.01720643 0.01067734 0.01029587
0.01684093 0.01211858 0.01282406 0.01079941]
mean value: 0.012981057167053223
key: score_time
value: [0.01407647 0.01021147 0.01025724 0.01490736 0.01006126 0.0094769
0.01515317 0.01026034 0.0116756 0.01325488]
mean value: 0.011933469772338867
key: test_mcc
value: [0.25844328 0.16854997 0.47140452 0.4472136 0.34188173 0.1490712
0.47140452 0.12403473 0.34188173 0.3721042 ]
mean value: 0.3145989478923389
key: train_mcc
value: [0.49487065 0.53197363 0.49507377 0.52223297 0.52223297 0.553295
0.525 0.55060372 0.525 0.49938477]
mean value: 0.5219667468393889
key: test_accuracy
value: [0.63157895 0.57894737 0.72222222 0.72222222 0.66666667 0.55555556
0.72222222 0.55555556 0.66666667 0.66666667]
mean value: 0.6488304093567251
key: train_accuracy
value: [0.74233129 0.75460123 0.74390244 0.75609756 0.75609756 0.76829268
0.75609756 0.76219512 0.75609756 0.73780488]
mean value: 0.7533517881191082
key: test_fscore
value: [0.58823529 0.55555556 0.66666667 0.70588235 0.625 0.33333333
0.66666667 0.42857143 0.625 0.57142857]
mean value: 0.5766339869281046
key: train_fscore
value: [0.71621622 0.71014493 0.72 0.72972973 0.72972973 0.73611111
0.7260274 0.71942446 0.7260274 0.69064748]
mean value: 0.720405845128961
key: test_precision
value: [0.625 0.625 0.83333333 0.75 0.71428571 0.66666667
0.83333333 0.6 0.71428571 0.8 ]
mean value: 0.7161904761904762
key: train_precision
value: [0.8030303 0.85964912 0.79411765 0.81818182 0.81818182 0.85483871
0.828125 0.87719298 0.828125 0.84210526]
mean value: 0.8323547664551235
key: test_recall
value: [0.55555556 0.5 0.55555556 0.66666667 0.55555556 0.22222222
0.55555556 0.33333333 0.55555556 0.44444444]
mean value: 0.49444444444444446
key: train_recall
value: [0.64634146 0.60493827 0.65853659 0.65853659 0.65853659 0.64634146
0.64634146 0.6097561 0.64634146 0.58536585]
mean value: 0.6361035832580548
key: test_roc_auc
value: [0.62777778 0.58333333 0.72222222 0.72222222 0.66666667 0.55555556
0.72222222 0.55555556 0.66666667 0.66666667]
mean value: 0.6488888888888888
key: train_roc_auc
value: [0.74292382 0.75368865 0.74390244 0.75609756 0.75609756 0.76829268
0.75609756 0.76219512 0.75609756 0.73780488]
mean value: 0.753319783197832
key: test_jcc
value: [0.41666667 0.38461538 0.5 0.54545455 0.45454545 0.2
0.5 0.27272727 0.45454545 0.4 ]
mean value: 0.4128554778554778
key: train_jcc
value: [0.55789474 0.5505618 0.5625 0.57446809 0.57446809 0.58241758
0.56989247 0.56179775 0.56989247 0.52747253]
mean value: 0.5631365513743338
MCC on Blind test: 0.2
Accuracy on Blind test: 0.59
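The per-fold key/value/mean value blocks in this log have the shape of the dictionary returned by scikit-learn's cross_validate when it is given a named scoring dict and return_train_score=True. The sketch below reproduces that shape under stated assumptions, not as the script's actual code: the data are synthetic stand-ins (two numeric and two categorical columns borrowed from the logged feature names), the blind test is assumed to be a stratified 20% hold-out (suggested by "8020" in the file name, not confirmed), and CV is assumed to be stratified 10-fold (ten values per key). The logged test_roc_auc values are consistent with ROC AUC computed on hard predicted labels: in the first BernoulliNB fold, recall is 5/9 and specificity 7/10, and their mean, 0.6278, is the logged value. The sketch therefore scores roc_auc on predict() output.

```python
# Hypothetical reconstruction of the evaluation loop behind this log.
# Assumptions (not taken from ml_data_8020.py): synthetic data, a stratified
# 80/20 blind split, stratified 10-fold CV, and hard-label scorers.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score, make_scorer,
                             matthews_corrcoef, precision_score, recall_score,
                             roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_validate, train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

rng = np.random.default_rng(42)
n = 230
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, n),     # numeric -> MinMaxScaler
    "deepddg": rng.normal(0, 1, n),               # numeric -> MinMaxScaler
    "ss_class": rng.choice(["H", "E", "L"], n),   # categorical -> OneHotEncoder
    "active_site": rng.choice(["yes", "no"], n),  # categorical -> OneHotEncoder
})
y = rng.integers(0, 2, n)

num_cols = X.select_dtypes(include="number").columns
cat_cols = X.select_dtypes(exclude="number").columns
prep = ColumnTransformer(
    transformers=[("num", MinMaxScaler(), num_cols),
                  # handle_unknown="ignore" is an addition; the log shows OneHotEncoder()
                  ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols)],
    remainder="passthrough")
pipe = Pipeline(steps=[("prep", prep), ("model", BernoulliNB())])

# Scorer names chosen so cross_validate emits test_mcc, train_mcc, ..., train_jcc.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": make_scorer(accuracy_score),
    "fscore": make_scorer(f1_score),
    "precision": make_scorer(precision_score, zero_division=0),
    "recall": make_scorer(recall_score),
    "roc_auc": make_scorer(roc_auc_score),  # scored on predict() labels, not probabilities
    "jcc": make_scorer(jaccard_score),
}

X_train, X_blind, y_train, y_blind = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X_train, y_train, cv=cv,
                        scoring=scoring, return_train_score=True)
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", value.mean())

# Refit on the whole training split, then score the untouched blind set once.
pipe.fit(X_train, y_train)
y_pred = pipe.predict(X_blind)
print("MCC on Blind test:", round(matthews_corrcoef(y_blind, y_pred), 2))
print("Accuracy on Blind test:", round(accuracy_score(y_blind, y_pred), 2))
```

With hard labels, ROC AUC reduces to (recall + specificity) / 2, i.e. balanced accuracy, so a probability-based scorer would generally give different test_roc_auc numbers for the same models.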
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.01319885 0.01149154 0.01570868 0.01301765 0.01050162 0.01313496
0.00960803 0.00898933 0.01465893 0.00987673]
mean value: 0.012018632888793946
key: score_time
value: [0.01682305 0.01638246 0.02695346 0.01870203 0.01510835 0.01587963
0.01642799 0.01527309 0.02283025 0.01292133]
mean value: 0.017730164527893066
key: test_mcc
value: [ 0.25844328 0.06900656 -0.11396058 0.34188173 0.33333333 0.
0.11396058 0. 0.33333333 0.4472136 ]
mean value: 0.17832118280906833
key: train_mcc
value: [0.48871836 0.51136091 0.50093211 0.46563593 0.48377268 0.46563593
0.4539621 0.47994775 0.47850059 0.43072234]
mean value: 0.4759188688472724
key: test_accuracy
value: [0.63157895 0.52631579 0.44444444 0.66666667 0.66666667 0.5
0.55555556 0.5 0.66666667 0.72222222]
mean value: 0.5880116959064328
key: train_accuracy
value: [0.74233129 0.75460123 0.75 0.73170732 0.73780488 0.73170732
0.72560976 0.73780488 0.73780488 0.71341463]
mean value: 0.7362786173874009
key: test_fscore
value: [0.58823529 0.47058824 0.375 0.625 0.66666667 0.18181818
0.5 0.30769231 0.66666667 0.70588235]
mean value: 0.5087549705196764
key: train_fscore
value: [0.72727273 0.74025974 0.74213836 0.71794872 0.7114094 0.71794872
0.70967742 0.71895425 0.72258065 0.69281046]
mean value: 0.7201000434581415
key: test_precision
value: [0.625 0.57142857 0.42857143 0.71428571 0.66666667 0.5
0.57142857 0.5 0.66666667 0.75 ]
mean value: 0.5994047619047619
key: train_precision
value: [0.77777778 0.78082192 0.76623377 0.75675676 0.79104478 0.75675676
0.75342466 0.77464789 0.76712329 0.74647887]
mean value: 0.7671066457221539
key: test_recall
value: [0.55555556 0.4 0.33333333 0.55555556 0.66666667 0.11111111
0.44444444 0.22222222 0.66666667 0.66666667]
mean value: 0.4622222222222222
key: train_recall
value: [0.68292683 0.7037037 0.7195122 0.68292683 0.64634146 0.68292683
0.67073171 0.67073171 0.68292683 0.64634146]
mean value: 0.6789069557362241
key: test_roc_auc
value: [0.62777778 0.53333333 0.44444444 0.66666667 0.66666667 0.5
0.55555556 0.5 0.66666667 0.72222222]
mean value: 0.5883333333333334
key: train_roc_auc
value: [0.74269798 0.75429088 0.75 0.73170732 0.73780488 0.73170732
0.72560976 0.73780488 0.73780488 0.71341463]
mean value: 0.7362842517314063
key: test_jcc
value: [0.41666667 0.30769231 0.23076923 0.45454545 0.5 0.1
0.33333333 0.18181818 0.5 0.54545455]
mean value: 0.35702797202797204
key: train_jcc
value: [0.57142857 0.58762887 0.59 0.56 0.55208333 0.56
0.55 0.56122449 0.56565657 0.53 ]
mean value: 0.562802182619377
MCC on Blind test: -0.01
Accuracy on Blind test: 0.51
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01490951 0.01831555 0.01540542 0.01234603 0.01195741 0.01603913
0.01533437 0.01074004 0.01064444 0.01078343]
mean value: 0.01364753246307373
key: score_time
value: [0.01275778 0.01393151 0.01058316 0.01042771 0.01456451 0.01108956
0.01409864 0.00900912 0.00917506 0.00897956]
mean value: 0.011461663246154784
key: test_mcc
value: [0.38204659 0.03580574 0.56980288 0.77777778 0.47140452 0.3721042
0.55555556 0.34188173 0.55555556 0.47140452]
mean value: 0.45333390783468236
key: train_mcc
value: [0.73580611 0.78000692 0.74440079 0.76972494 0.75812978 0.80583738
0.76880738 0.78072006 0.75812978 0.78141806]
mean value: 0.7682981208235539
key: test_accuracy
value: [0.68421053 0.52631579 0.77777778 0.88888889 0.72222222 0.66666667
0.77777778 0.66666667 0.77777778 0.72222222]
mean value: 0.7210526315789474
key: train_accuracy
value: [0.86503067 0.88957055 0.87195122 0.88414634 0.87804878 0.90243902
0.88414634 0.8902439 0.87804878 0.8902439 ]
mean value: 0.8833869519676791
key: test_fscore
value: [0.7 0.60869565 0.75 0.88888889 0.76190476 0.57142857
0.77777778 0.625 0.77777778 0.66666667]
mean value: 0.7128140096618357
key: train_fscore
value: [0.85714286 0.88607595 0.86956522 0.88050314 0.87341772 0.9
0.88198758 0.89156627 0.87341772 0.8875 ]
mean value: 0.8801176454293306
key: test_precision
value: [0.63636364 0.53846154 0.85714286 0.88888889 0.66666667 0.8
0.77777778 0.71428571 0.77777778 0.83333333]
mean value: 0.7490698190698191
key: train_precision
value: [0.91666667 0.90909091 0.88607595 0.90909091 0.90789474 0.92307692
0.89873418 0.88095238 0.90789474 0.91025641]
mean value: 0.9049733799400688
key: test_recall
value: [0.77777778 0.7 0.66666667 0.88888889 0.88888889 0.44444444
0.77777778 0.55555556 0.77777778 0.55555556]
mean value: 0.7033333333333334
key: train_recall
value: [0.80487805 0.86419753 0.85365854 0.85365854 0.84146341 0.87804878
0.86585366 0.90243902 0.84146341 0.86585366]
mean value: 0.8571514604034929
key: test_roc_auc
value: [0.68888889 0.51666667 0.77777778 0.88888889 0.72222222 0.66666667
0.77777778 0.66666667 0.77777778 0.72222222]
mean value: 0.7205555555555555
key: train_roc_auc
value: [0.86540199 0.88941584 0.87195122 0.88414634 0.87804878 0.90243902
0.88414634 0.8902439 0.87804878 0.8902439 ]
mean value: 0.8834086118638964
key: test_jcc
value: [0.53846154 0.4375 0.6 0.8 0.61538462 0.4
0.63636364 0.45454545 0.63636364 0.5 ]
mean value: 0.5618618881118881
key: train_jcc
value: [0.75 0.79545455 0.76923077 0.78651685 0.7752809 0.81818182
0.78888889 0.80434783 0.7752809 0.79775281]
mean value: 0.7860935308517135
MCC on Blind test: 0.31
Accuracy on Blind test: 0.68
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [0.69331741 0.64710665 0.66360855 0.86699152 0.70964789 0.75243115
0.78330445 0.70622683 0.71316576 0.76179647]
mean value: 0.729759669303894
key: score_time
value: [0.01211405 0.01341152 0.01725698 0.01392078 0.01340365 0.02822709
0.01344252 0.01362491 0.01334286 0.01341009]
mean value: 0.015215444564819335
key: test_mcc
value: [0.26666667 0.36803496 0.70710678 0.67082039 0.70710678 0.56980288
0.55555556 0.33333333 0.56980288 0.67082039]
mean value: 0.5419050634007493
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.63157895 0.68421053 0.83333333 0.83333333 0.83333333 0.77777778
0.77777778 0.66666667 0.77777778 0.83333333]
mean value: 0.7649122807017544
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.63157895 0.72727273 0.8 0.84210526 0.85714286 0.75
0.77777778 0.66666667 0.75 0.82352941]
mean value: 0.7626073651151051
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.6 0.66666667 1. 0.8 0.75 0.85714286
0.77777778 0.66666667 0.85714286 0.875 ]
mean value: 0.7850396825396825
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 0.8 0.66666667 0.88888889 1. 0.66666667
0.77777778 0.66666667 0.66666667 0.77777778]
mean value: 0.7577777777777778
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.63333333 0.67777778 0.83333333 0.83333333 0.83333333 0.77777778
0.77777778 0.66666667 0.77777778 0.83333333]
mean value: 0.7644444444444444
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.46153846 0.57142857 0.66666667 0.72727273 0.75 0.6
0.63636364 0.5 0.6 0.7 ]
mean value: 0.6213270063270063
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.34
Accuracy on Blind test: 0.7
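The MLP run above logged eleven ConvergenceWarnings (consistent with ten CV folds plus one refit), meaning the stochastic optimizer stopped at max_iter=500 rather than at its tolerance criterion. A minimal sketch, assuming synthetic data from make_classification, of detecting that warning in code instead of reading it from stderr; the helper fit_and_check is hypothetical, the max_iter values are illustrative, and a larger budget is not guaranteed to converge.

```python
# Sketch: detect MLPClassifier's ConvergenceWarning programmatically.
# Synthetic data and max_iter values are illustrative assumptions.
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=200, n_features=20, random_state=42)
X = MinMaxScaler().fit_transform(X)  # scaled inputs, as in the logged pipeline


def fit_and_check(max_iter):
    """Fit an MLP; return (iterations run, True if no ConvergenceWarning was raised)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always", ConvergenceWarning)
        model = MLPClassifier(max_iter=max_iter, random_state=42).fit(X, y)
    converged = not any(issubclass(w.category, ConvergenceWarning) for w in caught)
    return model.n_iter_, converged


results = {max_iter: fit_and_check(max_iter) for max_iter in (5, 500)}
for max_iter, (n_iter, converged) in results.items():
    print(f"max_iter={max_iter}: ran {n_iter} iterations, converged={converged}")
```

Recording the flag per fit makes it possible to report non-converged models alongside their scores rather than as interleaved stderr lines.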
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.01788473 0.01593089 0.01349783 0.01348853 0.01319408 0.01268983
0.01400018 0.01222539 0.01735711 0.01341629]
mean value: 0.014368486404418946
key: score_time
value: [0.01218581 0.00902629 0.00874782 0.00860453 0.00856733 0.00931859
0.00903726 0.0087738 0.01255965 0.00958991]
mean value: 0.009641098976135253
key: test_mcc
value: [0.9 0.80507649 1. 0.67082039 0.56980288 1.
0.67082039 0.33333333 0.89442719 0.79772404]
mean value: 0.7642004714248192
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94736842 0.89473684 1. 0.83333333 0.77777778 1.
0.83333333 0.66666667 0.94444444 0.88888889]
mean value: 0.8786549707602339
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94736842 0.90909091 1. 0.84210526 0.8 1.
0.84210526 0.66666667 0.94117647 0.875 ]
mean value: 0.8823512993714232
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.9 0.83333333 1. 0.8 0.72727273 1.
0.8 0.66666667 1. 1. ]
mean value: 0.8727272727272728
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 0.88888889 0.88888889 1.
0.88888889 0.66666667 0.88888889 0.77777778]
mean value: 0.9
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.95 0.88888889 1. 0.83333333 0.77777778 1.
0.83333333 0.66666667 0.94444444 0.88888889]
mean value: 0.8783333333333333
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.9 0.83333333 1. 0.72727273 0.66666667 1.
0.72727273 0.5 0.88888889 0.77777778]
mean value: 0.8021212121212121
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.78
Accuracy on Blind test: 0.89
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.09235191 0.09098029 0.09063077 0.09804845 0.09962702 0.10106015
0.09401655 0.09447575 0.09316206 0.09071183]
mean value: 0.09450647830963135
key: score_time
value: [0.0171411 0.01724839 0.01732278 0.01876092 0.01842284 0.01923037
0.01827502 0.0200417 0.01728535 0.01696825]
mean value: 0.01806967258453369
key: test_mcc
value: [0.47777778 0.39056329 0.67082039 0.67082039 0.3721042 0.70710678
0.4472136 0.11111111 0.89442719 0.77777778]
mean value: 0.5519722513396789
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.73684211 0.68421053 0.83333333 0.83333333 0.66666667 0.83333333
0.72222222 0.55555556 0.94444444 0.88888889]
mean value: 0.7698830409356725
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.73684211 0.75 0.82352941 0.84210526 0.72727273 0.8
0.70588235 0.55555556 0.94736842 0.88888889]
mean value: 0.7777444725896738
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.7 0.64285714 0.875 0.8 0.61538462 1.
0.75 0.55555556 0.9 0.88888889]
mean value: 0.7727686202686203
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.77777778 0.9 0.77777778 0.88888889 0.88888889 0.66666667
0.66666667 0.55555556 1. 0.88888889]
mean value: 0.8011111111111111
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.73888889 0.67222222 0.83333333 0.83333333 0.66666667 0.83333333
0.72222222 0.55555556 0.94444444 0.88888889]
mean value: 0.7688888888888888
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.58333333 0.6 0.7 0.72727273 0.57142857 0.66666667
0.54545455 0.38461538 0.9 0.8 ]
mean value: 0.6478771228771228
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.4
Accuracy on Blind test: 0.73
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.00902104 0.00883174 0.00882602 0.00874376 0.00882483 0.00876188
0.00883603 0.00873995 0.00897932 0.00881672]
mean value: 0.008838129043579102
key: score_time
value: [0.00852895 0.00839901 0.0083952 0.00833488 0.0083344 0.0084455
0.00834584 0.00843763 0.00840974 0.00853491]
mean value: 0.00841660499572754
key: test_mcc
value: [0.28752732 0.71611487 0.12403473 0.4472136 0.4472136 0.62017367
0.34188173 0.11396058 0.56980288 0.47140452]
mean value: 0.413932749789502
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.63157895 0.84210526 0.55555556 0.72222222 0.72222222 0.77777778
0.66666667 0.55555556 0.77777778 0.72222222]
mean value: 0.6973684210526315
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.66666667 0.86956522 0.42857143 0.73684211 0.73684211 0.71428571
0.625 0.6 0.75 0.66666667]
mean value: 0.6794439904108096
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.58333333 0.76923077 0.6 0.7 0.7 1.
0.71428571 0.54545455 0.85714286 0.83333333]
mean value: 0.7302780552780552
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.77777778 1. 0.33333333 0.77777778 0.77777778 0.55555556
0.55555556 0.66666667 0.66666667 0.55555556]
mean value: 0.6666666666666666
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.63888889 0.83333333 0.55555556 0.72222222 0.72222222 0.77777778
0.66666667 0.55555556 0.77777778 0.72222222]
mean value: 0.6972222222222222
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.5 0.76923077 0.27272727 0.58333333 0.58333333 0.55555556
0.45454545 0.42857143 0.6 0.5 ]
mean value: 0.5247297147297147
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.22
Accuracy on Blind test: 0.65
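The repeated `key` / `value` / `mean value` blocks have the same shape as the dict returned by scikit-learn's `cross_validate` with `return_train_score=True` and a dict of named scorers: `fit_time`, `score_time`, then a `test_`/`train_` pair for each scorer, in that order. A minimal sketch of how such a block could be produced, assuming (this is not confirmed by the log) that the script calls `cross_validate` with 10 folds. The scorer names mirror the log's keys; the data is synthetic.

```python
# Sketch only: reproduces the log's key/value/mean layout, not its numbers.
# Assumption: 10-fold cross_validate with a named-scorer dict; synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.tree import ExtraTreeClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Scorer names chosen to match the suffixes seen in the log (test_mcc, test_jcc, ...)
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": make_scorer(jaccard_score),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(ExtraTreeClassifier(random_state=42), X, y,
                        cv=cv, scoring=scoring, return_train_score=True)

# Print each metric array and its mean, in the same format as the log
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```

Only the key layout is meant to match; the scores themselves will differ from the logged pncA results.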
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.20718908 1.40801001 1.27958941 1.28461361 1.32606244 1.32907915
1.33875179 1.25243473 1.26152778 1.24775457]
mean value: 1.2935012578964233
key: score_time
value: [0.14160895 0.09528899 0.1088841 0.10485816 0.09723663 0.09495139
0.09413624 0.09549427 0.09452558 0.09518194]
mean value: 0.10221662521362304
key: test_mcc
value: [0.89893315 0.58655573 0.79772404 0.67082039 0.47140452 0.79772404
0.56980288 0.33333333 0.89442719 0.77777778]
mean value: 0.6798503044277107
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94736842 0.78947368 0.88888889 0.83333333 0.72222222 0.88888889
0.77777778 0.66666667 0.94444444 0.88888889]
mean value: 0.8347953216374269
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94117647 0.81818182 0.875 0.84210526 0.76190476 0.875
0.75 0.66666667 0.94736842 0.88888889]
mean value: 0.8366292290440898
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.75 1. 0.8 0.66666667 1.
0.85714286 0.66666667 0.9 0.88888889]
mean value: 0.8529365079365079
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.88888889 0.9 0.77777778 0.88888889 0.88888889 0.77777778
0.66666667 0.66666667 1. 0.88888889]
mean value: 0.8344444444444444
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.94444444 0.78333333 0.88888889 0.83333333 0.72222222 0.88888889
0.77777778 0.66666667 0.94444444 0.88888889]
mean value: 0.8338888888888889
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
key: test_jcc
value: [0.88888889 0.69230769 0.77777778 0.72727273 0.61538462 0.77777778
0.6 0.5 0.9 0.8 ]
mean value: 0.727940947940948
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.59
Accuracy on Blind test: 0.81
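The FutureWarning printed above is triggered by `max_features='auto'` in the 'Random Forest2' entry. Per the warning text, `'sqrt'` keeps the old behaviour for classifiers, and `'auto'` is removed entirely in scikit-learn 1.3. A minimal sketch of the same configuration without the deprecated value, assuming scikit-learn 1.1 or later and using synthetic data:

```python
# Sketch: "Random Forest2" settings with max_features='sqrt' instead of 'auto'.
# Assumption: scikit-learn >= 1.1; data is synthetic, not the pncA features.
import warnings

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=20, random_state=42)

rf2 = RandomForestClassifier(
    max_features="sqrt",  # replaces the deprecated 'auto' (same behaviour for classifiers)
    min_samples_leaf=5,
    n_estimators=1000,
    n_jobs=10,
    oob_score=True,
    random_state=42,
)

with warnings.catch_warnings():
    # Escalate FutureWarnings to errors so a leftover deprecation fails loudly
    warnings.simplefilter("error", FutureWarning)
    rf2.fit(X, y)

print("OOB score:", round(rf2.oob_score_, 3))
```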
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.86686087 0.89369035 0.86090946 0.92003894 0.87380433 0.89239073
0.92993355 0.87646794 0.89022374 0.97747231]
mean value: 0.8981792211532593
key: score_time
value: [0.15744424 0.19940758 0.16025209 0.1420064 0.14563394 0.15724635
0.2215755 0.11703801 0.16666937 0.12801671]
mean value: 0.1595290184020996
key: test_mcc
value: [0.78888889 0.68543653 0.70710678 0.67082039 0.3721042 0.79772404
0.89442719 0.2236068 0.67082039 0.77777778]
mean value: 0.6588712988925703
key: train_mcc
value: [0.96325856 0.96385008 0.93909422 0.95150257 0.93965346 0.92710507
0.92710507 0.96406004 0.93965346 0.95150257]
mean value: 0.9466785096847014
key: test_accuracy
value: [0.89473684 0.84210526 0.83333333 0.83333333 0.66666667 0.88888889
0.94444444 0.61111111 0.83333333 0.88888889]
mean value: 0.8236842105263158
key: train_accuracy
value: [0.98159509 0.98159509 0.9695122 0.97560976 0.9695122 0.96341463
0.96341463 0.98170732 0.9695122 0.97560976]
mean value: 0.9731482866975909
key: test_fscore
value: [0.88888889 0.85714286 0.8 0.84210526 0.72727273 0.875
0.94117647 0.58823529 0.84210526 0.88888889]
mean value: 0.8250815653215035
key: train_fscore
value: [0.98181818 0.98181818 0.96969697 0.97590361 0.97005988 0.96385542
0.96385542 0.98203593 0.97005988 0.97590361]
mean value: 0.9735007094245244
key: test_precision
value: [0.88888889 0.81818182 1. 0.8 0.61538462 1.
1. 0.625 0.8 0.88888889]
mean value: 0.8436344211344211
key: train_precision
value: [0.97590361 0.96428571 0.96385542 0.96428571 0.95294118 0.95238095
0.95238095 0.96470588 0.95294118 0.96428571]
mean value: 0.9607966319057744
key: test_recall
value: [0.88888889 0.9 0.66666667 0.88888889 0.88888889 0.77777778
0.88888889 0.55555556 0.88888889 0.88888889]
mean value: 0.8233333333333333
key: train_recall
value: [0.98780488 1. 0.97560976 0.98780488 0.98780488 0.97560976
0.97560976 1. 0.98780488 0.98780488]
mean value: 0.9865853658536585
key: test_roc_auc
value: [0.89444444 0.83888889 0.83333333 0.83333333 0.66666667 0.88888889
0.94444444 0.61111111 0.83333333 0.88888889]
mean value: 0.8233333333333333
key: train_roc_auc
value: [0.98155676 0.98170732 0.9695122 0.97560976 0.9695122 0.96341463
0.96341463 0.98170732 0.9695122 0.97560976]
mean value: 0.9731556760012045
key: test_jcc
value: [0.8 0.75 0.66666667 0.72727273 0.57142857 0.77777778
0.88888889 0.41666667 0.72727273 0.8 ]
mean value: 0.7125974025974026
key: train_jcc
value: [0.96428571 0.96428571 0.94117647 0.95294118 0.94186047 0.93023256
0.93023256 0.96470588 0.94186047 0.95294118]
mean value: 0.9484522180965409
MCC on Blind test: 0.65
Accuracy on Blind test: 0.84
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02150226 0.00884724 0.00999117 0.0089252 0.00994086 0.00981355
0.00895786 0.00987864 0.01000977 0.00989366]
mean value: 0.010776019096374512
key: score_time
value: [0.01239491 0.00845385 0.00954509 0.00856543 0.00893474 0.00922799
0.0086267 0.00930977 0.00923538 0.00921655]
mean value: 0.009351038932800293
key: test_mcc
value: [0.25844328 0.16854997 0.47140452 0.4472136 0.34188173 0.1490712
0.47140452 0.12403473 0.34188173 0.3721042 ]
mean value: 0.3145989478923389
key: train_mcc
value: [0.49487065 0.53197363 0.49507377 0.52223297 0.52223297 0.553295
0.525 0.55060372 0.525 0.49938477]
mean value: 0.5219667468393889
key: test_accuracy
value: [0.63157895 0.57894737 0.72222222 0.72222222 0.66666667 0.55555556
0.72222222 0.55555556 0.66666667 0.66666667]
mean value: 0.6488304093567251
key: train_accuracy
value: [0.74233129 0.75460123 0.74390244 0.75609756 0.75609756 0.76829268
0.75609756 0.76219512 0.75609756 0.73780488]
mean value: 0.7533517881191082
key: test_fscore
value: [0.58823529 0.55555556 0.66666667 0.70588235 0.625 0.33333333
0.66666667 0.42857143 0.625 0.57142857]
mean value: 0.5766339869281046
key: train_fscore
value: [0.71621622 0.71014493 0.72 0.72972973 0.72972973 0.73611111
0.7260274 0.71942446 0.7260274 0.69064748]
mean value: 0.720405845128961
key: test_precision
value: [0.625 0.625 0.83333333 0.75 0.71428571 0.66666667
0.83333333 0.6 0.71428571 0.8 ]
mean value: 0.7161904761904762
key: train_precision
value: [0.8030303 0.85964912 0.79411765 0.81818182 0.81818182 0.85483871
0.828125 0.87719298 0.828125 0.84210526]
mean value: 0.8323547664551235
key: test_recall
value: [0.55555556 0.5 0.55555556 0.66666667 0.55555556 0.22222222
0.55555556 0.33333333 0.55555556 0.44444444]
mean value: 0.49444444444444446
key: train_recall
value: [0.64634146 0.60493827 0.65853659 0.65853659 0.65853659 0.64634146
0.64634146 0.6097561 0.64634146 0.58536585]
mean value: 0.6361035832580548
key: test_roc_auc
value: [0.62777778 0.58333333 0.72222222 0.72222222 0.66666667 0.55555556
0.72222222 0.55555556 0.66666667 0.66666667]
mean value: 0.6488888888888888
key: train_roc_auc
value: [0.74292382 0.75368865 0.74390244 0.75609756 0.75609756 0.76829268
0.75609756 0.76219512 0.75609756 0.73780488]
mean value: 0.753319783197832
key: test_jcc
value: [0.41666667 0.38461538 0.5 0.54545455 0.45454545 0.2
0.5 0.27272727 0.45454545 0.4 ]
mean value: 0.4128554778554778
key: train_jcc
value: [0.55789474 0.5505618 0.5625 0.57446809 0.57446809 0.58241758
0.56989247 0.56179775 0.56989247 0.52747253]
mean value: 0.5631365513743338
MCC on Blind test: 0.2
Accuracy on Blind test: 0.59
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.09501481 0.23909044 0.12473583 0.04655695 0.06785107 0.07981181
0.39118195 0.06984305 0.06249595 0.29534435]
mean value: 0.1471926212310791
key: score_time
value: [0.01050901 0.01090145 0.01074934 0.01110911 0.01052189 0.01095223
0.01346135 0.01077437 0.01050735 0.01233649]
mean value: 0.011182260513305665
key: test_mcc
value: [0.89893315 0.89893315 1. 0.77777778 0.56980288 0.89442719
0.79772404 0.4472136 1. 1. ]
mean value: 0.8284811781695286
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94736842 0.94736842 1. 0.88888889 0.77777778 0.94444444
0.88888889 0.72222222 1. 1. ]
mean value: 0.9116959064327486
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94117647 0.95238095 1. 0.88888889 0.8 0.94117647
0.875 0.73684211 1. 1. ]
mean value: 0.9135464887709469
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.90909091 1. 0.88888889 0.72727273 1.
1. 0.7 1. 1. ]
mean value: 0.9225252525252525
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.88888889 1. 1. 0.88888889 0.88888889 0.88888889
0.77777778 0.77777778 1. 1. ]
mean value: 0.9111111111111111
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.94444444 0.94444444 1. 0.88888889 0.77777778 0.94444444
0.88888889 0.72222222 1. 1. ]
mean value: 0.9111111111111111
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.88888889 0.90909091 1. 0.8 0.66666667 0.88888889
0.77777778 0.58333333 1. 1. ]
mean value: 0.8514646464646465
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.77
Accuracy on Blind test: 0.89
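The "MCC on Blind test" and "Accuracy on Blind test" lines report one held-out evaluation, rounded to two decimals. A minimal sketch of that step, assuming an 80/20 split (suggested only by the log's filename) and a pipeline shaped like the printed one: `MinMaxScaler` on numeric columns, `OneHotEncoder` on categorical columns, `remainder='passthrough'`. The column names reuse a few from the log, but the data is synthetic, `RandomForestClassifier` stands in for whichever model is being logged, and `X_blind`/`y_blind` are hypothetical names.

```python
# Sketch: blind-test evaluation with a ColumnTransformer pipeline.
# Assumptions: 80/20 stratified split, synthetic data, hypothetical variable names.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, n),
    "rsa": rng.uniform(0, 1, n),
    "ss_class": rng.choice(["helix", "strand", "loop"], n),
    "active_site": rng.choice(["0", "1"], n),
})
# Synthetic label loosely tied to ligand_distance so the model has some signal
y = (df["ligand_distance"] + rng.normal(0, 5, n) < 20).astype(int)

num_cols = ["ligand_distance", "rsa"]
cat_cols = ["ss_class", "active_site"]

prep = ColumnTransformer(
    transformers=[("num", MinMaxScaler(), num_cols),
                  ("cat", OneHotEncoder(), cat_cols)],
    remainder="passthrough",
)
pipe = Pipeline(steps=[
    ("prep", prep),
    ("model", RandomForestClassifier(n_estimators=200, random_state=42)),
])

# Hold out 20% as the "blind" test set
X_train, X_blind, y_train, y_blind = train_test_split(
    df, y, test_size=0.2, stratify=y, random_state=42)

pipe.fit(X_train, y_train)
y_pred = pipe.predict(X_blind)

print("MCC on Blind test:", round(matthews_corrcoef(y_blind, y_pred), 2))
print("Accuracy on Blind test:", round(accuracy_score(y_blind, y_pred), 2))
```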
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.03162622 0.05078006 0.06450272 0.02454734 0.05303431 0.04205704
0.03058195 0.03836989 0.05589056 0.04631686]
mean value: 0.04377069473266602
key: score_time
value: [0.02164102 0.01792717 0.01255107 0.01289773 0.0353806 0.01196766
0.01205635 0.02032638 0.02122998 0.01304936]
mean value: 0.017902731895446777
key: test_mcc
value: [0.68888889 0.57777778 0.79772404 0.67082039 0.56980288 0.4472136
0.4472136 0.11396058 0.34188173 0.67082039]
mean value: 0.5326103867520664
key: train_mcc
value: [1. 0.98780488 0.98787834 0.97560976 0.98787834 1.
0.98787834 1. 0.98787834 0.98787834]
mean value: 0.9902806333682408
key: test_accuracy
value: [0.84210526 0.78947368 0.88888889 0.83333333 0.77777778 0.72222222
0.72222222 0.55555556 0.66666667 0.83333333]
mean value: 0.7631578947368421
key: train_accuracy
value: [1. 0.99386503 0.99390244 0.98780488 0.99390244 1.
0.99390244 1. 0.99390244 0.99390244]
mean value: 0.9951182103845578
key: test_fscore
value: [0.84210526 0.8 0.9 0.82352941 0.75 0.70588235
0.70588235 0.5 0.625 0.82352941]
mean value: 0.747592879256966
key: train_fscore
value: [1. 0.99386503 0.99393939 0.98780488 0.99386503 1.
0.99393939 1. 0.99393939 0.99393939]
mean value: 0.9951292515156049
key: test_precision
value: [0.8 0.8 0.81818182 0.875 0.85714286 0.75
0.75 0.57142857 0.71428571 0.875 ]
mean value: 0.7811038961038961
key: train_precision
value: [1. 0.98780488 0.98795181 0.98780488 1. 1.
0.98795181 1. 0.98795181 0.98795181]
mean value: 0.9927416985013223
key: test_recall
value: [0.88888889 0.8 1. 0.77777778 0.66666667 0.66666667
0.66666667 0.44444444 0.55555556 0.77777778]
mean value: 0.7244444444444444
key: train_recall
value: [1. 1. 1. 0.98780488 0.98780488 1.
1. 1. 1. 1. ]
mean value: 0.9975609756097561
key: test_roc_auc
value: [0.84444444 0.78888889 0.88888889 0.83333333 0.77777778 0.72222222
0.72222222 0.55555556 0.66666667 0.83333333]
mean value: 0.7633333333333333
key: train_roc_auc
value: [1. 0.99390244 0.99390244 0.98780488 0.99390244 1.
0.99390244 1. 0.99390244 0.99390244]
mean value: 0.9951219512195122
key: test_jcc
value: [0.72727273 0.66666667 0.81818182 0.7 0.6 0.54545455
0.54545455 0.33333333 0.45454545 0.7 ]
mean value: 0.6090909090909091
key: train_jcc
value: [1. 0.98780488 0.98795181 0.97590361 0.98780488 1.
0.98795181 1. 0.98795181 0.98795181]
mean value: 0.9903320599471055
MCC on Blind test: 0.31
Accuracy on Blind test: 0.65
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.02618909 0.00915051 0.00890017 0.00884986 0.00900197 0.00903964
0.00874496 0.00874972 0.00885129 0.00870752]
mean value: 0.010618472099304199
key: score_time
value: [0.01357293 0.00898504 0.00841975 0.00844479 0.00916314 0.00864339
0.00847054 0.00853252 0.0083828 0.00896931]
mean value: 0.00915842056274414
key: test_mcc
value: [0.05555556 0.36666667 0.67082039 0.2236068 0.3721042 0.47140452
0.56980288 0.2236068 0.2236068 0.4472136 ]
mean value: 0.36243882110789005
key: train_mcc
value: [0.47384761 0.49713703 0.45152179 0.45125307 0.44020439 0.51219512
0.41475753 0.47735225 0.50033496 0.50003718]
mean value: 0.47186409350584424
key: test_accuracy
value: [0.52631579 0.68421053 0.83333333 0.61111111 0.66666667 0.72222222
0.77777778 0.61111111 0.61111111 0.72222222]
mean value: 0.6766081871345029
key: train_accuracy
value: [0.73619632 0.74846626 0.72560976 0.72560976 0.7195122 0.75609756
0.70731707 0.73780488 0.75 0.75 ]
mean value: 0.7356613796199312
key: test_fscore
value: [0.52631579 0.7 0.84210526 0.63157895 0.72727273 0.66666667
0.75 0.63157895 0.63157895 0.70588235]
mean value: 0.6812979641617413
key: train_fscore
value: [0.74853801 0.74213836 0.73053892 0.72727273 0.72941176 0.75609756
0.71084337 0.74853801 0.75449102 0.75151515]
mean value: 0.7399384906254795
key: test_precision
value: [0.5 0.7 0.8 0.6 0.61538462 0.83333333
0.85714286 0.6 0.6 0.75 ]
mean value: 0.6855860805860806
key: train_precision
value: [0.71910112 0.75641026 0.71764706 0.72289157 0.70454545 0.75609756
0.70238095 0.71910112 0.74117647 0.74698795]
mean value: 0.7286339518987338
key: test_recall
value: [0.55555556 0.7 0.88888889 0.66666667 0.88888889 0.55555556
0.66666667 0.66666667 0.66666667 0.66666667]
mean value: 0.6922222222222222
key: train_recall
value: [0.7804878 0.72839506 0.74390244 0.73170732 0.75609756 0.75609756
0.7195122 0.7804878 0.76829268 0.75609756]
mean value: 0.7521077988557663
key: test_roc_auc
value: [0.52777778 0.68333333 0.83333333 0.61111111 0.66666667 0.72222222
0.77777778 0.61111111 0.61111111 0.72222222]
mean value: 0.6766666666666666
key: train_roc_auc
value: [0.73592291 0.74834387 0.72560976 0.72560976 0.7195122 0.75609756
0.70731707 0.73780488 0.75 0.75 ]
mean value: 0.7356218006624511
key: test_jcc
value: [0.35714286 0.53846154 0.72727273 0.46153846 0.57142857 0.5
0.6 0.46153846 0.46153846 0.54545455]
mean value: 0.5224375624375625
key: train_jcc
value: [0.59813084 0.59 0.5754717 0.57142857 0.57407407 0.60784314
0.55140187 0.59813084 0.60576923 0.60194175]
mean value: 0.5874192010614671
MCC on Blind test: 0.31
Accuracy on Blind test: 0.68
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01138663 0.01504159 0.01759362 0.01546073 0.0162313 0.01648188
0.0172379 0.0149343 0.01540208 0.01473355]
mean value: 0.015450358390808105
key: score_time
value: [0.00881577 0.01157713 0.01190829 0.01156306 0.0117166 0.01163983
0.01195574 0.01151013 0.0118072 0.01173139]
mean value: 0.011422514915466309
key: test_mcc
value: [0.68888889 0.68543653 0.89442719 0.53452248 0.62017367 0.67082039
0.56980288 0.2236068 0.26726124 0.70710678]
mean value: 0.5862046859894403
key: train_mcc
value: [0.86816623 0.93871406 0.91798509 0.72987004 0.89565496 0.75955453
0.95235327 0.92793395 0.65275337 0.73970927]
mean value: 0.8382694768223476
key: test_accuracy
value: [0.84210526 0.84210526 0.94444444 0.72222222 0.77777778 0.83333333
0.77777778 0.61111111 0.61111111 0.83333333]
mean value: 0.7795321637426901
key: train_accuracy
value: [0.93251534 0.96932515 0.95731707 0.84756098 0.94512195 0.86585366
0.97560976 0.96341463 0.79878049 0.85365854]
mean value: 0.9109157563968278
key: test_fscore
value: [0.84210526 0.85714286 0.94736842 0.7826087 0.81818182 0.84210526
0.75 0.63157895 0.46153846 0.85714286]
mean value: 0.778977258439501
key: train_fscore
value: [0.93567251 0.9689441 0.95906433 0.86772487 0.94797688 0.88172043
0.975 0.96428571 0.7480916 0.87234043]
mean value: 0.912082086080032
key: test_precision
value: [0.8 0.81818182 0.9 0.64285714 0.69230769 0.8
0.85714286 0.6 0.75 0.75 ]
mean value: 0.7610489510489511
key: train_precision
value: [0.8988764 0.975 0.92134831 0.76635514 0.9010989 0.78846154
1. 0.94186047 1. 0.77358491]
mean value: 0.8966585669625136
key: test_recall
value: [0.88888889 0.9 1. 1. 1. 0.88888889
0.66666667 0.66666667 0.33333333 1. ]
mean value: 0.8344444444444444
key: train_recall
value: [0.97560976 0.96296296 1. 1. 1. 1.
0.95121951 0.98780488 0.59756098 1. ]
mean value: 0.9475158084914183
key: test_roc_auc
value: [0.84444444 0.83888889 0.94444444 0.72222222 0.77777778 0.83333333
0.77777778 0.61111111 0.61111111 0.83333333]
mean value: 0.7794444444444444
key: train_roc_auc
value: [0.93224932 0.96928636 0.95731707 0.84756098 0.94512195 0.86585366
0.97560976 0.96341463 0.79878049 0.85365854]
mean value: 0.9108852755194219
key: test_jcc
value: [0.72727273 0.75 0.9 0.64285714 0.69230769 0.72727273
0.6 0.46153846 0.3 0.75 ]
mean value: 0.6551248751248752
key: train_jcc
value: [0.87912088 0.93975904 0.92134831 0.76635514 0.9010989 0.78846154
0.95121951 0.93103448 0.59756098 0.77358491]
mean value: 0.844954368584343
MCC on Blind test: 0.44
Accuracy on Blind test: 0.73
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01441765 0.01369786 0.01448107 0.01370692 0.01383114 0.01426673
0.01549292 0.01588058 0.01495886 0.01427841]
mean value: 0.014501214027404785
key: score_time
value: [0.01004958 0.01142454 0.01141286 0.01143146 0.01136494 0.01143551
0.01144528 0.01216745 0.01214123 0.01187587]
mean value: 0.011474871635437011
key: test_mcc
value: [0.78888889 0.50604808 0.67082039 0.62017367 0.79772404 0.53452248
0.56980288 0.23570226 0.24253563 0.70710678]
mean value: 0.5673325099894828
key: train_mcc
value: [0.87043375 0.67895422 0.77964295 0.75955453 0.70891756 0.44393726
1. 0.95150257 0.35112344 0.91798509]
mean value: 0.7462051380346357
key: test_accuracy
value: [0.89473684 0.73684211 0.83333333 0.77777778 0.88888889 0.72222222
0.77777778 0.61111111 0.55555556 0.83333333]
mean value: 0.7631578947368421
key: train_accuracy
value: [0.93251534 0.81595092 0.87804878 0.86585366 0.84146341 0.66463415
1. 0.97560976 0.6097561 0.95731707]
mean value: 0.854114918449798
key: test_fscore
value: [0.88888889 0.70588235 0.84210526 0.81818182 0.875 0.7826087
0.75 0.53333333 0.69230769 0.8 ]
mean value: 0.7688308044462978
key: train_fscore
value: [0.92903226 0.77272727 0.89130435 0.88172043 0.81690141 0.74885845
1. 0.97530864 0.71929825 0.95541401]
mean value: 0.8690565064992889
key: test_precision
value: [0.88888889 0.85714286 0.8 0.69230769 1. 0.64285714
0.85714286 0.66666667 0.52941176 1. ]
mean value: 0.7934417869711987
key: train_precision
value: [0.98630137 1. 0.80392157 0.78846154 0.96666667 0.59854015
1. 0.9875 0.56164384 1. ]
mean value: 0.869303512522051
key: test_recall
value: [0.88888889 0.6 0.88888889 1. 0.77777778 1.
0.66666667 0.44444444 1. 0.66666667]
mean value: 0.7933333333333333
key: train_recall
value: [0.87804878 0.62962963 1. 1. 0.70731707 1.
1. 0.96341463 1. 0.91463415]
mean value: 0.9093044263775971
key: test_roc_auc
value: [0.89444444 0.74444444 0.83333333 0.77777778 0.88888889 0.72222222
0.77777778 0.61111111 0.55555556 0.83333333]
mean value: 0.7638888888888888
key: train_roc_auc
value: [0.93285155 0.81481481 0.87804878 0.86585366 0.84146341 0.66463415
1. 0.97560976 0.6097561 0.95731707]
mean value: 0.8540349292381813
key: test_jcc
value: [0.8 0.54545455 0.72727273 0.69230769 0.77777778 0.64285714
0.6 0.36363636 0.52941176 0.66666667]
mean value: 0.6345384680678798
key: train_jcc
value: [0.86746988 0.62962963 0.80392157 0.78846154 0.69047619 0.59854015
1. 0.95180723 0.56164384 0.91463415]
mean value: 0.7806584163571848
MCC on Blind test: 0.56
Accuracy on Blind test: 0.78
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.12057924 0.11223555 0.11333489 0.11369205 0.11219025 0.11550355
0.11615276 0.11330104 0.1124258 0.11430383]
mean value: 0.1143718957901001
key: score_time
value: [0.01463461 0.01489043 0.01503968 0.01504302 0.01546264 0.01508927
0.01510382 0.01506543 0.01499128 0.01474929]
mean value: 0.015006947517395019
key: test_mcc
value: [1. 0.80507649 1. 0.67082039 0.56980288 0.79772404
0.89442719 0.33333333 0.77777778 0.70710678]
mean value: 0.755606887996258
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.89473684 1. 0.83333333 0.77777778 0.88888889
0.94444444 0.66666667 0.88888889 0.83333333]
mean value: 0.8728070175438596
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.90909091 1. 0.84210526 0.8 0.875
0.94117647 0.66666667 0.88888889 0.8 ]
mean value: 0.8722928198392594
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.83333333 1. 0.8 0.72727273 1.
1. 0.66666667 0.88888889 1. ]
mean value: 0.8916161616161616
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 0.88888889 0.88888889 0.77777778
0.88888889 0.66666667 0.88888889 0.66666667]
mean value: 0.8666666666666667
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.88888889 1. 0.83333333 0.77777778 0.88888889
0.94444444 0.66666667 0.88888889 0.83333333]
mean value: 0.8722222222222222
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.83333333 1. 0.72727273 0.66666667 0.77777778
0.88888889 0.5 0.8 0.66666667]
mean value: 0.786060606060606
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.66
Accuracy on Blind test: 0.84
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.0368824 0.03443956 0.04048729 0.0401268 0.05609751 0.04586601
0.0384922 0.03691435 0.03752112 0.04061699]
mean value: 0.04074442386627197
key: score_time
value: [0.0196774 0.02022862 0.03363061 0.03083062 0.02206516 0.02526069
0.0358851 0.02022338 0.02485967 0.02349877]
mean value: 0.025616002082824708
key: test_mcc
value: [0.80507649 1. 0.89442719 0.79772404 0.70710678 0.77777778
0.67082039 0.2236068 0.79772404 1. ]
mean value: 0.7674263497298501
key: train_mcc
value: [1. 1. 0.97590007 1. 0.98787834 1.
0.96406004 1. 0.98787834 0.98787834]
mean value: 0.990359513473492
key: test_accuracy
value: [0.89473684 1. 0.94444444 0.88888889 0.83333333 0.88888889
0.83333333 0.61111111 0.88888889 1. ]
mean value: 0.8783625730994152
key: train_accuracy
value: [1. 1. 0.98780488 1. 0.99390244 1.
0.98170732 1. 0.99390244 0.99390244]
mean value: 0.9951219512195122
key: test_fscore
value: [0.875 1. 0.94117647 0.875 0.85714286 0.88888889
0.82352941 0.58823529 0.875 1. ]
mean value: 0.8723972922502334
key: train_fscore
value: [1. 1. 0.98765432 1. 0.99386503 1.
0.98136646 1. 0.99393939 0.99393939]
mean value: 0.9950764599168618
key: test_precision
value: [1. 1. 1. 1. 0.75 0.88888889
0.875 0.625 1. 1. ]
mean value: 0.9138888888888889
key: train_precision
value: [1. 1. 1. 1. 1. 1.
1. 1. 0.98795181 0.98795181]
mean value: 0.9975903614457832
key: test_recall
value: [0.77777778 1. 0.88888889 0.77777778 1. 0.88888889
0.77777778 0.55555556 0.77777778 1. ]
mean value: 0.8444444444444444
key: train_recall
value: [1. 1. 0.97560976 1. 0.98780488 1.
0.96341463 1. 1. 1. ]
mean value: 0.9926829268292683
key: test_roc_auc
value: [0.88888889 1. 0.94444444 0.88888889 0.83333333 0.88888889
0.83333333 0.61111111 0.88888889 1. ]
mean value: 0.8777777777777778
key: train_roc_auc
value: [1. 1. 0.98780488 1. 0.99390244 1.
0.98170732 1. 0.99390244 0.99390244]
mean value: 0.9951219512195122
key: test_jcc
value: [0.77777778 1. 0.88888889 0.77777778 0.75 0.8
0.7 0.41666667 0.77777778 1. ]
mean value: 0.7888888888888889
key: train_jcc
value: [1. 1. 0.97560976 1. 0.98780488 1.
0.96341463 1. 0.98795181 0.98795181]
mean value: 0.9902732882750515
MCC on Blind test: 0.67
Accuracy on Blind test: 0.84
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.02155781 0.02346563 0.02286315 0.04235959 0.02257943 0.02250385
0.02461076 0.05078363 0.05083275 0.02259588]
mean value: 0.03041524887084961
key: score_time
value: [0.01257849 0.01255059 0.01254272 0.02155042 0.01245618 0.01247454
0.01241016 0.02084184 0.02274609 0.01258111]
mean value: 0.015273213386535645
key: test_mcc
value: [0.26666667 0.16854997 0.70710678 0.55555556 0.67082039 0.4472136
0.34188173 0. 0.77777778 0.56980288]
mean value: 0.4505375347229357
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.63157895 0.57894737 0.83333333 0.77777778 0.83333333 0.66666667
0.66666667 0.5 0.88888889 0.77777778]
mean value: 0.7154970760233919
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.63157895 0.55555556 0.8 0.77777778 0.84210526 0.5
0.625 0.4 0.88888889 0.75 ]
mean value: 0.6770906432748538
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.6 0.625 1. 0.77777778 0.8 1.
0.71428571 0.5 0.88888889 0.85714286]
mean value: 0.7763095238095238
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 0.5 0.66666667 0.77777778 0.88888889 0.33333333
0.55555556 0.33333333 0.88888889 0.66666667]
mean value: 0.6277777777777778
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.63333333 0.58333333 0.83333333 0.77777778 0.83333333 0.66666667
0.66666667 0.5 0.88888889 0.77777778]
mean value: 0.7161111111111111
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.46153846 0.38461538 0.66666667 0.63636364 0.72727273 0.33333333
0.45454545 0.25 0.8 0.6 ]
mean value: 0.5314335664335664
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.18
Accuracy on Blind test: 0.59
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.37662029 0.35531259 0.3597064 0.36402702 0.3490901 0.35136151
0.35017991 0.34702373 0.34805918 0.34904552]
mean value: 0.35504262447357177
key: score_time
value: [0.00975657 0.00898051 0.00918102 0.0090487 0.00944591 0.00902224
0.00896931 0.00889969 0.00917578 0.00898147]
mean value: 0.0091461181640625
key: test_mcc
value: [0.89893315 0.71611487 0.79772404 0.77777778 0.79772404 0.89442719
0.77777778 0.4472136 0.89442719 1. ]
mean value: 0.8002119627480698
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94736842 0.84210526 0.88888889 0.88888889 0.88888889 0.94444444
0.88888889 0.72222222 0.94444444 1. ]
mean value: 0.8956140350877193
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94117647 0.86956522 0.875 0.88888889 0.9 0.94117647
0.88888889 0.73684211 0.94736842 1. ]
mean value: 0.8988906462661342
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.76923077 1. 0.88888889 0.81818182 1.
0.88888889 0.7 0.9 1. ]
mean value: 0.8965190365190365
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.88888889 1. 0.77777778 0.88888889 1. 0.88888889
0.88888889 0.77777778 1. 1. ]
mean value: 0.9111111111111111
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.94444444 0.83333333 0.88888889 0.88888889 0.88888889 0.94444444
0.88888889 0.72222222 0.94444444 1. ]
mean value: 0.8944444444444444
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.88888889 0.76923077 0.77777778 0.8 0.81818182 0.88888889
0.8 0.58333333 0.9 1. ]
mean value: 0.8226301476301476
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.77
Accuracy on Blind test: 0.89
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.01750708 0.01992321 0.01931262 0.0193913 0.01945806 0.0195179
0.03629398 0.0204246 0.02689838 0.03091669]
mean value: 0.02296438217163086
key: score_time
value: [0.01183605 0.01174116 0.01174712 0.01317406 0.0134604 0.01328158
0.01193166 0.01465511 0.01805115 0.01541853]
mean value: 0.013529682159423828
key: test_mcc
value: [0.62994079 0.41773368 0.33333333 0.4472136 0.79772404 0.56980288
0.33333333 0.47140452 0.47140452 0.34188173]
mean value: 0.48137724155149714
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.78947368 0.68421053 0.66666667 0.72222222 0.88888889 0.77777778
0.66666667 0.72222222 0.72222222 0.66666667]
mean value: 0.7307017543859649
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.71428571 0.625 0.66666667 0.70588235 0.9 0.75
0.66666667 0.66666667 0.66666667 0.625 ]
mean value: 0.6986834733893558
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.83333333 0.66666667 0.75 0.81818182 0.85714286
0.66666667 0.83333333 0.83333333 0.71428571]
mean value: 0.7972943722943723
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.55555556 0.5 0.66666667 0.66666667 1. 0.66666667
0.66666667 0.55555556 0.55555556 0.55555556]
mean value: 0.6388888888888888
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.77777778 0.69444444 0.66666667 0.72222222 0.88888889 0.77777778
0.66666667 0.72222222 0.72222222 0.66666667]
mean value: 0.7305555555555555
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.55555556 0.45454545 0.5 0.54545455 0.81818182 0.6
0.5 0.5 0.5 0.45454545]
mean value: 0.5428282828282829
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.0
Accuracy on Blind test: 0.62
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02956533 0.03639174 0.03307033 0.03308129 0.03312731 0.03301644
0.03308487 0.03315282 0.03304362 0.0330658 ]
mean value: 0.03305995464324951
key: score_time
value: [0.02278686 0.01986313 0.02063823 0.02110219 0.02177215 0.02229452
0.02278328 0.02294707 0.01153684 0.02025557]
mean value: 0.02059798240661621
key: test_mcc
value: [0.78888889 0.68543653 0.89442719 0.67082039 0.77777778 0.56980288
0.67082039 0.2236068 0.67082039 0.89442719]
mean value: 0.6846828435302107
key: train_mcc
value: [0.9509184 0.96326408 0.92682927 0.95121951 0.92682927 0.95121951
0.95121951 0.96348628 0.92682927 0.96348628]
mean value: 0.9475301380884953
key: test_accuracy
value: [0.89473684 0.84210526 0.94444444 0.83333333 0.88888889 0.77777778
0.83333333 0.61111111 0.83333333 0.94444444]
mean value: 0.8403508771929824
key: train_accuracy
value: [0.97546012 0.98159509 0.96341463 0.97560976 0.96341463 0.97560976
0.97560976 0.98170732 0.96341463 0.98170732]
mean value: 0.9737543019601975
key: test_fscore
value: [0.88888889 0.85714286 0.94117647 0.84210526 0.88888889 0.75
0.82352941 0.63157895 0.82352941 0.94736842]
mean value: 0.8394208560617229
key: train_fscore
value: [0.97560976 0.98159509 0.96341463 0.97560976 0.96341463 0.97560976
0.97560976 0.98181818 0.96341463 0.98159509]
mean value: 0.973769129269653
key: test_precision
value: [0.88888889 0.81818182 1. 0.8 0.88888889 0.85714286
0.875 0.6 0.875 0.9 ]
mean value: 0.8503102453102453
key: train_precision
value: [0.97560976 0.97560976 0.96341463 0.97560976 0.96341463 0.97560976
0.97560976 0.97590361 0.96341463 0.98765432]
mean value: 0.9731850618372315
key: test_recall
value: [0.88888889 0.9 0.88888889 0.88888889 0.88888889 0.66666667
0.77777778 0.66666667 0.77777778 1. ]
mean value: 0.8344444444444444
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_8020.py:128: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_8020.py:131: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: train_recall
value: [0.97560976 0.98765432 0.96341463 0.97560976 0.96341463 0.97560976
0.97560976 0.98780488 0.96341463 0.97560976]
mean value: 0.9743751881963264
key: test_roc_auc
value: [0.89444444 0.83888889 0.94444444 0.83333333 0.88888889 0.77777778
0.83333333 0.61111111 0.83333333 0.94444444]
mean value: 0.84
key: train_roc_auc
value: [0.9754592 0.98163204 0.96341463 0.97560976 0.96341463 0.97560976
0.97560976 0.98170732 0.96341463 0.98170732]
mean value: 0.9737579042457091
key: test_jcc
value: [0.8 0.75 0.88888889 0.72727273 0.8 0.6
0.7 0.46153846 0.7 0.9 ]
mean value: 0.7327700077700078
key: train_jcc
value: [0.95238095 0.96385542 0.92941176 0.95238095 0.92941176 0.95238095
0.95238095 0.96428571 0.92941176 0.96385542]
mean value: 0.9489755661300665
MCC on Blind test: 0.71
Accuracy on Blind test: 0.86
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.24983573 0.25471854 0.21633697 0.21292329 0.22023106 0.21423078
0.21444273 0.33044195 0.28945279 0.34522271]
mean value: 0.2547836542129517
key: score_time
value: [0.03669548 0.01727962 0.02205777 0.02206111 0.02186513 0.02367711
0.02402163 0.02303267 0.02362227 0.0215323 ]
mean value: 0.023584508895874025
key: test_mcc
value: [0.78888889 0.68543653 0.89442719 0.67082039 0.77777778 0.56980288
0.67082039 0.2236068 0.67082039 0.55555556]
mean value: 0.6507956799857747
key: train_mcc
value: [0.9509184 0.96326408 0.92682927 0.95121951 0.92682927 0.95121951
0.95121951 0.96348628 0.92682927 0.97560976]
mean value: 0.9487424854850787
key: test_accuracy
value: [0.89473684 0.84210526 0.94444444 0.83333333 0.88888889 0.77777778
0.83333333 0.61111111 0.83333333 0.77777778]
mean value: 0.8236842105263158
key: train_accuracy
value: [0.97546012 0.98159509 0.96341463 0.97560976 0.96341463 0.97560976
0.97560976 0.98170732 0.96341463 0.98780488]
mean value: 0.9743640580577585
key: test_fscore
value: [0.88888889 0.85714286 0.94117647 0.84210526 0.88888889 0.75
0.82352941 0.63157895 0.82352941 0.77777778]
mean value: 0.8224617917342376
key: train_fscore
value: [0.97560976 0.98159509 0.96341463 0.97560976 0.96341463 0.97560976
0.97560976 0.98181818 0.96341463 0.98780488]
mean value: 0.974390107872077
key: test_precision
value: [0.88888889 0.81818182 1. 0.8 0.88888889 0.85714286
0.875 0.6 0.875 0.77777778]
mean value: 0.8380880230880231
key: train_precision
value: [0.97560976 0.97560976 0.96341463 0.97560976 0.96341463 0.97560976
0.97560976 0.97590361 0.96341463 0.98780488]
mean value: 0.9732001175433441
key: test_recall
value: [0.88888889 0.9 0.88888889 0.88888889 0.88888889 0.66666667
0.77777778 0.66666667 0.77777778 0.77777778]
mean value: 0.8122222222222222
key: train_recall
value: [0.97560976 0.98765432 0.96341463 0.97560976 0.96341463 0.97560976
0.97560976 0.98780488 0.96341463 0.98780488]
mean value: 0.9755947003914484
key: test_roc_auc
value: [0.89444444 0.83888889 0.94444444 0.83333333 0.88888889 0.77777778
0.83333333 0.61111111 0.83333333 0.77777778]
mean value: 0.8233333333333333
key: train_roc_auc
value: [0.9754592 0.98163204 0.96341463 0.97560976 0.96341463 0.97560976
0.97560976 0.98170732 0.96341463 0.98780488]
mean value: 0.97436766034327
key: test_jcc
value: [0.8 0.75 0.88888889 0.72727273 0.8 0.6
0.7 0.46153846 0.7 0.63636364]
mean value: 0.7064063714063714
key: train_jcc
value: [0.95238095 0.96385542 0.92941176 0.95238095 0.92941176 0.95238095
0.95238095 0.96428571 0.92941176 0.97590361]
mean value: 0.9501803854071749
MCC on Blind test: 0.71
Accuracy on Blind test: 0.86
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.02767658 0.05742502 0.03703356 0.02705717 0.07875538 0.02220988
0.03057122 0.06068015 0.07214427 0.02592254]
mean value: 0.043947577476501465
key: score_time
value: [0.01175785 0.02437687 0.01173878 0.01178861 0.01374912 0.01184273
0.01169515 0.01199579 0.01190901 0.01173973]
mean value: 0.013259363174438477
key: test_mcc
value: [0.68888889 0.48934516 0.70710678 0.67082039 0.70710678 0.62017367
0.4472136 0.47140452 0.67082039 0.79772404]
mean value: 0.6270604226143301
key: train_mcc
value: [0.8039452 0.84056007 0.84202713 0.86643371 0.80487805 0.85391256
0.85467601 0.86643371 0.83025669 0.85391256]
mean value: 0.8417035691773689
key: test_accuracy
value: [0.84210526 0.73684211 0.83333333 0.83333333 0.83333333 0.77777778
0.72222222 0.72222222 0.83333333 0.88888889]
mean value: 0.8023391812865497
key: train_accuracy
value: [0.90184049 0.9202454 0.92073171 0.93292683 0.90243902 0.92682927
0.92682927 0.93292683 0.91463415 0.92682927]
mean value: 0.9206232231033967
key: test_fscore
value: [0.84210526 0.7826087 0.8 0.82352941 0.85714286 0.71428571
0.70588235 0.66666667 0.82352941 0.875 ]
mean value: 0.7890750373375895
key: train_fscore
value: [0.90123457 0.9202454 0.91925466 0.93167702 0.90243902 0.92592593
0.925 0.93167702 0.9125 0.92592593]
mean value: 0.9195879538568511
key: test_precision
value: [0.8 0.69230769 1. 0.875 0.75 1.
0.75 0.83333333 0.875 1. ]
mean value: 0.8575641025641025
key: train_precision
value: [0.9125 0.91463415 0.93670886 0.94936709 0.90243902 0.9375
0.94871795 0.94936709 0.93589744 0.9375 ]
mean value: 0.9324631593321775
key: test_recall
value: [0.88888889 0.9 0.66666667 0.77777778 1. 0.55555556
0.66666667 0.55555556 0.77777778 0.77777778]
mean value: 0.7566666666666667
key: train_recall
value: [0.8902439 0.92592593 0.90243902 0.91463415 0.90243902 0.91463415
0.90243902 0.91463415 0.8902439 0.91463415]
mean value: 0.907226738934056
key: test_roc_auc
value: [0.84444444 0.72777778 0.83333333 0.83333333 0.83333333 0.77777778
0.72222222 0.72222222 0.83333333 0.88888889]
mean value: 0.8016666666666666
key: train_roc_auc
value: [0.90191207 0.92028004 0.92073171 0.93292683 0.90243902 0.92682927
0.92682927 0.93292683 0.91463415 0.92682927]
mean value: 0.9206338452273412
key: test_jcc
value: [0.72727273 0.64285714 0.66666667 0.7 0.75 0.55555556
0.54545455 0.5 0.7 0.77777778]
mean value: 0.6565584415584416
key: train_jcc
value: [0.82022472 0.85227273 0.85057471 0.87209302 0.82222222 0.86206897
0.86046512 0.87209302 0.83908046 0.86206897]
mean value: 0.8513163934835046
MCC on Blind test: 0.4
Accuracy on Blind test: 0.73
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
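The repeated lbfgs warning above suggests two remedies: raise `max_iter` or scale the inputs. The sketch below is a minimal illustration of both together, not the script's own code. It assumes synthetic data from `make_classification`, uses `StandardScaler` (the pipeline logged below uses `MinMaxScaler` instead), and picks `max_iter=1000` arbitrarily. It then counts any `ConvergenceWarning` raised during the fit.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: not the pnca dataset from this log)
X, y = make_classification(n_samples=300, n_features=20, random_state=42)

def fit_and_count_convergence_warnings(model):
    """Fit the model and return how many ConvergenceWarnings were emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        model.fit(X, y)
    return sum(issubclass(w.category, ConvergenceWarning) for w in caught)

# Scale the features and give lbfgs a larger iteration budget (1000 is an arbitrary choice)
scaled = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000, random_state=42))
n_warnings = fit_and_count_convergence_warnings(scaled)

print("ConvergenceWarnings:", n_warnings)
print("lbfgs iterations used:", scaled[-1].n_iter_[0])
```

If the iteration count stays well below `max_iter`, the solver stopped on its tolerance rather than on the iteration limit, which is the condition the warning reports.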
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.96496487 0.98876548 0.86448479 1.19553041 0.70477009 0.910182
0.83962703 0.85949993 0.71075249 1.0370965 ]
mean value: 0.9075673580169678
key: score_time
value: [0.01353359 0.01326942 0.01348686 0.01345611 0.01320839 0.01332855
0.01315117 0.01320934 0.01310396 0.01312208]
mean value: 0.013286948204040527
key: test_mcc
value: [0.78888889 0.78888889 0.70710678 0.89442719 1. 0.70710678
0.4472136 0.62017367 0.56980288 0.79772404]
mean value: 0.7321332717112444
key: train_mcc
value: [1. 1. 1. 1. 1. 1.
0.92710507 1. 1. 1. ]
mean value: 0.9927105069301106
key: test_accuracy
value: [0.89473684 0.89473684 0.83333333 0.94444444 1. 0.83333333
0.72222222 0.77777778 0.77777778 0.88888889]
mean value: 0.8567251461988304
key: train_accuracy
value: [1. 1. 1. 1. 1. 1.
0.96341463 1. 1. 1. ]
mean value: 0.9963414634146341
key: test_fscore
value: [0.88888889 0.9 0.8 0.94117647 1. 0.8
0.70588235 0.71428571 0.75 0.875 ]
mean value: 0.8375233426704015
key: train_fscore
value: [1. 1. 1. 1. 1. 1.
0.96296296 1. 1. 1. ]
mean value: 0.9962962962962962
key: test_precision
value: [0.88888889 0.9 1. 1. 1. 1.
0.75 1. 0.85714286 1. ]
mean value: 0.9396031746031746
key: train_precision
value: [1. 1. 1. 1. 1. 1. 0.975 1. 1. 1. ]
mean value: 0.9975
key: test_recall
value: [0.88888889 0.9 0.66666667 0.88888889 1. 0.66666667
0.66666667 0.55555556 0.66666667 0.77777778]
mean value: 0.7677777777777778
key: train_recall
value: [1. 1. 1. 1. 1. 1.
0.95121951 1. 1. 1. ]
mean value: 0.9951219512195122
key: test_roc_auc
value: [0.89444444 0.89444444 0.83333333 0.94444444 1. 0.83333333
0.72222222 0.77777778 0.77777778 0.88888889]
mean value: 0.8566666666666666
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1.
0.96341463 1. 1. 1. ]
mean value: 0.9963414634146341
key: test_jcc
value: [0.8 0.81818182 0.66666667 0.88888889 1. 0.66666667
0.54545455 0.55555556 0.6 0.77777778]
mean value: 0.7319191919191919
key: train_jcc
value: [1. 1. 1. 1. 1. 1.
0.92857143 1. 1. 1. ]
mean value: 0.9928571428571429
MCC on Blind test: 0.59
Accuracy on Blind test: 0.81
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01298213 0.0095911 0.00900984 0.00858641 0.00865221 0.0084579
0.00859547 0.00881672 0.00866461 0.00866199]
mean value: 0.009201836585998536
key: score_time
value: [0.01462412 0.00896907 0.0085001 0.00847125 0.00839853 0.00842166
0.00838089 0.00849271 0.00842547 0.0084095 ]
mean value: 0.00910933017730713
key: test_mcc
value: [ 0.19096397 -0.2236068 0.26726124 0.53452248 0.26726124 0.4472136
0.23570226 -0.12403473 0.23570226 0.35355339]
mean value: 0.21845389086052888
key: train_mcc
value: [0.37955068 0.49121874 0.35651205 0.44106783 0.46159309 0.4083697
0.45222959 0.44501237 0.3962947 0.43158776]
mean value: 0.4263436491486879
key: test_accuracy
value: [0.57894737 0.47368421 0.61111111 0.72222222 0.61111111 0.66666667
0.61111111 0.44444444 0.61111111 0.61111111]
mean value: 0.5941520467836258
key: train_accuracy
value: [0.66871166 0.69325153 0.63414634 0.67682927 0.68292683 0.67682927
0.68902439 0.70121951 0.67682927 0.67682927]
mean value: 0.6776597336525513
key: test_fscore
value: [0.63636364 0.64285714 0.69565217 0.7826087 0.69565217 0.75
0.66666667 0.54545455 0.66666667 0.72 ]
mean value: 0.6801921701486919
key: train_fscore
value: [0.73267327 0.76415094 0.72477064 0.75117371 0.75700935 0.74146341
0.75598086 0.75376884 0.73631841 0.74881517]
mean value: 0.7466124601575621
key: test_precision
value: [0.53846154 0.5 0.57142857 0.64285714 0.57142857 0.6
0.58333333 0.46153846 0.58333333 0.5625 ]
mean value: 0.5614880952380953
key: train_precision
value: [0.61666667 0.61832061 0.58088235 0.61068702 0.61363636 0.61788618
0.62204724 0.64102564 0.62184874 0.6124031 ]
mean value: 0.6155403921084903
key: test_recall
value: [0.77777778 0.9 0.88888889 1. 0.88888889 1.
0.77777778 0.66666667 0.77777778 1. ]
mean value: 0.8677777777777778
key: train_recall
value: [0.90243902 1. 0.96341463 0.97560976 0.98780488 0.92682927
0.96341463 0.91463415 0.90243902 0.96341463]
mean value: 0.95
key: test_roc_auc
value: [0.58888889 0.45 0.61111111 0.72222222 0.61111111 0.66666667
0.61111111 0.44444444 0.61111111 0.61111111]
mean value: 0.5927777777777778
key: train_roc_auc
value: [0.66726889 0.69512195 0.63414634 0.67682927 0.68292683 0.67682927
0.68902439 0.70121951 0.67682927 0.67682927]
mean value: 0.6777024992472147
key: test_jcc
value: [0.46666667 0.47368421 0.53333333 0.64285714 0.53333333 0.6
0.5 0.375 0.5 0.5625 ]
mean value: 0.5187374686716792
key: train_jcc
value: [0.578125 0.61832061 0.56834532 0.60150376 0.60902256 0.58914729
0.60769231 0.60483871 0.58267717 0.59848485]
mean value: 0.5958157568248116
MCC on Blind test: 0.27
Accuracy on Blind test: 0.68
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.00887132 0.0087862 0.00888085 0.00876236 0.00883341 0.00878835
0.00874257 0.00890851 0.00874734 0.00910783]
mean value: 0.008842873573303222
key: score_time
value: [0.00842404 0.00844836 0.00844502 0.00852418 0.00842857 0.00843644
0.00836802 0.00834298 0.00847101 0.00843644]
mean value: 0.008432507514953613
key: test_mcc
value: [ 0.26666667 0.26257545 0.56980288 0.2236068 0.11396058 -0.11111111
0.34188173 0.23570226 0.2236068 0.34188173]
mean value: 0.24685737827811435
key: train_mcc
value: [0.44782413 0.43577775 0.43902439 0.47649639 0.48911599 0.46396698
0.50003718 0.42762497 0.45125307 0.36683699]
mean value: 0.44979578382501245
key: test_accuracy
value: [0.63157895 0.63157895 0.77777778 0.61111111 0.55555556 0.44444444
0.66666667 0.61111111 0.61111111 0.66666667]
mean value: 0.6207602339181286
key: train_accuracy
value: [0.72392638 0.71779141 0.7195122 0.73780488 0.74390244 0.73170732
0.75 0.71341463 0.72560976 0.68292683]
mean value: 0.7246595840191531
key: test_fscore
value: [0.63157895 0.69565217 0.75 0.58823529 0.5 0.44444444
0.625 0.53333333 0.63157895 0.7 ]
mean value: 0.6099823140545311
key: train_fscore
value: [0.72727273 0.7195122 0.7195122 0.72955975 0.75294118 0.73809524
0.74846626 0.70440252 0.72392638 0.67088608]
mean value: 0.7234574510219577
key: test_precision
value: [0.6 0.61538462 0.85714286 0.625 0.57142857 0.44444444
0.71428571 0.66666667 0.6 0.63636364]
mean value: 0.6330716505716506
key: train_precision
value: [0.72289157 0.71084337 0.7195122 0.75324675 0.72727273 0.72093023
0.75308642 0.72727273 0.72839506 0.69736842]
mean value: 0.7260819477765448
key: test_recall
value: [0.66666667 0.8 0.66666667 0.55555556 0.44444444 0.44444444
0.55555556 0.44444444 0.66666667 0.77777778]
mean value: 0.6022222222222222
key: train_recall
value: [0.73170732 0.72839506 0.7195122 0.70731707 0.7804878 0.75609756
0.74390244 0.68292683 0.7195122 0.64634146]
mean value: 0.7216199939777176
key: test_roc_auc
value: [0.63333333 0.62222222 0.77777778 0.61111111 0.55555556 0.44444444
0.66666667 0.61111111 0.61111111 0.66666667]
mean value: 0.62
key: train_roc_auc
value: [0.72387835 0.71785607 0.7195122 0.73780488 0.74390244 0.73170732
0.75 0.71341463 0.72560976 0.68292683]
mean value: 0.7246612466124661
key: test_jcc
value: [0.46153846 0.53333333 0.6 0.41666667 0.33333333 0.28571429
0.45454545 0.36363636 0.46153846 0.53846154]
mean value: 0.44487678987678986
key: train_jcc
value: [0.57142857 0.56190476 0.56190476 0.57425743 0.60377358 0.58490566
0.59803922 0.54368932 0.56730769 0.5047619 ]
mean value: 0.5671972899407909
MCC on Blind test: 0.38
Accuracy on Blind test: 0.7
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00865221 0.0083437 0.00835395 0.00934315 0.00946736 0.00935578
0.00944567 0.00965834 0.00930619 0.00942564]
mean value: 0.009135198593139649
key: score_time
value: [0.01036048 0.00950432 0.00968695 0.01048803 0.01113009 0.01024699
0.01019239 0.01024103 0.01018548 0.01024628]
mean value: 0.010228204727172851
key: test_mcc
value: [ 0.25844328 0.28752732 -0.12403473 0.23570226 0.2236068 0.
0.11396058 0.34188173 0.11396058 0.2236068 ]
mean value: 0.1674654600608012
key: train_mcc
value: [0.42370843 0.42387312 0.44556639 0.43229648 0.39211447 0.47649639
0.47032008 0.46563593 0.47249649 0.4539621 ]
mean value: 0.44564698943223807
key: test_accuracy
value: [0.63157895 0.63157895 0.44444444 0.61111111 0.61111111 0.5
0.55555556 0.66666667 0.55555556 0.61111111]
mean value: 0.5818713450292398
key: train_accuracy
value: [0.71165644 0.71165644 0.7195122 0.71341463 0.69512195 0.73780488
0.73170732 0.73170732 0.73170732 0.72560976]
mean value: 0.7209898249289242
key: test_fscore
value: [0.58823529 0.58823529 0.28571429 0.53333333 0.58823529 0.30769231
0.5 0.625 0.5 0.63157895]
mean value: 0.5148024756461289
key: train_fscore
value: [0.70807453 0.70063694 0.69333333 0.68874172 0.67948718 0.72955975
0.70666667 0.71794872 0.7027027 0.70967742]
mean value: 0.7036828966612066
key: test_precision
value: [0.625 0.71428571 0.4 0.66666667 0.625 0.5
0.57142857 0.71428571 0.57142857 0.6 ]
mean value: 0.5988095238095238
key: train_precision
value: [0.72151899 0.72368421 0.76470588 0.75362319 0.71621622 0.75324675
0.77941176 0.75675676 0.78787879 0.75342466]
mean value: 0.751046720496547
key: test_recall
value: [0.55555556 0.5 0.22222222 0.44444444 0.55555556 0.22222222
0.44444444 0.55555556 0.44444444 0.66666667]
mean value: 0.4611111111111111
key: train_recall
value: [0.69512195 0.67901235 0.63414634 0.63414634 0.64634146 0.70731707
0.64634146 0.68292683 0.63414634 0.67073171]
mean value: 0.6630231857874135
key: test_roc_auc
value: [0.62777778 0.63888889 0.44444444 0.61111111 0.61111111 0.5
0.55555556 0.66666667 0.55555556 0.61111111]
mean value: 0.5822222222222223
key: train_roc_auc
value: [0.71175851 0.71145739 0.7195122 0.71341463 0.69512195 0.73780488
0.73170732 0.73170732 0.73170732 0.72560976]
mean value: 0.7209801264679314
key: test_jcc
value: [0.41666667 0.41666667 0.16666667 0.36363636 0.41666667 0.18181818
0.33333333 0.45454545 0.33333333 0.46153846]
mean value: 0.3544871794871795
key: train_jcc
value: [0.54807692 0.53921569 0.53061224 0.52525253 0.51456311 0.57425743
0.54639175 0.56 0.54166667 0.55 ]
mean value: 0.5430036331284595
MCC on Blind test: -0.08
Accuracy on Blind test: 0.49
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01206779 0.01198173 0.01209545 0.01191854 0.01200819 0.0120225
0.01206994 0.01187992 0.0115304 0.01188707]
mean value: 0.011946153640747071
key: score_time
value: [0.00982642 0.00977206 0.00988817 0.00984406 0.00975108 0.0098269
0.00982642 0.00989437 0.00984883 0.00886917]
mean value: 0.009734749794006348
key: test_mcc
value: [0.15555556 0.26257545 0.70710678 0.77777778 0.26726124 0.26726124
0.55555556 0.23570226 0.55555556 0.56980288]
mean value: 0.43541543059640053
key: train_mcc
value: [0.71781359 0.79198683 0.74395776 0.78141806 0.74440079 0.7804878
0.75699875 0.74395776 0.74528923 0.79321396]
mean value: 0.7599524542438163
key: test_accuracy
value: [0.57894737 0.63157895 0.83333333 0.88888889 0.61111111 0.61111111
0.77777778 0.61111111 0.77777778 0.77777778]
mean value: 0.7099415204678363
key: train_accuracy
value: [0.85889571 0.89570552 0.87195122 0.8902439 0.87195122 0.8902439
0.87804878 0.87195122 0.87195122 0.89634146]
mean value: 0.8797284153823134
key: test_fscore
value: [0.55555556 0.69565217 0.8 0.88888889 0.69565217 0.46153846
0.77777778 0.53333333 0.77777778 0.75 ]
mean value: 0.6936176142697882
key: train_fscore
value: [0.86060606 0.8969697 0.87272727 0.8875 0.8742515 0.8902439
0.875 0.87272727 0.86792453 0.89820359]
mean value: 0.8796153823591574
key: test_precision
value: [0.55555556 0.61538462 1. 0.88888889 0.57142857 0.75
0.77777778 0.66666667 0.77777778 0.85714286]
mean value: 0.7460622710622711
key: train_precision
value: [0.85542169 0.88095238 0.86746988 0.91025641 0.85882353 0.8902439
0.8974359 0.86746988 0.8961039 0.88235294]
mean value: 0.8806530403558976
key: test_recall
value: [0.55555556 0.8 0.66666667 0.88888889 0.88888889 0.33333333
0.77777778 0.44444444 0.77777778 0.66666667]
mean value: 0.6799999999999999
key: train_recall
value: [0.86585366 0.91358025 0.87804878 0.86585366 0.8902439 0.8902439
0.85365854 0.87804878 0.84146341 0.91463415]
mean value: 0.8791629027401385
key: test_roc_auc
value: [0.57777778 0.62222222 0.83333333 0.88888889 0.61111111 0.61111111
0.77777778 0.61111111 0.77777778 0.77777778]
mean value: 0.7088888888888889
key: train_roc_auc
value: [0.85885276 0.89581451 0.87195122 0.8902439 0.87195122 0.8902439
0.87804878 0.87195122 0.87195122 0.89634146]
mean value: 0.8797350195724178
key: test_jcc
value: [0.38461538 0.53333333 0.66666667 0.8 0.53333333 0.3
0.63636364 0.36363636 0.63636364 0.6 ]
mean value: 0.5454312354312354
key: train_jcc
value: [0.75531915 0.81318681 0.77419355 0.79775281 0.77659574 0.8021978
0.77777778 0.77419355 0.76666667 0.81521739]
mean value: 0.7853101250513387
MCC on Blind test: 0.05
Accuracy on Blind test: 0.57
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [0.6738658 0.89805007 0.65028882 0.70737767 0.84737277 0.68717527
0.66062117 0.86410022 0.6574862 0.7034452 ]
mean value: 0.7349783182144165
key: score_time
value: [0.01342988 0.0133884 0.01378322 0.01348329 0.01350474 0.01336622
0.01323557 0.01228476 0.01329565 0.01332974]
mean value: 0.01331014633178711
key: test_mcc
value: [0.47777778 0.4719399 0.70710678 0.89442719 0.89442719 0.47140452
0.67082039 0.56980288 0.3721042 0.70710678]
mean value: 0.6236917625981757
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.73684211 0.73684211 0.83333333 0.94444444 0.94444444 0.72222222
0.83333333 0.77777778 0.66666667 0.83333333]
mean value: 0.8029239766081872
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.73684211 0.76190476 0.8 0.94117647 0.94736842 0.66666667
0.82352941 0.75 0.57142857 0.8 ]
mean value: 0.7798916408668731
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.7 0.72727273 1. 1. 0.9 0.83333333
0.875 0.85714286 0.8 1. ]
mean value: 0.8692748917748918
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.77777778 0.8 0.66666667 0.88888889 1. 0.55555556
0.77777778 0.66666667 0.44444444 0.66666667]
mean value: 0.7244444444444444
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.73888889 0.73333333 0.83333333 0.94444444 0.94444444 0.72222222
0.83333333 0.77777778 0.66666667 0.83333333]
mean value: 0.8027777777777777
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.58333333 0.61538462 0.66666667 0.88888889 0.9 0.5
0.7 0.6 0.4 0.66666667]
mean value: 0.652094017094017
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.34
Accuracy on Blind test: 0.7
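The ten per-fold scores in a block like the MLP one above can be checked against each other. The confusion matrix for the first MLP fold can be back-calculated from its logged values: accuracy 14/19, recall 7/9 and precision 0.7 give TP = 7, FN = 2, FP = 3, TN = 7. These counts are inferred and were not printed by the script. The sketch below recomputes every fold-1 metric from those counts using the standard formulas.

```python
import math

def metrics_from_counts(tp, fn, fp, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    n = tp + fn + fp + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / n,
        "precision": precision,
        "recall": recall,
        "fscore": 2 * tp / (2 * tp + fp + fn),
        "jcc": tp / (tp + fp + fn),          # Jaccard index of the positive class
        "mcc": (tp * tn - fp * fn) / mcc_den,
        "balanced_accuracy": (recall + specificity) / 2,
    }

# Counts inferred (not logged) for MLP fold 1: 19 test samples, 9 of them positive.
fold1 = metrics_from_counts(tp=7, fn=2, fp=3, tn=7)
for name, value in fold1.items():
    print(f"{name}: {value:.8f}")
```

The recomputed values match the logged fold-1 entries (test_mcc 0.47777778, test_jcc 0.58333333, and so on). The logged test_roc_auc of 0.73888889 also equals the balanced accuracy.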
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.01789188 0.01565242 0.01368737 0.01377201 0.013309 0.01342058
0.01321268 0.01274252 0.01264119 0.01333928]
mean value: 0.013966894149780274
key: score_time
value: [0.01292944 0.01085591 0.00963998 0.0095849 0.00976825 0.00909972
0.00912642 0.00914407 0.00917363 0.00912905]
mean value: 0.009845137596130371
key: test_mcc
value: [0.89893315 0.80507649 0.89442719 0.89442719 0.89442719 0.89442719
0.89442719 0.67082039 0.67082039 1. ]
mean value: 0.8517786377349856
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94736842 0.89473684 0.94444444 0.94444444 0.94444444 0.94444444
0.94444444 0.83333333 0.83333333 1. ]
mean value: 0.9230994152046783
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94117647 0.90909091 0.94117647 0.94117647 0.94736842 0.94117647
0.94117647 0.82352941 0.82352941 1. ]
mean value: 0.9209400506614129
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.83333333 1. 1. 0.9 1.
1. 0.875 0.875 1. ]
mean value: 0.9483333333333334
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.88888889 1. 0.88888889 0.88888889 1. 0.88888889
0.88888889 0.77777778 0.77777778 1. ]
mean value: 0.9
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.94444444 0.88888889 0.94444444 0.94444444 0.94444444 0.94444444
0.94444444 0.83333333 0.83333333 1. ]
mean value: 0.9222222222222222
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.88888889 0.83333333 0.88888889 0.88888889 0.9 0.88888889
0.88888889 0.7 0.7 1. ]
mean value: 0.8577777777777778
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.72
Accuracy on Blind test: 0.86
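Within a single fold, the Jaccard score (test_jcc) is determined entirely by F1. With J = TP/(TP+FP+FN) and F1 = 2TP/(2TP+FP+FN), it follows that J = F1/(2 - F1). As a result, test_jcc adds no new per-fold information. The sketch below checks this identity against the Decision Tree folds logged above.

```python
# Decision Tree per-fold test scores, copied from the log above.
test_fscore = [0.94117647, 0.90909091, 0.94117647, 0.94117647, 0.94736842,
               0.94117647, 0.94117647, 0.82352941, 0.82352941, 1.0]
test_jcc = [0.88888889, 0.83333333, 0.88888889, 0.88888889, 0.9,
            0.88888889, 0.88888889, 0.7, 0.7, 1.0]

def jaccard_from_f1(f):
    """For one binary confusion matrix, J = TP/(TP+FP+FN) = F1 / (2 - F1)."""
    return f / (2 - f)

derived = [jaccard_from_f1(f) for f in test_fscore]
max_diff = max(abs(d - j) for d, j in zip(derived, test_jcc))
print(f"largest |derived - logged| = {max_diff:.1e}")
```

The mapping is nonlinear, so the mean Jaccard printed in the log is not exactly F/(2 - F) of the mean F1.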
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.08970523 0.09563923 0.09762454 0.09816647 0.09601331 0.09740019
0.09686017 0.09789848 0.09540892 0.09602714]
mean value: 0.09607436656951904
key: score_time
value: [0.01693845 0.01738167 0.0185194 0.01805615 0.01852155 0.01874089
0.01831293 0.01796603 0.01841974 0.01797104]
mean value: 0.018082785606384277
key: test_mcc
value: [0.68888889 0.80507649 0.79772404 0.77777778 0.67082039 0.70710678
0.56980288 0.56980288 0.89442719 0.77777778]
mean value: 0.7259205095594103
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.84210526 0.89473684 0.88888889 0.88888889 0.83333333 0.83333333
0.77777778 0.77777778 0.94444444 0.88888889]
mean value: 0.8570175438596491
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.84210526 0.90909091 0.875 0.88888889 0.84210526 0.8
0.75 0.75 0.94736842 0.88888889]
mean value: 0.8493447634237108
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.8 0.83333333 1. 0.88888889 0.8 1.
0.85714286 0.85714286 0.9 0.88888889]
mean value: 0.8825396825396825
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.88888889 1. 0.77777778 0.88888889 0.88888889 0.66666667
0.66666667 0.66666667 1. 0.88888889]
mean value: 0.8333333333333333
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.84444444 0.88888889 0.88888889 0.88888889 0.83333333 0.83333333
0.77777778 0.77777778 0.94444444 0.88888889]
mean value: 0.8566666666666666
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.72727273 0.83333333 0.77777778 0.8 0.72727273 0.66666667
0.6 0.6 0.9 0.8 ]
mean value: 0.7432323232323232
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.4
Accuracy on Blind test: 0.73
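The log reports only the mean of each metric across the ten folds. Each test fold holds roughly 18 to 19 samples (accuracies such as 14/19 and 17/18 imply this), so individual fold scores move in large steps, and reporting the spread is informative too. The following is a minimal standard-library sketch for the Extra Trees test MCC.

```python
import statistics

# Extra Trees per-fold test MCC, copied from the log above.
test_mcc = [0.68888889, 0.80507649, 0.79772404, 0.77777778, 0.67082039,
            0.70710678, 0.56980288, 0.56980288, 0.89442719, 0.77777778]

mean_mcc = statistics.mean(test_mcc)
sd_mcc = statistics.stdev(test_mcc)  # sample standard deviation across folds
print(f"test MCC = {mean_mcc:.3f} +/- {sd_mcc:.3f} "
      f"(mean +/- SD over {len(test_mcc)} folds)")
```

For Extra Trees, the blind-test MCC of 0.4 falls well below even the lowest fold score of 0.57.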
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.00972438 0.00980639 0.0098176 0.00968933 0.00974798 0.00983357
0.00982904 0.00982022 0.00935078 0.00986123]
mean value: 0.009748053550720216
key: score_time
value: [0.00927591 0.00916839 0.00913358 0.00922513 0.00917125 0.00919223
0.00910759 0.00911784 0.00923991 0.00912786]
mean value: 0.009175968170166016
key: test_mcc
value: [0.4719399 0.36666667 0.3721042 0.67082039 0.47140452 0.26726124
0.56980288 0.24253563 0.79772404 0.62017367]
mean value: 0.48504331456099853
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.73684211 0.68421053 0.66666667 0.83333333 0.72222222 0.61111111
0.77777778 0.55555556 0.88888889 0.77777778]
mean value: 0.7254385964912281
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.70588235 0.7 0.57142857 0.82352941 0.66666667 0.46153846
0.75 0.2 0.9 0.71428571]
mean value: 0.6493331178625297
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.75 0.7 0.8 0.875 0.83333333 0.75
0.85714286 1. 0.81818182 1. ]
mean value: 0.8383658008658008
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 0.7 0.44444444 0.77777778 0.55555556 0.33333333
0.66666667 0.11111111 1. 0.55555556]
mean value: 0.5811111111111111
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.73333333 0.68333333 0.66666667 0.83333333 0.72222222 0.61111111
0.77777778 0.55555556 0.88888889 0.77777778]
mean value: 0.725
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.54545455 0.53846154 0.4 0.7 0.5 0.3
0.6 0.11111111 0.81818182 0.55555556]
mean value: 0.5068764568764569
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.08
Accuracy on Blind test: 0.59
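The test_roc_auc values track accuracy closely. For fold 1 of Extra Tree, the counts inferred from the log are TP = 6, FN = 3, FP = 2 and TN = 8. The logged AUC of 0.73333333 equals the balanced accuracy for those counts, which is what ROC AUC reduces to when computed from hard 0/1 labels. This suggests the scorer received predict() output rather than class probabilities, although that is an inference because the script itself is not shown. The sketch below implements the pairwise (Mann-Whitney) definition of AUC.

```python
def roc_auc_pairwise(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked
    correctly, with ties counted as one half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Extra Tree fold 1, counts inferred from the log: TP=6, FN=3, FP=2, TN=8.
y_true = [1] * 6 + [1] * 3 + [0] * 2 + [0] * 8
y_pred = [1] * 6 + [0] * 3 + [1] * 2 + [0] * 8  # hard 0/1 predictions

auc = roc_auc_pairwise(y_true, y_pred)
balanced_acc = (6 / 9 + 8 / 10) / 2  # (recall + specificity) / 2
print(f"AUC from hard labels = {auc:.8f}, balanced accuracy = {balanced_acc:.8f}")
```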
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.22694707 1.18884277 1.18088603 1.20042324 1.1879015 1.21066332
1.19598603 1.18754077 1.18028498 1.21017814]
mean value: 1.1969653844833374
key: score_time
value: [0.09555411 0.09508872 0.09232116 0.09383941 0.09496737 0.08769631
0.09453797 0.0888617 0.09319115 0.09555507]
mean value: 0.09316129684448242
key: test_mcc
value: [0.89893315 0.68543653 0.79772404 0.89442719 0.77777778 0.79772404
0.79772404 0.56980288 0.79772404 0.79772404]
mean value: 0.781499770395183
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94736842 0.84210526 0.88888889 0.94444444 0.88888889 0.88888889
0.88888889 0.77777778 0.88888889 0.88888889]
mean value: 0.8845029239766081
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94117647 0.85714286 0.875 0.94117647 0.88888889 0.875
0.875 0.75 0.9 0.875 ]
mean value: 0.8778384687208217
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.81818182 1. 1. 0.88888889 1.
1. 0.85714286 0.81818182 1. ]
mean value: 0.9382395382395382
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.88888889 0.9 0.77777778 0.88888889 0.88888889 0.77777778
0.77777778 0.66666667 1. 0.77777778]
mean value: 0.8344444444444444
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.94444444 0.83888889 0.88888889 0.94444444 0.88888889 0.88888889
0.88888889 0.77777778 0.88888889 0.88888889]
mean value: 0.8838888888888888
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
key: test_jcc
value: [0.88888889 0.75 0.77777778 0.88888889 0.8 0.77777778
0.77777778 0.6 0.81818182 0.77777778]
mean value: 0.7857070707070707
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.65
Accuracy on Blind test: 0.84
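Most Random Forest folds report the same few MCC values: 0.79772404, 0.89442719 and 0.77777778. The Decision Tree and Random Forest fold accuracies and recalls imply test folds of 18 samples, assumed here to be 9 positive and 9 negative. At that size only a handful of MCC values are reachable, and one or two misclassifications shift the score by 0.1 to 0.2. The sketch below enumerates every fold with at most two errors.

```python
import math
from itertools import product

def mcc(tp, fn, fp, tn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / den if den else 0.0

# Assumed fold layout: 18 test samples, 9 positive and 9 negative.
POS, NEG = 9, 9
for fn, fp in product(range(3), repeat=2):
    if fn + fp > 2:
        continue  # only folds with at most two misclassifications
    print(f"FN={fn} FP={fp}: MCC = {mcc(POS - fn, fn, fp, NEG - fp):.8f}")
```

A single error of either type gives 0.89442719. Two errors of the same type give 0.79772404, and one error of each type gives 0.77777778.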
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.85279465 0.89258766 0.89829969 0.88843799 0.85441613 0.87707686
0.86382127 0.86021209 0.94803047 0.88763785]
mean value: 0.8823314666748047
key: score_time
value: [0.19736314 0.22194862 0.22088242 0.25448847 0.24951029 0.20679188
0.24040985 0.20101738 0.25407386 0.20660353]
mean value: 0.2253089427947998
key: test_mcc
value: [0.78888889 0.48934516 0.70710678 0.77777778 0.67082039 0.70710678
0.89442719 0.4472136 0.56980288 0.79772404]
mean value: 0.6850213490232173
key: train_mcc
value: [0.96325856 0.95121218 0.92682927 0.95150257 0.96348628 0.93909422
0.96348628 0.97590007 0.93909422 0.93909422]
mean value: 0.951295788364494
key: test_accuracy
value: [0.89473684 0.73684211 0.83333333 0.88888889 0.83333333 0.83333333
0.94444444 0.72222222 0.77777778 0.88888889]
mean value: 0.8353801169590643
key: train_accuracy
value: [0.98159509 0.97546012 0.96341463 0.97560976 0.98170732 0.9695122
0.98170732 0.98780488 0.9695122 0.9695122 ]
mean value: 0.9755835702528804
key: test_fscore
value: [0.88888889 0.7826087 0.8 0.88888889 0.84210526 0.8
0.94117647 0.70588235 0.8 0.875 ]
mean value: 0.8324550560117259
key: train_fscore
value: [0.98181818 0.97560976 0.96341463 0.97590361 0.98181818 0.96969697
0.98181818 0.98795181 0.96969697 0.96969697]
mean value: 0.9757425266476104
key: test_precision
value: [0.88888889 0.69230769 1. 0.88888889 0.8 1.
1. 0.75 0.72727273 1. ]
mean value: 0.8747358197358197
key: train_precision
value: [0.97590361 0.96385542 0.96341463 0.96428571 0.97590361 0.96385542
0.97590361 0.97619048 0.96385542 0.96385542]
mean value: 0.9687023354743014
key: test_recall
value: [0.88888889 0.9 0.66666667 0.88888889 0.88888889 0.66666667
0.88888889 0.66666667 0.88888889 0.77777778]
mean value: 0.8122222222222222
key: train_recall
value: [0.98780488 0.98765432 0.96341463 0.98780488 0.98780488 0.97560976
0.98780488 1. 0.97560976 0.97560976]
mean value: 0.9829117735621801
key: test_roc_auc
value: [0.89444444 0.72777778 0.83333333 0.88888889 0.83333333 0.83333333
0.94444444 0.72222222 0.77777778 0.88888889]
mean value: 0.8344444444444444
key: train_roc_auc
value: [0.98155676 0.97553448 0.96341463 0.97560976 0.98170732 0.9695122
0.98170732 0.98780488 0.9695122 0.9695122 ]
mean value: 0.975587172538392
key: test_jcc
value: [0.8 0.64285714 0.66666667 0.8 0.72727273 0.66666667
0.88888889 0.54545455 0.66666667 0.77777778]
mean value: 0.7182251082251082
key: train_jcc
value: [0.96428571 0.95238095 0.92941176 0.95294118 0.96428571 0.94117647
0.96428571 0.97619048 0.94117647 0.94117647]
mean value: 0.9527310924369747
MCC on Blind test: 0.65
Accuracy on Blind test: 0.84
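Every model in this section scores lower on the blind test than in cross-validation. All of them except Random Forest2 also reach a perfect training MCC of 1.0. Random Forest2 keeps its training MCC at about 0.95, plausibly because min_samples_leaf=5 limits how closely each tree can fit the training folds, and it shows the smallest drop. The sketch below tabulates the logged MCC values side by side, with the numbers copied from this section.

```python
# Mean 10-fold test MCC and blind-test MCC, copied from the log above.
results = {
    "MLP": (0.6236917625981757, 0.34),
    "Decision Tree": (0.8517786377349856, 0.72),
    "Extra Trees": (0.7259205095594103, 0.40),
    "Extra Tree": (0.48504331456099853, 0.08),
    "Random Forest": (0.781499770395183, 0.65),
    "Random Forest2": (0.6850213490232173, 0.65),
}

# Rank by blind-test MCC and show how far each model falls from its CV estimate.
ranked = sorted(results.items(), key=lambda item: item[1][1], reverse=True)
for name, (cv_mcc, blind_mcc) in ranked:
    print(f"{name:15s} CV={cv_mcc:.2f} blind={blind_mcc:.2f} "
          f"drop={cv_mcc - blind_mcc:.2f}")
```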
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02275062 0.00911093 0.00958419 0.00923228 0.00917268 0.01035643
0.00954223 0.00929189 0.01001048 0.00930905]
mean value: 0.010836076736450196
key: score_time
value: [0.01018906 0.00876546 0.00997353 0.00875711 0.0086844 0.0092597
0.00946522 0.00878239 0.00908685 0.00885129]
mean value: 0.009181499481201172
key: test_mcc
value: [ 0.26666667 0.26257545 0.56980288 0.2236068 0.11396058 -0.11111111
0.34188173 0.23570226 0.2236068 0.34188173]
mean value: 0.24685737827811435
key: train_mcc
value: [0.44782413 0.43577775 0.43902439 0.47649639 0.48911599 0.46396698
0.50003718 0.42762497 0.45125307 0.36683699]
mean value: 0.44979578382501245
key: test_accuracy
value: [0.63157895 0.63157895 0.77777778 0.61111111 0.55555556 0.44444444
0.66666667 0.61111111 0.61111111 0.66666667]
mean value: 0.6207602339181286
key: train_accuracy
value: [0.72392638 0.71779141 0.7195122 0.73780488 0.74390244 0.73170732
0.75 0.71341463 0.72560976 0.68292683]
mean value: 0.7246595840191531
key: test_fscore
value: [0.63157895 0.69565217 0.75 0.58823529 0.5 0.44444444
0.625 0.53333333 0.63157895 0.7 ]
mean value: 0.6099823140545311
key: train_fscore
value: [0.72727273 0.7195122 0.7195122 0.72955975 0.75294118 0.73809524
0.74846626 0.70440252 0.72392638 0.67088608]
mean value: 0.7234574510219577
key: test_precision
value: [0.6 0.61538462 0.85714286 0.625 0.57142857 0.44444444
0.71428571 0.66666667 0.6 0.63636364]
mean value: 0.6330716505716506
key: train_precision
value: [0.72289157 0.71084337 0.7195122 0.75324675 0.72727273 0.72093023
0.75308642 0.72727273 0.72839506 0.69736842]
mean value: 0.7260819477765448
key: test_recall
value: [0.66666667 0.8 0.66666667 0.55555556 0.44444444 0.44444444
0.55555556 0.44444444 0.66666667 0.77777778]
mean value: 0.6022222222222222
key: train_recall
value: [0.73170732 0.72839506 0.7195122 0.70731707 0.7804878 0.75609756
0.74390244 0.68292683 0.7195122 0.64634146]
mean value: 0.7216199939777176
key: test_roc_auc
value: [0.63333333 0.62222222 0.77777778 0.61111111 0.55555556 0.44444444
0.66666667 0.61111111 0.61111111 0.66666667]
mean value: 0.62
key: train_roc_auc
value: [0.72387835 0.71785607 0.7195122 0.73780488 0.74390244 0.73170732
0.75 0.71341463 0.72560976 0.68292683]
mean value: 0.7246612466124661
key: test_jcc
value: [0.46153846 0.53333333 0.6 0.41666667 0.33333333 0.28571429
0.45454545 0.36363636 0.46153846 0.53846154]
mean value: 0.44487678987678986
key: train_jcc
value: [0.57142857 0.56190476 0.56190476 0.57425743 0.60377358 0.58490566
0.59803922 0.54368932 0.56730769 0.5047619 ]
mean value: 0.5671972899407909
MCC on Blind test: 0.38
Accuracy on Blind test: 0.7
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.07006073 0.05620217 0.05455995 0.05443859 0.05530071 0.04911423
0.05309939 0.05028129 0.06260443 0.05637097]
mean value: 0.056203246116638184
key: score_time
value: [0.01024413 0.01085782 0.01115632 0.01059151 0.01034355 0.01052856
0.0102272 0.01016331 0.0101974 0.01014662]
mean value: 0.010445642471313476
key: test_mcc
value: [1. 0.89893315 1. 0.89442719 0.89442719 0.89442719
0.89442719 0.67082039 0.89442719 1. ]
mean value: 0.9041889498200506
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.94736842 1. 0.94444444 0.94444444 0.94444444
0.94444444 0.83333333 0.94444444 1. ]
mean value: 0.9502923976608187
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.95238095 1. 0.94117647 0.94736842 0.94117647
0.94117647 0.82352941 0.94736842 1. ]
mean value: 0.9494176618015627
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.90909091 1. 1. 0.9 1.
1. 0.875 0.9 1. ]
mean value: 0.9584090909090909
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 0.88888889 1. 0.88888889
0.88888889 0.77777778 1. 1. ]
mean value: 0.9444444444444444
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.94444444 1. 0.94444444 0.94444444 0.94444444
0.94444444 0.83333333 0.94444444 1. ]
mean value: 0.95
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.90909091 1. 0.88888889 0.9 0.88888889
0.88888889 0.7 0.9 1. ]
mean value: 0.9075757575757576
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.83
Accuracy on Blind test: 0.92
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.01936173 0.02427292 0.02441263 0.04486084 0.04475188 0.04609394
0.03421617 0.04422712 0.02395058 0.04555655]
mean value: 0.03517043590545654
key: score_time
value: [0.01165771 0.01188326 0.02139616 0.0117929 0.02368331 0.01183486
0.02326417 0.01176476 0.0116334 0.02139091]
mean value: 0.016030144691467286
key: test_mcc
value: [0.80903983 0.36666667 0.77777778 0.56980288 0.62017367 0.79772404
0.70710678 0.47140452 0.47140452 0.70710678]
mean value: 0.6298207473817191
key: train_mcc
value: [0.98780488 1. 0.97590007 0.98787834 0.98787834 1.
0.98787834 0.98787834 1. 0.97560976]
mean value: 0.9890828066723727
key: test_accuracy
value: [0.89473684 0.68421053 0.88888889 0.77777778 0.77777778 0.88888889
0.83333333 0.72222222 0.72222222 0.83333333]
mean value: 0.8023391812865497
key: train_accuracy
value: [0.99386503 1. 0.98780488 0.99390244 0.99390244 1.
0.99390244 0.99390244 1. 0.98780488]
mean value: 0.9945084542869969
key: test_fscore
value: [0.9 0.7 0.88888889 0.75 0.71428571 0.875
0.8 0.66666667 0.66666667 0.8 ]
mean value: 0.7761507936507936
key: train_fscore
value: [0.99386503 1. 0.98765432 0.99386503 0.99386503 1.
0.99393939 0.99393939 1. 0.98780488]
mean value: 0.9944933078939763
key: test_precision
value: [0.81818182 0.7 0.88888889 0.85714286 1. 1.
1. 0.83333333 0.83333333 1. ]
mean value: 0.893088023088023
key: train_precision
value: [1. 1. 1. 1. 1. 1.
0.98795181 0.98795181 1. 0.98780488]
mean value: 0.9963708492506612
key: test_recall
value: [1. 0.7 0.88888889 0.66666667 0.55555556 0.77777778
0.66666667 0.55555556 0.55555556 0.66666667]
mean value: 0.7033333333333334
key: train_recall
value: [0.98780488 1. 0.97560976 0.98780488 0.98780488 1.
1. 1. 1. 0.98780488]
mean value: 0.9926829268292683
key: test_roc_auc
value: [0.9 0.68333333 0.88888889 0.77777778 0.77777778 0.88888889
0.83333333 0.72222222 0.72222222 0.83333333]
mean value: 0.8027777777777778
key: train_roc_auc
value: [0.99390244 1. 0.98780488 0.99390244 0.99390244 1.
0.99390244 0.99390244 1. 0.98780488]
mean value: 0.9945121951219512
key: test_jcc
value: [0.81818182 0.53846154 0.8 0.6 0.55555556 0.77777778
0.66666667 0.5 0.5 0.66666667]
mean value: 0.6423310023310024
key: train_jcc
value: [0.98780488 1. 0.97560976 0.98780488 0.98780488 1.
0.98795181 0.98795181 1. 0.97590361]
mean value: 0.9890831619159565
MCC on Blind test: 0.27
Accuracy on Blind test: 0.62
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01200485 0.01285028 0.00930858 0.00868058 0.00853205 0.00864029
0.00857878 0.00883126 0.00857115 0.00861025]
mean value: 0.009460806846618652
key: score_time
value: [0.01128626 0.0095551 0.00850296 0.00831437 0.00825906 0.00825787
0.00840831 0.00830722 0.00827265 0.00831676]
mean value: 0.008748054504394531
key: test_mcc
value: [0.06900656 0.25844328 0.77777778 0.4472136 0.56980288 0.34188173
0.33333333 0.11111111 0.11111111 0.4472136 ]
mean value: 0.34668949725554976
key: train_mcc
value: [0.52587807 0.49804037 0.44112877 0.47850059 0.41512835 0.45533504
0.49147319 0.46845799 0.52757758 0.47735225]
mean value: 0.4778872199982366
key: test_accuracy
value: [0.52631579 0.63157895 0.88888889 0.72222222 0.77777778 0.66666667
0.66666667 0.55555556 0.55555556 0.72222222]
mean value: 0.671345029239766
key: train_accuracy
value: [0.7607362 0.74846626 0.7195122 0.73780488 0.70731707 0.72560976
0.74390244 0.73170732 0.76219512 0.73780488]
mean value: 0.7375056112524315
key: test_fscore
value: [0.57142857 0.66666667 0.88888889 0.70588235 0.8 0.625
0.66666667 0.55555556 0.55555556 0.73684211]
mean value: 0.6772486362966239
key: train_fscore
value: [0.77714286 0.75449102 0.73255814 0.75144509 0.71428571 0.74285714
0.75862069 0.75 0.77456647 0.74853801]
mean value: 0.750450513382939
key: test_precision
value: [0.5 0.63636364 0.88888889 0.75 0.72727273 0.71428571
0.66666667 0.55555556 0.55555556 0.7 ]
mean value: 0.6694588744588744
key: train_precision
value: [0.7311828 0.73255814 0.7 0.71428571 0.69767442 0.69892473
0.7173913 0.70212766 0.73626374 0.71910112]
mean value: 0.7149509623088506
key: test_recall
value: [0.66666667 0.7 0.88888889 0.66666667 0.88888889 0.55555556
0.66666667 0.55555556 0.55555556 0.77777778]
mean value: 0.6922222222222222
key: train_recall
value: [0.82926829 0.77777778 0.76829268 0.79268293 0.73170732 0.79268293
0.80487805 0.80487805 0.81707317 0.7804878 ]
mean value: 0.7899728997289973
key: test_roc_auc
value: [0.53333333 0.62777778 0.88888889 0.72222222 0.77777778 0.66666667
0.66666667 0.55555556 0.55555556 0.72222222]
mean value: 0.6716666666666666
key: train_roc_auc
value: [0.76031316 0.74864499 0.7195122 0.73780488 0.70731707 0.72560976
0.74390244 0.73170732 0.76219512 0.73780488]
mean value: 0.7374811803673592
key: test_jcc
value: [0.4 0.5 0.8 0.54545455 0.66666667 0.45454545
0.5 0.38461538 0.38461538 0.58333333]
mean value: 0.5219230769230769
key: train_jcc
value: [0.63551402 0.60576923 0.57798165 0.60185185 0.55555556 0.59090909
0.61111111 0.6 0.63207547 0.59813084]
mean value: 0.6008898823084184
MCC on Blind test: 0.13
Accuracy on Blind test: 0.59
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01156712 0.0142765 0.01810694 0.0145216 0.0144875 0.01617956
0.01846528 0.01412177 0.03151178 0.01461315]
mean value: 0.01678512096405029
key: score_time
value: [0.00826836 0.01126075 0.01134539 0.01134443 0.0113399 0.01279712
0.01281691 0.02762818 0.02830529 0.0123601 ]
mean value: 0.014746642112731934
key: test_mcc
value: [0.89893315 0.26257545 0.53452248 0.79772404 0.89442719 0.2236068
0.47140452 0.2236068 0.26726124 0.4472136 ]
mean value: 0.5021275267511051
key: train_mcc
value: [0.89510866 0.90289608 0.85224163 0.82065181 0.9067647 0.89565496
0.94077493 0.83149718 0.60553007 0.64546362]
mean value: 0.8296583633781469
key: test_accuracy
value: [0.94736842 0.63157895 0.72222222 0.88888889 0.94444444 0.61111111
0.72222222 0.61111111 0.61111111 0.66666667]
mean value: 0.735672514619883
key: train_accuracy
value: [0.94478528 0.95092025 0.92073171 0.90243902 0.95121951 0.94512195
0.9695122 0.91463415 0.76829268 0.79878049]
mean value: 0.9066437228789466
key: test_fscore
value: [0.94117647 0.69565217 0.61538462 0.875 0.94736842 0.63157895
0.66666667 0.58823529 0.46153846 0.75 ]
mean value: 0.7172601050629722
key: train_fscore
value: [0.94193548 0.94936709 0.91390728 0.89189189 0.94871795 0.94797688
0.96855346 0.91764706 0.6984127 0.83076923]
mean value: 0.9009179023594288
key: test_precision
value: [1. 0.61538462 1. 1. 0.9 0.6
0.83333333 0.625 0.75 0.6 ]
mean value: 0.7923717948717949
key: train_precision
value: [1. 0.97402597 1. 1. 1. 0.9010989
1. 0.88636364 1. 0.71681416]
mean value: 0.9478302670780547
key: test_recall
value: [0.88888889 0.8 0.44444444 0.77777778 1. 0.66666667
0.55555556 0.55555556 0.33333333 1. ]
mean value: 0.7022222222222222
key: train_recall
value: [0.8902439 0.92592593 0.84146341 0.80487805 0.90243902 1.
0.93902439 0.95121951 0.53658537 0.98780488]
mean value: 0.8779584462511292
key: test_roc_auc
value: [0.94444444 0.62222222 0.72222222 0.88888889 0.94444444 0.61111111
0.72222222 0.61111111 0.61111111 0.66666667]
mean value: 0.7344444444444445
key: train_roc_auc
value: [0.94512195 0.95076784 0.92073171 0.90243902 0.95121951 0.94512195
0.9695122 0.91463415 0.76829268 0.79878049]
mean value: 0.9066621499548329
key: test_jcc
value: [0.88888889 0.53333333 0.44444444 0.77777778 0.9 0.46153846
0.5 0.41666667 0.3 0.6 ]
mean value: 0.5822649572649573
key: train_jcc
value: [0.8902439 0.90361446 0.84146341 0.80487805 0.90243902 0.9010989
0.93902439 0.84782609 0.53658537 0.71052632]
mean value: 0.8277699908017685
MCC on Blind test: 0.51
Accuracy on Blind test: 0.76
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01393223 0.01368308 0.01342058 0.01372242 0.01387477 0.0146842
0.01359653 0.01407099 0.01386142 0.01323581]
mean value: 0.013808202743530274
key: score_time
value: [0.01012206 0.01130271 0.01128602 0.01126742 0.01138663 0.01142907
0.01143241 0.01137114 0.011307 0.01138449]
mean value: 0.01122889518737793
key: test_mcc
value: [0.59554321 0.48934516 0.47140452 0.70710678 0.53452248 0.62017367
0.47140452 0.56980288 0.26726124 0.70710678]
mean value: 0.543367126070614
key: train_mcc
value: [0.67220873 0.92666768 0.51140831 0.71034298 0.53033009 0.92932038
0.65275337 0.91470217 0.67180908 0.82951506]
mean value: 0.7349057845494104
key: test_accuracy
value: [0.78947368 0.73684211 0.72222222 0.83333333 0.72222222 0.77777778
0.72222222 0.77777778 0.61111111 0.83333333]
mean value: 0.7526315789473684
key: train_accuracy
value: [0.81595092 0.96319018 0.70731707 0.83536585 0.7195122 0.96341463
0.79878049 0.95731707 0.81097561 0.91463415]
mean value: 0.848645817746521
key: test_fscore
value: [0.8 0.7826087 0.76190476 0.8 0.61538462 0.71428571
0.66666667 0.75 0.46153846 0.8 ]
mean value: 0.7152388915432394
key: train_fscore
value: [0.84375 0.96341463 0.77358491 0.80291971 0.61016949 0.96202532
0.7480916 0.95705521 0.76691729 0.91358025]
mean value: 0.834150841374106
key: test_precision
value: [0.72727273 0.69230769 0.66666667 1. 1. 1.
0.83333333 0.85714286 0.75 1. ]
mean value: 0.8526723276723277
key: train_precision
value: [0.73636364 0.95180723 0.63076923 1. 1. 1.
1. 0.96296296 1. 0.925 ]
mean value: 0.9206903059011493
key: test_recall
value: [0.88888889 0.9 0.88888889 0.66666667 0.44444444 0.55555556
0.55555556 0.66666667 0.33333333 0.66666667]
mean value: 0.6566666666666666
key: train_recall
value: [0.98780488 0.97530864 1. 0.67073171 0.43902439 0.92682927
0.59756098 0.95121951 0.62195122 0.90243902]
mean value: 0.8072869617585064
key: test_roc_auc
value: [0.79444444 0.72777778 0.72222222 0.83333333 0.72222222 0.77777778
0.72222222 0.77777778 0.61111111 0.83333333]
mean value: 0.7522222222222221
key: train_roc_auc
value: [0.81489009 0.96326408 0.70731707 0.83536585 0.7195122 0.96341463
0.79878049 0.95731707 0.81097561 0.91463415]
mean value: 0.8485471243601325
key: test_jcc
value: [0.66666667 0.64285714 0.61538462 0.66666667 0.44444444 0.55555556
0.5 0.6 0.3 0.66666667]
mean value: 0.5658241758241758
key: train_jcc
value: [0.72972973 0.92941176 0.63076923 0.67073171 0.43902439 0.92682927
0.59756098 0.91764706 0.62195122 0.84090909]
mean value: 0.7304564435913073
MCC on Blind test: 0.39
Accuracy on Blind test: 0.62
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.11691046 0.10347772 0.10455108 0.10615063 0.10520744 0.10436273
0.10814619 0.10973048 0.10878086 0.10570621]
mean value: 0.1073023796081543
key: score_time
value: [0.01599526 0.01573205 0.01532221 0.01451206 0.01454496 0.01522088
0.01593971 0.01650333 0.01589751 0.01483393]
mean value: 0.015450191497802735
key: test_mcc
value: [1. 0.68543653 1. 1. 0.89442719 0.89442719
0.89442719 0.56980288 0.89442719 0.79772404]
mean value: 0.8630672208352949
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.84210526 1. 1. 0.94444444 0.94444444
0.94444444 0.77777778 0.94444444 0.88888889]
mean value: 0.9286549707602338
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.85714286 1. 1. 0.94736842 0.94117647
0.94117647 0.75 0.94736842 0.875 ]
mean value: 0.9259232640424591
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.81818182 1. 1. 0.9 1.
1. 0.85714286 0.9 1. ]
mean value: 0.9475324675324676
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.9 1. 1. 1. 0.88888889
0.88888889 0.66666667 1. 0.77777778]
mean value: 0.9122222222222223
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.83888889 1. 1. 0.94444444 0.94444444
0.94444444 0.77777778 0.94444444 0.88888889]
mean value: 0.9283333333333333
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.75 1. 1. 0.9 0.88888889
0.88888889 0.6 0.9 0.77777778]
mean value: 0.8705555555555555
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.77
Accuracy on Blind test: 0.89
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.03694677 0.03373933 0.05565667 0.05482864 0.0287919 0.0382278
0.03971195 0.04310226 0.05968165 0.02799916]
mean value: 0.04186861515045166
key: score_time
value: [0.02125978 0.01727152 0.03659678 0.01796579 0.01804519 0.01705861
0.02685332 0.026829 0.01939368 0.01746464]
mean value: 0.021873831748962402
key: test_mcc
value: [0.71611487 0.89893315 1. 0.89442719 0.89442719 1.
0.89442719 0.4472136 0.67082039 0.79772404]
mean value: 0.8214087620957531
key: train_mcc
value: [0.97575667 1. 0.98787834 1. 0.98787834 1.
0.98787834 0.98787834 0.98787834 0.98787834]
mean value: 0.9903026713658697
key: test_accuracy
value: [0.84210526 0.94736842 1. 0.94444444 0.94444444 1.
0.94444444 0.72222222 0.83333333 0.88888889]
mean value: 0.9067251461988304
key: train_accuracy
value: [0.98773006 1. 0.99390244 1. 0.99390244 1.
0.99390244 0.99390244 0.99390244 0.99390244]
mean value: 0.9951144695496035
key: test_fscore
value: [0.8 0.95238095 1. 0.94117647 0.94736842 1.
0.94117647 0.70588235 0.84210526 0.875 ]
mean value: 0.9005089930709126
key: train_fscore
value: [0.98765432 1. 0.99386503 1. 0.99386503 1.
0.99386503 0.99393939 0.99393939 0.99393939]
mean value: 0.9951067594830376
key: test_precision
value: [1. 0.90909091 1. 1. 0.9 1.
1. 0.75 0.8 1. ]
mean value: 0.9359090909090909
key: train_precision
value: [1. 1. 1. 1. 1. 1.
1. 0.98795181 0.98795181 0.98795181]
mean value: 0.9963855421686747
key: test_recall
value: [0.66666667 1. 1. 0.88888889 1. 1.
0.88888889 0.66666667 0.88888889 0.77777778]
mean value: 0.8777777777777778
key: train_recall
value: [0.97560976 1. 0.98780488 1. 0.98780488 1.
0.98780488 1. 1. 1. ]
mean value: 0.9939024390243902
key: test_roc_auc
value: [0.83333333 0.94444444 1. 0.94444444 0.94444444 1.
0.94444444 0.72222222 0.83333333 0.88888889]
mean value: 0.9055555555555556
key: train_roc_auc
value: [0.98780488 1. 0.99390244 1. 0.99390244 1.
0.99390244 0.99390244 0.99390244 0.99390244]
mean value: 0.9951219512195122
key: test_jcc
value: [0.66666667 0.90909091 1. 0.88888889 0.9 1.
0.88888889 0.54545455 0.72727273 0.77777778]
mean value: 0.8304040404040404
key: train_jcc
value: [0.97560976 1. 0.98780488 1. 0.98780488 1.
0.98780488 0.98795181 0.98795181 0.98795181]
mean value: 0.9902879811930649
MCC on Blind test: 0.62
Accuracy on Blind test: 0.81
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.05209708 0.05312276 0.07166195 0.07947183 0.07822824 0.07365823
0.06713343 0.06822729 0.07329917 0.07334185]
mean value: 0.0690241813659668
key: score_time
value: [0.02100492 0.01554871 0.02480125 0.02019 0.02346206 0.02492499
0.02564955 0.02025509 0.02529502 0.02458215]
mean value: 0.022571372985839843
key: test_mcc
value: [0.36666667 0.36666667 0.4472136 0.77777778 0.56980288 0.4472136
0.56980288 0.47140452 0.77777778 0.70710678]
mean value: 0.5501433146462763
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.68421053 0.68421053 0.72222222 0.88888889 0.77777778 0.66666667
0.77777778 0.72222222 0.88888889 0.83333333]
mean value: 0.7646198830409356
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.66666667 0.7 0.70588235 0.88888889 0.75 0.5
0.75 0.66666667 0.88888889 0.8 ]
mean value: 0.7316993464052287
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.66666667 0.7 0.75 0.88888889 0.85714286 1.
0.85714286 0.83333333 0.88888889 1. ]
mean value: 0.8442063492063492
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 0.7 0.66666667 0.88888889 0.66666667 0.33333333
0.66666667 0.55555556 0.88888889 0.66666667]
mean value: 0.6699999999999999
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.68333333 0.68333333 0.72222222 0.88888889 0.77777778 0.66666667
0.77777778 0.72222222 0.88888889 0.83333333]
mean value: 0.7644444444444444
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.5 0.53846154 0.54545455 0.8 0.6 0.33333333
0.6 0.5 0.8 0.66666667]
mean value: 0.5883916083916084
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.08
Accuracy on Blind test: 0.57
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.30187654 0.29588366 0.30465126 0.29863501 0.29659557 0.29951715
0.30715179 0.30075073 0.30479479 0.31094193]
mean value: 0.30207984447479247
key: score_time
value: [0.01024866 0.00926256 0.00919223 0.00947666 0.00942016 0.00918174
0.00935078 0.01026249 0.01018763 0.00962329]
mean value: 0.00962061882019043
key: test_mcc
value: [0.89893315 0.80507649 1. 0.89442719 0.89442719 1.
0.89442719 0.67082039 0.89442719 1. ]
mean value: 0.8952538793100003
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94736842 0.89473684 1. 0.94444444 0.94444444 1.
0.94444444 0.83333333 0.94444444 1. ]
mean value: 0.9453216374269006
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94117647 0.90909091 1. 0.94117647 0.94736842 1.
0.94117647 0.82352941 0.94736842 1. ]
mean value: 0.9450886574725584
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.83333333 1. 1. 0.9 1.
1. 0.875 0.9 1. ]
mean value: 0.9508333333333333
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.88888889 1. 1. 0.88888889 1. 1.
0.88888889 0.77777778 1. 1. ]
mean value: 0.9444444444444444
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.94444444 0.88888889 1. 0.94444444 0.94444444 1.
0.94444444 0.83333333 0.94444444 1. ]
mean value: 0.9444444444444444
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.88888889 0.83333333 1. 0.88888889 0.9 1.
0.88888889 0.7 0.9 1. ]
mean value: 0.9
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.77
Accuracy on Blind test: 0.89
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
key: fit_time
value: [0.02119136 0.02133584 0.02049875 0.01957178 0.02819681 0.01930714
0.01964378 0.01967049 0.01922727 0.01945806]
mean value: 0.02081012725830078
key: score_time
value: [0.01232815 0.01211047 0.01209617 0.01658344 0.01229763 0.01409721
0.01502109 0.01486349 0.01319528 0.01452518]
mean value: 0.013711810111999512
key: test_mcc
value: [0.48989795 0.45643546 0.70710678 0.53452248 0.79772404 0.79772404
0.79772404 0.70710678 0.70710678 0.89442719]
mean value: 0.6889775537181078
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.68421053 0.68421053 0.83333333 0.72222222 0.88888889 0.88888889
0.88888889 0.83333333 0.83333333 0.94444444]
mean value: 0.8201754385964912
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.75 0.76923077 0.85714286 0.7826087 0.9 0.9
0.9 0.85714286 0.85714286 0.94736842]
mean value: 0.8520636457364146
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.6 0.625 0.75 0.64285714 0.81818182 0.81818182
0.81818182 0.75 0.75 0.9 ]
mean value: 0.7472402597402598
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.7 0.66666667 0.83333333 0.72222222 0.88888889 0.88888889
0.88888889 0.83333333 0.83333333 0.94444444]
mean value: 0.82
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.6 0.625 0.75 0.64285714 0.81818182 0.81818182
0.81818182 0.75 0.75 0.9 ]
mean value: 0.7472402597402598
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.0
Accuracy on Blind test: 0.62
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.03742886 0.0137198 0.01522708 0.01617217 0.03871441 0.01832986
0.03243613 0.06556869 0.03291512 0.04149723]
mean value: 0.031200933456420898
key: score_time
value: [0.011657 0.01162052 0.01281309 0.01283717 0.0118587 0.0137105
0.02468085 0.01881003 0.02135801 0.02048373]
mean value: 0.015982961654663085
key: test_mcc
value: [0.78888889 0.57777778 0.79772404 0.89442719 0.77777778 0.56980288
0.34188173 0.56980288 0.56980288 0.70710678]
mean value: 0.6594992828121856
key: train_mcc
value: [0.96326408 0.95121218 0.96348628 0.97590007 0.93909422 0.97560976
0.92682927 0.97560976 0.96348628 0.95150257]
mean value: 0.9585994467892424
key: test_accuracy
value: [0.89473684 0.78947368 0.88888889 0.94444444 0.88888889 0.77777778
0.66666667 0.77777778 0.77777778 0.83333333]
mean value: 0.8239766081871345
key: train_accuracy
value: [0.98159509 0.97546012 0.98170732 0.98780488 0.9695122 0.98780488
0.96341463 0.98780488 0.98170732 0.97560976]
mean value: 0.9792421068382463
key: test_fscore
value: [0.88888889 0.8 0.875 0.94117647 0.88888889 0.75
0.625 0.75 0.75 0.8 ]
mean value: 0.8068954248366014
key: train_fscore
value: [0.98159509 0.97560976 0.98159509 0.98765432 0.96969697 0.98780488
0.96341463 0.98780488 0.98159509 0.97530864]
mean value: 0.9792079355075015
key: test_precision
value: [0.88888889 0.8 1. 1. 0.88888889 0.85714286
0.71428571 0.85714286 0.85714286 1. ]
mean value: 0.8863492063492063
key: train_precision
value: [0.98765432 0.96385542 0.98765432 1. 0.96385542 0.98780488
0.96341463 0.98780488 0.98765432 0.9875 ]
mean value: 0.9817198196580359
key: test_recall
value: [0.88888889 0.8 0.77777778 0.88888889 0.88888889 0.66666667
0.55555556 0.66666667 0.66666667 0.66666667]
mean value: 0.7466666666666666
key: train_recall
value: [0.97560976 0.98765432 0.97560976 0.97560976 0.97560976 0.98780488
0.96341463 0.98780488 0.97560976 0.96341463]
mean value: 0.9768142125865703
key: test_roc_auc
value: [0.89444444 0.78888889 0.88888889 0.94444444 0.88888889 0.77777778
0.66666667 0.77777778 0.77777778 0.83333333]
mean value: 0.8238888888888889
key: train_roc_auc
value: [0.98163204 0.97553448 0.98170732 0.98780488 0.9695122 0.98780488
0.96341463 0.98780488 0.98170732 0.97560976]
mean value: 0.9792532369768142
key: test_jcc
value: [0.8 0.66666667 0.77777778 0.88888889 0.8 0.6
0.45454545 0.6 0.6 0.66666667]
mean value: 0.6854545454545454
key: train_jcc
value: [0.96385542 0.95238095 0.96385542 0.97560976 0.94117647 0.97590361
0.92941176 0.97590361 0.96385542 0.95180723]
mean value: 0.9593759666664197
MCC on Blind test: 0.54
Accuracy on Blind test: 0.78
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_8020.py:148: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_8020.py:151: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.24665713 0.2200036 0.20462394 0.23826599 0.20800304 0.20692873
0.26327562 0.2772572 0.24964952 0.20823383]
mean value: 0.23228986263275148
key: score_time
value: [0.02310228 0.01178694 0.01963353 0.02278471 0.02360058 0.02142
0.02313614 0.01516008 0.02230239 0.01481676]
mean value: 0.019774341583251955
key: test_mcc
value: [0.78888889 0.4719399 0.79772404 0.77777778 0.77777778 0.56980288
0.34188173 0.56980288 0.47140452 0.56980288]
mean value: 0.6136803280450694
key: train_mcc
value: [0.96326408 0.96326408 0.96348628 0.96348628 0.93909422 0.97560976
0.92682927 0.97560976 0.97560976 0.97560976]
mean value: 0.962186323547305
key: test_accuracy
value: [0.89473684 0.73684211 0.88888889 0.88888889 0.88888889 0.77777778
0.66666667 0.77777778 0.72222222 0.77777778]
mean value: 0.802046783625731
key: train_accuracy
value: [0.98159509 0.98159509 0.98170732 0.98170732 0.9695122 0.98780488
0.96341463 0.98780488 0.98780488 0.98780488]
mean value: 0.9810751159658836
key: test_fscore
value: [0.88888889 0.76190476 0.875 0.88888889 0.88888889 0.75
0.625 0.75 0.66666667 0.75 ]
mean value: 0.7845238095238095
key: train_fscore
value: [0.98159509 0.98159509 0.98159509 0.98159509 0.96969697 0.98780488
0.96341463 0.98780488 0.98780488 0.98780488]
mean value: 0.9810711484136592
key: test_precision
value: [0.88888889 0.72727273 1. 0.88888889 0.88888889 0.85714286
0.71428571 0.85714286 0.83333333 0.85714286]
mean value: 0.8512987012987012
key: train_precision
value: [0.98765432 0.97560976 0.98765432 0.98765432 0.96385542 0.98780488
0.96341463 0.98780488 0.98780488 0.98780488]
mean value: 0.9817062287088734
key: test_recall
value: [0.88888889 0.8 0.77777778 0.88888889 0.88888889 0.66666667
0.55555556 0.66666667 0.55555556 0.66666667]
mean value: 0.7355555555555555
key: train_recall
value: [0.97560976 0.98765432 0.97560976 0.97560976 0.97560976 0.98780488
0.96341463 0.98780488 0.98780488 0.98780488]
mean value: 0.9804727491719362
key: test_roc_auc
value: [0.89444444 0.73333333 0.88888889 0.88888889 0.88888889 0.77777778
0.66666667 0.77777778 0.72222222 0.77777778]
mean value: 0.8016666666666666
key: train_roc_auc
value: [0.98163204 0.98163204 0.98170732 0.98170732 0.9695122 0.98780488
0.96341463 0.98780488 0.98780488 0.98780488]
mean value: 0.9810825052694971
key: test_jcc
value: [0.8 0.61538462 0.77777778 0.8 0.8 0.6
0.45454545 0.6 0.5 0.6 ]
mean value: 0.6547707847707848
key: train_jcc
value: [0.96385542 0.96385542 0.96385542 0.96385542 0.94117647 0.97590361
0.92941176 0.97590361 0.97590361 0.97590361]
mean value: 0.9629624379872431
MCC on Blind test: 0.54
Accuracy on Blind test: 0.78
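The per-fold rows above are all functions of one confusion matrix per fold. As a sketch, the counts below are back-solved from the first two RidgeClassifierCV folds (fold 1: precision 8/9, recall 8/9, accuracy 17/19; fold 2: precision 8/11, recall 8/10, accuracy 14/19); they are an inference, not values the script printed. Recomputing every metric from those counts reproduces the logged `test_mcc`, `test_accuracy`, `test_fscore`, `test_precision`, `test_recall` and `test_jcc`. The logged `test_roc_auc` also matches balanced accuracy, which is consistent with ROC AUC having been computed from hard labels rather than decision scores.

```python
import math

def fold_metrics(tp, fp, fn, tn):
    """Binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # sensitivity / true positive rate
    specificity = tn / (tn + fp)       # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "fscore": 2 * tp / (2 * tp + fp + fn),
        "precision": precision,
        "recall": recall,
        # ROC AUC of hard 0/1 predictions reduces to balanced accuracy.
        "roc_auc_hard": (recall + specificity) / 2,
        "jcc": tp / (tp + fp + fn),
    }

# Counts inferred for RidgeClassifierCV folds 1 and 2 (19 test samples each).
fold1 = fold_metrics(tp=8, fp=1, fn=1, tn=9)
fold2 = fold_metrics(tp=8, fp=3, fn=2, tn=6)
for name, m in (("fold 1", fold1), ("fold 2", fold2)):
    print(name, {k: round(v, 8) for k, v in m.items()})
```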
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.02531695 0.02301788 0.02503061 0.02468991 0.02239776 0.02449679
0.02545214 0.02418399 0.02235746 0.02392077]
mean value: 0.024086427688598634
key: score_time
value: [0.01164055 0.01155734 0.01147318 0.01160502 0.01145649 0.01149535
0.01150584 0.01157832 0.01148558 0.0115664 ]
mean value: 0.011536407470703124
key: test_mcc
value: [0.33333333 0. 0.66666667 0.50709255 0.46666667 0.1
0.46666667 0.69006556 0.1490712 0.1 ]
mean value: 0.34795626440127836
key: train_mcc
value: [0.80454045 0.82368777 0.80454045 0.77005354 0.73053854 0.78784497
0.82541478 0.78649572 0.84522516 0.80634253]
mean value: 0.7984683900068174
key: test_accuracy
value: [0.66666667 0.5 0.83333333 0.75 0.72727273 0.54545455
0.72727273 0.81818182 0.54545455 0.54545455]
mean value: 0.6659090909090909
key: train_accuracy
value: [0.90196078 0.91176471 0.90196078 0.88235294 0.86407767 0.89320388
0.91262136 0.89320388 0.9223301 0.90291262]
mean value: 0.8986388730249382
key: test_fscore
value: [0.66666667 0.57142857 0.83333333 0.72727273 0.72727273 0.54545455
0.72727273 0.8 0.44444444 0.54545455]
mean value: 0.6588600288600288
key: train_fscore
value: [0.9 0.91089109 0.9 0.875 0.86 0.89108911
0.91262136 0.89108911 0.92 0.9 ]
mean value: 0.8960690666153994
key: test_precision
value: [0.66666667 0.5 0.83333333 0.8 0.66666667 0.5
0.66666667 1. 0.66666667 0.6 ]
mean value: 0.69
key: train_precision
value: [0.91836735 0.92 0.91836735 0.93333333 0.89583333 0.91836735
0.92156863 0.9 0.93877551 0.91836735]
mean value: 0.9182980192076831
key: test_recall
value: [0.66666667 0.66666667 0.83333333 0.66666667 0.8 0.6
0.8 0.66666667 0.33333333 0.5 ]
mean value: 0.6533333333333333
key: train_recall
value: [0.88235294 0.90196078 0.88235294 0.82352941 0.82692308 0.86538462
0.90384615 0.88235294 0.90196078 0.88235294]
mean value: 0.8753016591251885
key: test_roc_auc
value: [0.66666667 0.5 0.83333333 0.75 0.73333333 0.55
0.73333333 0.83333333 0.56666667 0.55 ]
mean value: 0.6716666666666666
key: train_roc_auc
value: [0.90196078 0.91176471 0.90196078 0.88235294 0.86444193 0.89347662
0.91270739 0.89309955 0.92213424 0.90271493]
mean value: 0.8986613876319759
key: test_jcc
value: [0.5 0.4 0.71428571 0.57142857 0.57142857 0.375
0.57142857 0.66666667 0.28571429 0.375 ]
mean value: 0.503095238095238
key: train_jcc
value: [0.81818182 0.83636364 0.81818182 0.77777778 0.75438596 0.80357143
0.83928571 0.80357143 0.85185185 0.81818182]
mean value: 0.8121353256879573
MCC on Blind test: 0.38
Accuracy on Blind test: 0.7
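For binary labels the `test_jcc` and `test_fscore` rows are redundant: Jaccard is TP/(TP+FP+FN) and F1 is 2TP/(2TP+FP+FN), so J = F1/(2 - F1) fold by fold. The sketch below checks that identity against the Logistic Regression values copied from the run above.

```python
from statistics import mean

def f1_to_jaccard(f1):
    """Convert a binary F1 score to the Jaccard index: J = F1 / (2 - F1)."""
    return f1 / (2 - f1)

# Per-fold test_fscore and test_jcc copied from the Logistic Regression run.
logged_fscore = [0.66666667, 0.57142857, 0.83333333, 0.72727273, 0.72727273,
                 0.54545455, 0.72727273, 0.8, 0.44444444, 0.54545455]
logged_jcc = [0.5, 0.4, 0.71428571, 0.57142857, 0.57142857,
              0.375, 0.57142857, 0.66666667, 0.28571429, 0.375]

derived_jcc = [f1_to_jaccard(f) for f in logged_fscore]
for derived, logged in zip(derived_jcc, logged_jcc):
    print(f"{derived:.8f}  (logged {logged})")
print("mean:", mean(derived_jcc))
```

The identity is non-linear, so it must be applied per fold before averaging: converting the mean F1 (about 0.659) gives roughly 0.491, not the logged mean Jaccard of about 0.503.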
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
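The warning above comes from fitting `LogisticRegressionCV(random_state=42)`, which uses the lbfgs solver with the default `max_iter=100`; the warning itself suggests raising `max_iter` or scaling the data. Since the numerical columns in this pipeline are already min-max scaled, raising `max_iter` (which `LogisticRegressionCV` also accepts) is the more likely fix. The sketch below, on synthetic `make_classification` data rather than the log's dataset, counts ConvergenceWarnings for a deliberately starved solver versus a scaled model with a larger iteration budget. Convergence on the real features is not guaranteed by this example.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data, not the log's dataset.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

def count_convergence_warnings(estimator):
    """Fit `estimator` on X, y and return how many ConvergenceWarnings it raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        estimator.fit(X, y)
    return sum(issubclass(w.category, ConvergenceWarning) for w in caught)

# max_iter=1 reproduces the "TOTAL NO. of ITERATIONS REACHED LIMIT" warning.
starved = LogisticRegression(max_iter=1)
# Scaled inputs plus a larger iteration budget let lbfgs converge here.
relaxed = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

print("starved:", count_convergence_warnings(starved))
print("relaxed:", count_convergence_warnings(relaxed))
```

Calling `warnings.filterwarnings("ignore", category=ConvergenceWarning)` would also stop the repeated blocks from filling the log, but it only hides the message; the coefficients of a non-converged fit are still unconverged.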
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
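The ConvergenceWarning above comes from scikit-learn's logistic regression module when the lbfgs solver reaches its iteration cap. The default is max_iter=100 for both LogisticRegression and LogisticRegressionCV. The pipeline below already min-max scales the numerical columns, so the remaining remedy the warning suggests is a larger iteration budget. This is a minimal sketch, assuming synthetic data from make_classification in place of the real feature table. The value max_iter=10000 is illustrative only and does not come from the script.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the real feature table (assumption)
X, y = make_classification(n_samples=120, n_features=20, random_state=42)

# Same scaler as the logged pipeline, plus a larger iteration budget for lbfgs
model = make_pipeline(
    MinMaxScaler(),
    LogisticRegressionCV(max_iter=10000, random_state=42),
)

# Record warnings instead of printing them, then count convergence failures
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    model.fit(X, y)

n_convergence = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
print("ConvergenceWarnings raised:", n_convergence)
```

You could silence the warnings instead with warnings.filterwarnings("ignore", category=ConvergenceWarning). That would also hide fits that genuinely did not converge.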
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
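The pipeline printed above applies MinMaxScaler to the numerical columns and OneHotEncoder to the categorical ones. Any remaining columns pass through unchanged. The sketch below reproduces the same structure at reduced size. It borrows four column names from the log but uses invented values. It also sets cv=2 on LogisticRegressionCV only because the toy frame has eight rows.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Toy frame: column names borrowed from the log, values invented
df = pd.DataFrame({
    "ligand_distance": [3.1, 12.4, 7.8, 20.0, 5.5, 15.2, 9.9, 2.2],
    "rsa": [0.10, 0.80, 0.45, 0.95, 0.20, 0.70, 0.55, 0.05],
    "ss_class": ["H", "E", "L", "H", "E", "L", "H", "E"],
    "active_site": [1, 0, 0, 0, 1, 0, 0, 1],
})
y = [1, 0, 1, 0, 1, 0, 0, 1]

numerical_cols = ["ligand_distance", "rsa"]
categorical_cols = ["ss_class", "active_site"]

# Scale numbers to [0, 1], one-hot encode categories, pass anything else through
prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), numerical_cols),
        ("cat", OneHotEncoder(), categorical_cols),
    ],
    remainder="passthrough",
)

pipe = Pipeline(steps=[
    ("prep", prep),
    ("model", LogisticRegressionCV(cv=2, random_state=42)),
])
pipe.fit(df, y)

# 2 scaled columns + 3 ss_class levels + 2 active_site levels = 7 features
print(pipe["prep"].transform(df).shape)
```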
key: fit_time
value: [0.57380795 0.74630857 0.60518217 0.58629417 0.72743988 0.58024836
0.6250155 0.59622169 0.65928268 0.60975552]
mean value: 0.6309556484222412
key: score_time
value: [0.01315403 0.01296639 0.01179981 0.01184464 0.01358771 0.0130477
0.01297903 0.01181436 0.01576495 0.01186824]
mean value: 0.012882685661315918
key: test_mcc
value: [ 0.33333333 0.4472136 0. 0.50709255 0.06900656 0.46666667
-0.06900656 0.46666667 0.55901699 0.26666667]
mean value: 0.30466564760453485
key: train_mcc
value: [1. 1. 0.47140452 0.61209384 1. 1.
1. 0.67006033 1. 0.94190878]
mean value: 0.8695467473673132
key: test_accuracy
value: [0.66666667 0.66666667 0.5 0.75 0.54545455 0.72727273
0.45454545 0.72727273 0.72727273 0.63636364]
mean value: 0.6401515151515151
key: train_accuracy
value: [1. 1. 0.73529412 0.80392157 1. 1.
1. 0.83495146 1. 0.97087379]
mean value: 0.9345040928992956
key: test_fscore
value: [0.66666667 0.75 0.5 0.72727273 0.44444444 0.72727273
0.5 0.72727273 0.66666667 0.66666667]
mean value: 0.6376262626262625
key: train_fscore
value: [1. 1. 0.72727273 0.79166667 1. 1.
1. 0.83495146 1. 0.97029703]
mean value: 0.9324187879953043
key: test_precision
value: [0.66666667 0.6 0.5 0.8 0.5 0.66666667
0.42857143 0.8 1. 0.66666667]
mean value: 0.6628571428571428
key: train_precision
value: [1. 1. 0.75 0.84444444 1. 1.
1. 0.82692308 1. 0.98 ]
mean value: 0.9401367521367521
key: test_recall
value: [0.66666667 1. 0.5 0.66666667 0.4 0.8
0.6 0.66666667 0.5 0.66666667]
mean value: 0.6466666666666666
key: train_recall
value: [1. 1. 0.70588235 0.74509804 1. 1.
1. 0.84313725 1. 0.96078431]
mean value: 0.9254901960784314
key: test_roc_auc
value: [0.66666667 0.66666667 0.5 0.75 0.53333333 0.73333333
0.46666667 0.73333333 0.75 0.63333333]
mean value: 0.6433333333333333
key: train_roc_auc
value: [1. 1. 0.73529412 0.80392157 1. 1.
1. 0.83503017 1. 0.97077677]
mean value: 0.9345022624434389
key: test_jcc
value: [0.5 0.6 0.33333333 0.57142857 0.28571429 0.57142857
0.33333333 0.57142857 0.5 0.5 ]
mean value: 0.4766666666666666
key: train_jcc
value: [1. 1. 0.57142857 0.65517241 1. 1.
1. 0.71666667 1. 0.94230769]
mean value: 0.8885575344196034
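The key/value records above match the dictionary that scikit-learn's cross_validate returns when it is given a dict of named scorers and return_train_score=True. Each key has ten values, which suggests 10-fold cross-validation. The sketch below shows how output with this layout could be produced. It uses synthetic data and GaussianNB. The scorer names (mcc, fscore, jcc, and so on) are inferred from the printed keys, and the shuffled StratifiedKFold is an assumption: the script's actual scorer and splitter definitions are not shown in this log.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score,
                             make_scorer, matthews_corrcoef, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import GaussianNB

# Synthetic, roughly balanced stand-in data (assumption)
X, y = make_classification(n_samples=120, n_features=10, random_state=42)

# Scorer names chosen so result keys read test_mcc, train_jcc, ...
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": make_scorer(accuracy_score),
    "fscore": make_scorer(f1_score),
    "precision": make_scorer(precision_score),
    "recall": make_scorer(recall_score),
    "roc_auc": make_scorer(roc_auc_score),
    "jcc": make_scorer(jaccard_score),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(GaussianNB(), X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

# Print in the same key / value / mean value style as the log
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```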
MCC on Blind test: 0.6
Accuracy on Blind test: 0.81
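The blind-test figures are single values rounded to two decimals. They are presumably computed once on a held-out split after fitting on the training data; the "8020" in the log's file name hints at an 80/20 split. The sketch below shows that final scoring step with toy label vectors. It assumes the script rounds with round(..., 2). The labels are invented and not taken from the data.

```python
from sklearn.metrics import accuracy_score, matthews_corrcoef

# Hypothetical blind-test labels and predictions (not from the log)
y_blind = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

# TP=3, FN=1, TN=5, FP=1
mcc = round(matthews_corrcoef(y_blind, y_pred), 2)
acc = round(accuracy_score(y_blind, y_pred), 2)
print("MCC on Blind test:", mcc)       # MCC on Blind test: 0.58
print("Accuracy on Blind test:", acc)  # Accuracy on Blind test: 0.8
```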
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
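The model list printed above contains ('Naive Bayes', BernoulliNB()) twice, so that model is likely evaluated twice per run. The sketch below catches duplicate labels before the loop. It assumes the list is a plain Python list of (name, estimator) tuples, as its repr suggests, and abbreviates it to four entries.

```python
from collections import Counter

from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import BernoulliNB, GaussianNB

# Abbreviated copy of the printed list; the full list has more entries
models = [
    ("Logistic Regression", LogisticRegression(random_state=42)),
    ("Gaussian NB", GaussianNB()),
    ("Naive Bayes", BernoulliNB()),
    ("Naive Bayes", BernoulliNB()),
]

name_counts = Counter(name for name, _ in models)
duplicates = [name for name, count in name_counts.items() if count > 1]
print("Duplicate model names:", duplicates)  # ['Naive Bayes']

# Keep only the first occurrence of each name before looping over models
seen = set()
unique_models = []
for name, model in models:
    if name not in seen:
        seen.add(name)
        unique_models.append((name, model))
```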
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01224828 0.01091671 0.00891137 0.00843143 0.00929499 0.00856352
0.00876927 0.00851512 0.00861287 0.0084939 ]
mean value: 0.00927574634552002
key: score_time
value: [0.011657 0.00899959 0.00917506 0.00841236 0.00840473 0.00845885
0.00845647 0.00876474 0.00840163 0.00846171]
mean value: 0.008919215202331543
key: test_mcc
value: [0.35355339 0. 0.4472136 0.30151134 0.43033148 0.55901699
0.28867513 0. 0.2608746 0. ]
mean value: 0.26411765399276665
key: train_mcc
value: [0.49265379 0.4152274 0.4564139 0.43133109 0.40048439 0.3666794
0.40048439 0.42470149 0.37638633 0.45573272]
mean value: 0.4220094905915334
key: test_accuracy
value: [0.66666667 0.5 0.66666667 0.58333333 0.63636364 0.72727273
0.54545455 0.54545455 0.63636364 0.54545455]
mean value: 0.6053030303030302
key: train_accuracy
value: [0.70588235 0.64705882 0.70588235 0.65686275 0.6407767 0.62135922
0.6407767 0.65048544 0.62135922 0.66990291]
mean value: 0.6560346468684561
key: test_fscore
value: [0.71428571 0.66666667 0.75 0.70588235 0.71428571 0.76923077
0.66666667 0.70588235 0.71428571 0.70588235]
mean value: 0.7113068304244774
key: train_fscore
value: [0.76923077 0.73913043 0.75806452 0.74452555 0.73758865 0.72727273
0.73758865 0.73913043 0.72340426 0.75 ]
mean value: 0.742593598992669
key: test_precision
value: [0.625 0.5 0.6 0.54545455 0.55555556 0.625
0.5 0.54545455 0.625 0.54545455]
mean value: 0.5666919191919192
key: train_precision
value: [0.63291139 0.5862069 0.64383562 0.59302326 0.58426966 0.57142857
0.58426966 0.5862069 0.56666667 0.6 ]
mean value: 0.5948818621698756
key: test_recall
value: [0.83333333 1. 1. 1. 1. 1.
1. 1. 0.83333333 1. ]
mean value: 0.9666666666666667
key: train_recall
value: [0.98039216 1. 0.92156863 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9901960784313726
key: test_roc_auc
value: [0.66666667 0.5 0.66666667 0.58333333 0.66666667 0.75
0.58333333 0.5 0.61666667 0.5 ]
mean value: 0.6033333333333334
key: train_roc_auc
value: [0.70588235 0.64705882 0.70588235 0.65686275 0.6372549 0.61764706
0.6372549 0.65384615 0.625 0.67307692]
mean value: 0.6559766214177979
key: test_jcc
value: [0.55555556 0.5 0.6 0.54545455 0.55555556 0.625
0.5 0.54545455 0.55555556 0.54545455]
mean value: 0.5528030303030302
key: train_jcc
value: [0.625 0.5862069 0.61038961 0.59302326 0.58426966 0.57142857
0.58426966 0.5862069 0.56666667 0.6 ]
mean value: 0.5907461223244946
MCC on Blind test: 0.27
Accuracy on Blind test: 0.68
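Within each fold, the binary Jaccard score and F1 score are tied by J = F1 / (2 - F1). This follows from J = TP / (TP + FP + FN) and F1 = 2TP / (2TP + FP + FN). For example, Gaussian NB fold 1 has test_fscore 0.71428571, which gives 0.71428571 / 1.28571429 ≈ 0.5556 and matches test_jcc 0.55555556. The quick numerical check below uses the Gaussian NB test values copied from the log.

```python
import numpy as np

# Gaussian NB per-fold test scores copied from the log above
test_fscore = np.array([0.71428571, 0.66666667, 0.75, 0.70588235, 0.71428571,
                        0.76923077, 0.66666667, 0.70588235, 0.71428571, 0.70588235])
test_jcc = np.array([0.55555556, 0.5, 0.6, 0.54545455, 0.55555556,
                     0.625, 0.5, 0.54545455, 0.55555556, 0.54545455])

# Jaccard recovered from F1 via J = F1 / (2 - F1)
jcc_from_f1 = test_fscore / (2 - test_fscore)
print(np.allclose(jcc_from_f1, test_jcc, atol=1e-6))  # True
```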
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.00996208 0.00888777 0.00997472 0.00940871 0.00896931 0.00880861
0.00899887 0.00877929 0.00869918 0.0094018 ]
mean value: 0.009189033508300781
key: score_time
value: [0.00949287 0.00865507 0.00963235 0.00957608 0.00890374 0.00917196
0.00849485 0.0089581 0.0091083 0.00884199]
mean value: 0.00908353328704834
key: test_mcc
value: [ 0. -0.19245009 0.19245009 0.35355339 0.1 0.1
-0.1 -0.06900656 -0.04303315 0.26666667]
mean value: 0.06081803530345115
key: train_mcc
value: [0.47809144 0.49362406 0.41692608 0.44177063 0.44167123 0.49697785
0.45999986 0.55337612 0.46410101 0.42167602]
mean value: 0.4668214305743355
key: test_accuracy
value: [0.5 0.41666667 0.58333333 0.66666667 0.54545455 0.54545455
0.45454545 0.45454545 0.45454545 0.63636364]
mean value: 0.5257575757575758
key: train_accuracy
value: [0.73529412 0.74509804 0.70588235 0.71568627 0.7184466 0.74757282
0.72815534 0.77669903 0.72815534 0.70873786]
mean value: 0.7309727774604988
key: test_fscore
value: [0.4 0.22222222 0.44444444 0.6 0.54545455 0.54545455
0.4 0.4 0.25 0.66666667]
mean value: 0.4474242424242424
key: train_fscore
value: [0.70967742 0.72916667 0.68085106 0.68131868 0.70103093 0.74
0.71428571 0.77227723 0.69565217 0.68085106]
mean value: 0.7105110938756343
key: test_precision
value: [0.5 0.33333333 0.66666667 0.75 0.5 0.5
0.4 0.5 0.5 0.66666667]
mean value: 0.5316666666666666
key: train_precision
value: [0.78571429 0.77777778 0.74418605 0.775 0.75555556 0.77083333
0.76086957 0.78 0.7804878 0.74418605]
mean value: 0.7674610415499649
key: test_recall
value: [0.33333333 0.16666667 0.33333333 0.5 0.6 0.6
0.4 0.33333333 0.16666667 0.66666667]
mean value: 0.41
key: train_recall
value: [0.64705882 0.68627451 0.62745098 0.60784314 0.65384615 0.71153846
0.67307692 0.76470588 0.62745098 0.62745098]
mean value: 0.6626696832579185
key: test_roc_auc
value: [0.5 0.41666667 0.58333333 0.66666667 0.55 0.55
0.45 0.46666667 0.48333333 0.63333333]
mean value: 0.53
key: train_roc_auc
value: [0.73529412 0.74509804 0.70588235 0.71568627 0.71907994 0.74792609
0.72869532 0.77658371 0.72718703 0.70795626]
mean value: 0.7309389140271493
key: test_jcc
value: [0.25 0.125 0.28571429 0.42857143 0.375 0.375
0.25 0.25 0.14285714 0.5 ]
mean value: 0.2982142857142857
key: train_jcc
value: [0.55 0.57377049 0.51612903 0.51666667 0.53968254 0.58730159
0.55555556 0.62903226 0.53333333 0.51612903]
mean value: 0.5517600496923607
MCC on Blind test: 0.27
Accuracy on Blind test: 0.62
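Several test_roc_auc values behave like balanced accuracy rather than a probability-based AUC. For example, Bernoulli NB fold 5 has accuracy 0.54545455, recall 0.6, precision 0.5 and roc_auc 0.55. If the ROC AUC scorer receives hard class predictions, the AUC reduces to (TPR + TNR) / 2. That is an assumption, because the scorer definition is not in this log. The sketch below checks the identity on a fold reconstructed from those scores: 5 positives and 6 negatives with TP=3, FN=2, FP=3 and TN=3. These counts also reproduce that fold's MCC of 0.1 and Jaccard of 0.375.

```python
from sklearn.metrics import balanced_accuracy_score, roc_auc_score

# Fold reconstructed from Bernoulli NB fold-5 scores: TP=3, FN=2, FP=3, TN=3
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0]

# ROC AUC computed on hard 0/1 predictions instead of scores or probabilities
auc_on_labels = roc_auc_score(y_true, y_pred)
bal_acc = balanced_accuracy_score(y_true, y_pred)
print(round(auc_on_labels, 2), round(bal_acc, 2))  # 0.55 0.55
```

If a threshold-free AUC is wanted, the scorer would need to use predict_proba or decision_function output instead of predicted labels.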
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.0098145 0.00907564 0.00900006 0.00922799 0.00901246 0.00870895
0.00914049 0.00910187 0.00882554 0.0091033 ]
mean value: 0.009101080894470214
key: score_time
value: [0.01085329 0.01000524 0.00981808 0.00992918 0.00976562 0.00992513
0.00990963 0.01000428 0.00987315 0.00958896]
mean value: 0.00996725559234619
key: test_mcc
value: [ 0. -0.70710678 -0.50709255 0.33333333 0.3105295 -0.46666667
0.06900656 0.1 0.3105295 -0.1490712 ]
mean value: -0.07065383065146226
key: train_mcc
value: [0.30261377 0.45663332 0.34296227 0.39223227 0.35084901 0.43681633
0.32422165 0.44600762 0.40045707 0.39833814]
mean value: 0.38511314468719693
key: test_accuracy
value: [0.5 0.16666667 0.25 0.66666667 0.63636364 0.27272727
0.54545455 0.54545455 0.63636364 0.45454545]
mean value: 0.4674242424242424
key: train_accuracy
value: [0.64705882 0.7254902 0.66666667 0.69607843 0.66990291 0.7184466
0.66019417 0.7184466 0.69902913 0.69902913]
mean value: 0.6900342661336379
key: test_fscore
value: [0.5 0. 0.30769231 0.66666667 0.66666667 0.2
0.44444444 0.54545455 0.6 0.57142857]
mean value: 0.45023532023532026
key: train_fscore
value: [0.6 0.70212766 0.62222222 0.69306931 0.63043478 0.72380952
0.63917526 0.68131868 0.71028037 0.68686869]
mean value: 0.6689306494896705
key: test_precision
value: [0.5 0. 0.28571429 0.66666667 0.57142857 0.2
0.5 0.6 0.75 0.5 ]
mean value: 0.4573809523809524
key: train_precision
value: [0.69230769 0.76744186 0.71794872 0.7 0.725 0.71698113
0.68888889 0.775 0.67857143 0.70833333]
mean value: 0.7170473053590649
key: test_recall
value: [0.5 0. 0.33333333 0.66666667 0.8 0.2
0.4 0.5 0.5 0.66666667]
mean value: 0.45666666666666667
key: train_recall
value: [0.52941176 0.64705882 0.54901961 0.68627451 0.55769231 0.73076923
0.59615385 0.60784314 0.74509804 0.66666667]
mean value: 0.6315987933634992
key: test_roc_auc
value: [0.5 0.16666667 0.25 0.66666667 0.65 0.26666667
0.53333333 0.55 0.65 0.43333333]
mean value: 0.4666666666666667
key: train_roc_auc
value: [0.64705882 0.7254902 0.66666667 0.69607843 0.67100302 0.71832579
0.66082202 0.71738311 0.6994721 0.69871795]
mean value: 0.6901018099547511
key: test_jcc
value: [0.33333333 0. 0.18181818 0.5 0.5 0.11111111
0.28571429 0.375 0.42857143 0.4 ]
mean value: 0.31155483405483403
key: train_jcc
value: [0.42857143 0.54098361 0.4516129 0.53030303 0.46031746 0.56716418
0.46969697 0.51666667 0.55072464 0.52307692]
mean value: 0.50391178052013
MCC on Blind test: 0.18
Accuracy on Blind test: 0.59
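Fold 2 of K-Nearest Neighbors has the most negative score in this section: test_mcc -0.70710678, accuracy 0.16666667, and precision and recall both 0. Assume that fold holds 6 positives and 6 negatives (an inference, not stated in the log). Those scores then imply TP=0, TN=2, FP=4 and FN=6. The worked check below compares the standard formula MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)) with scikit-learn's matthews_corrcoef.

```python
import math

from sklearn.metrics import matthews_corrcoef


def mcc_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0 when any marginal is empty, mirroring scikit-learn's behaviour
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Counts inferred for K-Nearest Neighbors fold 2 (assumes 6 positives, 6 negatives)
tp, tn, fp, fn = 0, 2, 4, 6
y_true = [1] * 6 + [0] * 6
y_pred = [0] * 6 + [1] * 4 + [0] * 2

print(round(mcc_from_counts(tp, tn, fp, fn), 8))    # -0.70710678
print(round(matthews_corrcoef(y_true, y_pred), 8))  # -0.70710678
```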
Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01031351 0.01002502 0.00932026 0.00934005 0.01001763 0.00933385
0.00924444 0.00923753 0.00987935 0.00954986]
mean value: 0.009626150131225586
key: score_time
value: [0.01006293 0.00948262 0.00882435 0.00858355 0.00855827 0.00859284
0.00859046 0.00890946 0.00868011 0.00940084]
mean value: 0.008968544006347657
key: test_mcc
value: [ 0.35355339 -0.16903085 0. 0.33333333 0.3105295 0.06900656
0.46666667 0.26666667 0.1490712 0.1 ]
mean value: 0.1879796462452518
key: train_mcc
value: [0.67303645 0.7856742 0.60972137 0.65158377 0.61317623 0.72878164
0.74896235 0.76763491 0.70975239 0.70878919]
mean value: 0.6997112502455441
key: test_accuracy
value: [0.66666667 0.41666667 0.5 0.66666667 0.63636364 0.54545455
0.72727273 0.63636364 0.54545455 0.54545455]
mean value: 0.5886363636363636
key: train_accuracy
value: [0.83333333 0.89215686 0.80392157 0.82352941 0.80582524 0.86407767
0.87378641 0.88349515 0.85436893 0.85436893]
mean value: 0.8488863506567675
key: test_fscore
value: [0.6 0.46153846 0.5 0.66666667 0.66666667 0.44444444
0.72727273 0.66666667 0.44444444 0.54545455]
mean value: 0.5723154623154623
key: train_fscore
value: [0.82105263 0.88888889 0.79591837 0.8125 0.81481481 0.8627451
0.87128713 0.88461538 0.84848485 0.85148515]
mean value: 0.8451792310996761
key: test_precision
value: [0.75 0.42857143 0.5 0.66666667 0.57142857 0.5
0.66666667 0.66666667 0.66666667 0.6 ]
mean value: 0.6016666666666667
key: train_precision
value: [0.88636364 0.91666667 0.82978723 0.86666667 0.78571429 0.88
0.89795918 0.86792453 0.875 0.86 ]
mean value: 0.8666082201429165
key: test_recall
value: [0.5 0.5 0.5 0.66666667 0.8 0.4
0.8 0.66666667 0.33333333 0.5 ]
mean value: 0.5666666666666667
key: train_recall
value: [0.76470588 0.8627451 0.76470588 0.76470588 0.84615385 0.84615385
0.84615385 0.90196078 0.82352941 0.84313725]
mean value: 0.8263951734539969
key: test_roc_auc
value: [0.66666667 0.41666667 0.5 0.66666667 0.65 0.53333333
0.73333333 0.63333333 0.56666667 0.55 ]
mean value: 0.5916666666666667
key: train_roc_auc
value: [0.83333333 0.89215686 0.80392157 0.82352941 0.80542986 0.86425339
0.87405732 0.8836727 0.8540724 0.85426094]
mean value: 0.848868778280543
key: test_jcc
value: [0.42857143 0.3 0.33333333 0.5 0.5 0.28571429
0.57142857 0.5 0.28571429 0.375 ]
mean value: 0.4079761904761905
key: train_jcc
value: [0.69642857 0.8 0.66101695 0.68421053 0.6875 0.75862069
0.77192982 0.79310345 0.73684211 0.74137931]
mean value: 0.7331031424997326
MCC on Blind test: 0.29
Accuracy on Blind test: 0.65
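The per-fold metrics above can be cross-checked by rebuilding a confusion matrix. This is a minimal sketch under stated assumptions: SVM fold 1 is assumed to hold 12 test samples, 6 positive and 6 negative. That split is inferred from the 1/12 granularity of the fold-1 accuracy and from the train_recall denominators (51 positives in the first four training folds), not read directly from the log. With recall 0.5 and precision 0.75, the counts must be TP = 3, FN = 3, FP = 1 and TN = 5, and every logged fold-1 value follows from them. The logged test_roc_auc (0.66666667) also equals the balanced accuracy of these counts. This would be consistent with roc_auc_score being applied to hard predictions, although the log does not show how the scorer was defined. The helper name `confusion_metrics` is hypothetical.

```python
import math


def confusion_metrics(tp, fp, fn, tn):
    """Binary-classification metrics computed directly from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "mcc": mcc,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "fscore": f1,
        "precision": precision,
        "recall": recall,
        "balanced_accuracy": (recall + specificity) / 2,
        "jcc": tp / (tp + fp + fn),    # Jaccard index of the positive class
    }


# Hypothetical counts for SVM fold 1 (assumed 6 positives / 6 negatives):
# recall 0.5 -> TP = 3, FN = 3; precision 0.75 -> FP = 1; remaining negatives -> TN = 5.
svm_fold1 = confusion_metrics(tp=3, fp=1, fn=3, tn=5)
for name, value in svm_fold1.items():
    print(f"{name}: {value:.8f}")
```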
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [0.52144575 0.47171903 0.82721043 0.55670023 0.54132795 0.57578349
0.56852198 0.55415893 0.50108457 0.61252236]
mean value: 0.5730474710464477
key: score_time
value: [0.0121913 0.0121181 0.01214528 0.01250863 0.01261091 0.01247096
0.01241922 0.01233172 0.01238108 0.01257896]
mean value: 0.012375617027282714
key: test_mcc
value: [0.33333333 0. 0. 0.66666667 0.83333333 0.2608746
0.1490712 0.44854261 0.1490712 0.46666667]
mean value: 0.3307559607947478
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.66666667 0.5 0.5 0.83333333 0.90909091 0.63636364
0.54545455 0.72727273 0.54545455 0.72727273]
mean value: 0.6590909090909091
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.66666667 0.57142857 0.57142857 0.83333333 0.90909091 0.5
0.61538462 0.76923077 0.44444444 0.72727273]
mean value: 0.6608280608280608
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.66666667 0.5 0.5 0.83333333 0.83333333 0.66666667
0.5 0.71428571 0.66666667 0.8 ]
mean value: 0.6680952380952381
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 0.66666667 0.66666667 0.83333333 1. 0.4
0.8 0.83333333 0.33333333 0.66666667]
mean value: 0.6866666666666666
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.66666667 0.5 0.5 0.83333333 0.91666667 0.61666667
0.56666667 0.71666667 0.56666667 0.73333333]
mean value: 0.6616666666666667
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.5 0.4 0.4 0.71428571 0.83333333 0.33333333
0.44444444 0.625 0.28571429 0.57142857]
mean value: 0.5107539682539682
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.22
Accuracy on Blind test: 0.62
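For a binary task, the Jaccard index and the F1 score of the same positive class are linked by J = F1 / (2 - F1), since J = TP/(TP+FP+FN) and F1 = 2TP/(2TP+FP+FN). The sketch below checks this identity against the MLP per-fold values copied from the block above. It also shows that the identity does not carry over to the printed means. J(F1) is convex, so the mean of the per-fold Jaccard values is at least the Jaccard implied by the mean F1. The function name `jaccard_from_f1` is illustrative.

```python
import statistics

# Per-fold values copied from the MLP block of the log
test_fscore = [0.66666667, 0.57142857, 0.57142857, 0.83333333, 0.90909091,
               0.5, 0.61538462, 0.76923077, 0.44444444, 0.72727273]
test_jcc = [0.5, 0.4, 0.4, 0.71428571, 0.83333333,
            0.33333333, 0.44444444, 0.625, 0.28571429, 0.57142857]


def jaccard_from_f1(f1):
    """Jaccard index implied by a binary F1 score: J = F1 / (2 - F1)."""
    return f1 / (2.0 - f1)


implied = [jaccard_from_f1(f) for f in test_fscore]
for f, j_logged, j_calc in zip(test_fscore, test_jcc, implied):
    print(f"F1={f:.8f}  logged J={j_logged:.8f}  F1/(2-F1)={j_calc:.8f}")

# The identity holds fold by fold, but not for the fold means (J is convex in F1)
mean_j = statistics.mean(test_jcc)
j_of_mean_f1 = jaccard_from_f1(statistics.mean(test_fscore))
print("mean of logged J:", mean_j)
print("J of mean F1:    ", j_of_mean_f1)
```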
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.01543951 0.01430631 0.01153779 0.01077008 0.01102376 0.01163149
0.01030493 0.01114631 0.01093674 0.01104331]
mean value: 0.011814022064208984
key: score_time
value: [0.01190424 0.00897908 0.00885177 0.00861216 0.0084064 0.00839639
0.00842237 0.00837159 0.0084188 0.00844646]
mean value: 0.008880925178527833
key: test_mcc
value: [0.50709255 0.70710678 0.84515425 0.66666667 0.83333333 0.46666667
0.44854261 0.63333333 0.69006556 0.46666667]
mean value: 0.6264628428333725
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.75 0.83333333 0.91666667 0.83333333 0.90909091 0.72727273
0.72727273 0.81818182 0.81818182 0.72727273]
mean value: 0.806060606060606
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.76923077 0.85714286 0.90909091 0.83333333 0.90909091 0.72727273
0.66666667 0.83333333 0.8 0.72727273]
mean value: 0.8032434232434232
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.71428571 0.75 1. 0.83333333 0.83333333 0.66666667
0.75 0.83333333 1. 0.8 ]
mean value: 0.8180952380952381
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.83333333 1. 0.83333333 0.83333333 1. 0.8
0.6 0.83333333 0.66666667 0.66666667]
mean value: 0.8066666666666666
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.75 0.83333333 0.91666667 0.83333333 0.91666667 0.73333333
0.71666667 0.81666667 0.83333333 0.73333333]
mean value: 0.8083333333333333
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.625 0.75 0.83333333 0.71428571 0.83333333 0.57142857
0.5 0.71428571 0.66666667 0.57142857]
mean value: 0.6779761904761905
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.65
Accuracy on Blind test: 0.81
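Each "mean value" line appears to be an unweighted average of the ten fold scores. Because the folds differ in size, this can differ slightly from the accuracy pooled over all held-out predictions. The sketch below assumes 114 training samples split into four folds of 12 and six folds of 11. That split is inferred from the 1/12 and 1/11 granularity of the test accuracies and from the 102- and 103-sample training folds implied by the SVM train_accuracy values; the log does not state it. It recomputes both averages for the Decision Tree block.

```python
import statistics

# Per-fold test accuracies copied from the Decision Tree block of the log
test_accuracy = [0.75, 0.83333333, 0.91666667, 0.83333333, 0.90909091,
                 0.72727273, 0.72727273, 0.81818182, 0.81818182, 0.72727273]
# Assumed fold sizes: 114 samples in 10 folds -> four folds of 12, six folds of 11
fold_sizes = [12] * 4 + [11] * 6

# Recover the integer number of correct predictions in each fold
n_correct = [round(acc * n) for acc, n in zip(test_accuracy, fold_sizes)]

unweighted = statistics.mean(test_accuracy)   # what the "mean value" line reports
pooled = sum(n_correct) / sum(fold_sizes)     # accuracy over every held-out prediction

print("correct per fold:", n_correct)
print("unweighted mean: ", round(unweighted, 6))
print("pooled accuracy: ", round(pooled, 6))
```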
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.0848 0.08455276 0.08480477 0.08505201 0.08489251 0.08458662
0.0842483 0.08415937 0.08451748 0.08450079]
mean value: 0.08461146354675293
key: score_time
value: [0.01697922 0.01714063 0.01709533 0.01729393 0.01706409 0.0169909
0.01697874 0.01725078 0.01703787 0.01676655]
mean value: 0.017059803009033203
key: test_mcc
value: [0.57735027 0. 0.19245009 0.66666667 0.55901699 0.44854261
0.44854261 0.06900656 0.1490712 0.44854261]
mean value: 0.3559189615112927
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.75 0.5 0.58333333 0.83333333 0.72727273 0.72727273
0.72727273 0.54545455 0.54545455 0.72727273]
mean value: 0.6666666666666666
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.66666667 0.57142857 0.66666667 0.83333333 0.76923077 0.66666667
0.66666667 0.61538462 0.44444444 0.76923077]
mean value: 0.6669719169719169
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.5 0.55555556 0.83333333 0.625 0.75
0.75 0.57142857 0.66666667 0.71428571]
mean value: 0.6966269841269841
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.5 0.66666667 0.83333333 0.83333333 1. 0.6
0.6 0.66666667 0.33333333 0.83333333]
mean value: 0.6866666666666666
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.75 0.5 0.58333333 0.83333333 0.75 0.71666667
0.71666667 0.53333333 0.56666667 0.71666667]
mean value: 0.6666666666666667
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.5 0.4 0.5 0.71428571 0.625 0.5
0.5 0.44444444 0.28571429 0.625 ]
mean value: 0.5094444444444445
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.29
Accuracy on Blind test: 0.65
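Every model block in this log has the same shape. First the pipeline is printed (a ColumnTransformer with MinMaxScaler on the numeric columns and OneHotEncoder on the categorical ones, with remainder='passthrough'). Then come fit_time, score_time and interleaved test_/train_ arrays of ten values each. That layout matches scikit-learn's `cross_validate(..., return_train_score=True)` when it is given a dict of named scorers. The sketch below reproduces the output format on synthetic data. Several parts are assumptions: the scorer definitions, cv=10, the three toy columns (named after real log columns but filled with random values), `handle_unknown="ignore"`, and the choice of LogisticRegression. It is not the script that produced this log. roc_auc is wrapped with `make_scorer` so that it scores hard predictions, which is one possible reading of the logged values.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic stand-in data; column names borrowed from the log, values are random
rng = np.random.default_rng(42)
n = 120
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, n),
    "rsa": rng.uniform(0, 1, n),
    "ss_class": np.array(["H", "E", "L"])[np.arange(n) % 3],
})
y = ((X["ligand_distance"] + rng.normal(0, 5, n)) < 20).astype(int)

pipe = Pipeline(steps=[
    ("prep", ColumnTransformer(
        transformers=[
            ("num", MinMaxScaler(), ["ligand_distance", "rsa"]),
            # handle_unknown="ignore" is a defensive addition, not shown in the log
            ("cat", OneHotEncoder(handle_unknown="ignore"), ["ss_class"]),
        ],
        remainder="passthrough")),
    ("model", LogisticRegression(random_state=42)),
])

# Scorer names chosen to match the log keys (test_mcc, train_jcc, ...)
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": make_scorer(roc_auc_score),   # scored on predicted labels here
    "jcc": make_scorer(jaccard_score),
}

scores = cross_validate(pipe, X, y, cv=10, scoring=scoring, return_train_score=True)
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```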
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.00962925 0.00877833 0.00890136 0.00889707 0.00978112 0.00891805
0.00888753 0.00978589 0.00883222 0.00868726]
mean value: 0.009109807014465333
key: score_time
value: [0.00888324 0.00857091 0.00868154 0.00859356 0.0087707 0.00869417
0.00870609 0.00908661 0.00851727 0.00869942]
mean value: 0.00872035026550293
key: test_mcc
value: [ 0.33333333 -0.16903085 0.16903085 0.33333333 0.26666667 0.51639778
0.1 -0.06900656 -0.2608746 -0.06900656]
mean value: 0.11508434035842094
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.66666667 0.41666667 0.58333333 0.66666667 0.63636364 0.72727273
0.54545455 0.45454545 0.36363636 0.45454545]
mean value: 0.5515151515151515
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.66666667 0.46153846 0.54545455 0.66666667 0.6 0.57142857
0.54545455 0.4 0.22222222 0.4 ]
mean value: 0.507943167943168
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.66666667 0.42857143 0.6 0.66666667 0.6 1.
0.5 0.5 0.33333333 0.5 ]
mean value: 0.5795238095238096
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 0.5 0.5 0.66666667 0.6 0.4
0.6 0.33333333 0.16666667 0.33333333]
mean value: 0.4766666666666666
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.66666667 0.41666667 0.58333333 0.66666667 0.63333333 0.7
0.55 0.46666667 0.38333333 0.46666667]
mean value: 0.5533333333333333
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.5 0.3 0.375 0.5 0.42857143 0.4
0.375 0.25 0.125 0.25 ]
mean value: 0.35035714285714287
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.04
Accuracy on Blind test: 0.54
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.07809305 1.08052611 1.08644509 1.07913208 1.0745616 1.07508063
1.06093812 1.06822157 1.06732559 1.06763673]
mean value: 1.0737960577011108
key: score_time
value: [0.09412026 0.09304404 0.09404373 0.09632349 0.09305692 0.08730507
0.08752012 0.09352827 0.09296465 0.09239817]
mean value: 0.0924304723739624
key: test_mcc
value: [0.50709255 0.33333333 0.57735027 0.66666667 0.83333333 0.46666667
0.44854261 0.63333333 0.43033148 0.44854261]
mean value: 0.5345192865417064
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.75 0.66666667 0.75 0.83333333 0.90909091 0.72727273
0.72727273 0.81818182 0.63636364 0.72727273]
mean value: 0.7545454545454545
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.72727273 0.66666667 0.8 0.83333333 0.90909091 0.72727273
0.66666667 0.83333333 0.5 0.76923077]
mean value: 0.7432867132867133
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.8 0.66666667 0.66666667 0.83333333 0.83333333 0.66666667
0.75 0.83333333 1. 0.71428571]
mean value: 0.7764285714285715
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 0.66666667 1. 0.83333333 1. 0.8
0.6 0.83333333 0.33333333 0.83333333]
mean value: 0.7566666666666667
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.75 0.66666667 0.75 0.83333333 0.91666667 0.73333333
0.71666667 0.81666667 0.66666667 0.71666667]
mean value: 0.7566666666666667
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
key: test_jcc
value: [0.57142857 0.5 0.66666667 0.71428571 0.83333333 0.57142857
0.5 0.71428571 0.33333333 0.625 ]
mean value: 0.6029761904761904
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.56
Accuracy on Blind test: 0.78
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.83386564 0.82139683 0.89413834 0.88040566 0.8696816 0.87154555
0.89003706 0.92764449 0.85521364 0.85460138]
mean value: 0.8698530197143555
key: score_time
value: [0.18552732 0.22606993 0.197824 0.18685389 0.24319506 0.16305614
0.231493 0.23278856 0.19366717 0.18959689]
mean value: 0.20500719547271729
key: test_mcc
value: [0.50709255 0.50709255 0.57735027 0.50709255 0.63333333 0.46666667
0.63333333 0.83333333 0.1490712 0.44854261]
mean value: 0.5262908406440139
key: train_mcc
value: [0.92156863 0.92156863 0.92156863 0.96152395 0.94193062 0.9229904
0.94190878 0.92304797 0.92232278 0.88419471]
mean value: 0.9262625082295393
key: test_accuracy
value: [0.75 0.75 0.75 0.75 0.81818182 0.72727273
0.81818182 0.90909091 0.54545455 0.72727273]
mean value: 0.7545454545454545
key: train_accuracy
value: [0.96078431 0.96078431 0.96078431 0.98039216 0.97087379 0.96116505
0.97087379 0.96116505 0.96116505 0.94174757]
mean value: 0.9629735389301352
key: test_fscore
value: [0.72727273 0.76923077 0.8 0.72727273 0.8 0.72727273
0.8 0.90909091 0.44444444 0.76923077]
mean value: 0.7473815073815073
key: train_fscore
value: [0.96078431 0.96078431 0.96078431 0.98 0.97087379 0.96226415
0.97142857 0.96153846 0.96078431 0.94230769]
mean value: 0.963154991752785
key: test_precision
value: [0.8 0.71428571 0.66666667 0.8 0.8 0.66666667
0.8 1. 0.66666667 0.71428571]
mean value: 0.7628571428571429
key: train_precision
value: [0.96078431 0.96078431 0.96078431 1. 0.98039216 0.94444444
0.96226415 0.94339623 0.96078431 0.9245283 ]
mean value: 0.9598162535454433
key: test_recall
value: [0.66666667 0.83333333 1. 0.66666667 0.8 0.8
0.8 0.83333333 0.33333333 0.83333333]
mean value: 0.7566666666666667
key: train_recall
value: [0.96078431 0.96078431 0.96078431 0.96078431 0.96153846 0.98076923
0.98076923 0.98039216 0.96078431 0.96078431]
mean value: 0.9668174962292609
key: test_roc_auc
value: [0.75 0.75 0.75 0.75 0.81666667 0.73333333
0.81666667 0.91666667 0.56666667 0.71666667]
mean value: 0.7566666666666667
key: train_roc_auc
value: [0.96078431 0.96078431 0.96078431 0.98039216 0.97096531 0.96097285
0.97077677 0.96134992 0.96116139 0.94193062]
mean value: 0.9629901960784315
key: test_jcc
value: [0.57142857 0.625 0.66666667 0.57142857 0.66666667 0.57142857
0.66666667 0.83333333 0.28571429 0.625 ]
mean value: 0.6083333333333333
key: train_jcc
value: [0.9245283 0.9245283 0.9245283 0.96078431 0.94339623 0.92727273
0.94444444 0.92592593 0.9245283 0.89090909]
mean value: 0.9290845936239943
MCC on Blind test: 0.6
Accuracy on Blind test: 0.81
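The per-fold scores above can be cross-checked against a confusion matrix. The log does not print the matrices. The counts used below (TP=4, FP=1, FN=2, TN=5) are inferred: they are the integer counts that reproduce fold 1 of the 'Random Forest2' test scores (12 samples, accuracy 0.75, precision 0.8, recall 0.667). This is a plain-Python sketch using the textbook formulas, not the script's own scoring code. `metrics_from_counts` is a hypothetical helper.

```python
import math

def metrics_from_counts(tp, fp, fn, tn):
    """Binary-classification scores from confusion-matrix counts (textbook definitions)."""
    n = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # sensitivity
    specificity = tn / (tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        # MCC is treated as 0.0 when any marginal count is zero
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
        "accuracy": (tp + tn) / n,
        "fscore": 2 * tp / (2 * tp + fp + fn),
        "precision": precision,
        "recall": recall,
        "jcc": tp / (tp + fp + fn),  # Jaccard score of the positive class
        "balanced_accuracy": (recall + specificity) / 2,
    }

# Counts inferred (not printed in the log) from fold 1 of the 'Random Forest2' test scores
scores = metrics_from_counts(tp=4, fp=1, fn=2, tn=5)
for key, value in scores.items():
    print(f"{key}: {value:.8f}")
```

With these counts the printed MCC, F1, Jaccard, precision, recall and accuracy match the logged fold 1 entries (for example test_mcc 0.50709255 and test_jcc 0.57142857). The balanced accuracy of 0.75 also equals that fold's logged test_roc_auc.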
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.0218575 0.00876045 0.0088129 0.00888467 0.00886083 0.00874114
0.00883603 0.0088737 0.00875211 0.0096333 ]
mean value: 0.010201263427734374
key: score_time
value: [0.01380324 0.00858402 0.00932169 0.00855565 0.0086031 0.00860167
0.00856328 0.00865817 0.00851178 0.00928473]
mean value: 0.009248733520507812
key: test_mcc
value: [ 0. -0.19245009 0.19245009 0.35355339 0.1 0.1
-0.1 -0.06900656 -0.04303315 0.26666667]
mean value: 0.06081803530345115
key: train_mcc
value: [0.47809144 0.49362406 0.41692608 0.44177063 0.44167123 0.49697785
0.45999986 0.55337612 0.46410101 0.42167602]
mean value: 0.4668214305743355
key: test_accuracy
value: [0.5 0.41666667 0.58333333 0.66666667 0.54545455 0.54545455
0.45454545 0.45454545 0.45454545 0.63636364]
mean value: 0.5257575757575758
key: train_accuracy
value: [0.73529412 0.74509804 0.70588235 0.71568627 0.7184466 0.74757282
0.72815534 0.77669903 0.72815534 0.70873786]
mean value: 0.7309727774604988
key: test_fscore
value: [0.4 0.22222222 0.44444444 0.6 0.54545455 0.54545455
0.4 0.4 0.25 0.66666667]
mean value: 0.4474242424242424
key: train_fscore
value: [0.70967742 0.72916667 0.68085106 0.68131868 0.70103093 0.74
0.71428571 0.77227723 0.69565217 0.68085106]
mean value: 0.7105110938756343
key: test_precision
value: [0.5 0.33333333 0.66666667 0.75 0.5 0.5
0.4 0.5 0.5 0.66666667]
mean value: 0.5316666666666666
key: train_precision
value: [0.78571429 0.77777778 0.74418605 0.775 0.75555556 0.77083333
0.76086957 0.78 0.7804878 0.74418605]
mean value: 0.7674610415499649
key: test_recall
value: [0.33333333 0.16666667 0.33333333 0.5 0.6 0.6
0.4 0.33333333 0.16666667 0.66666667]
mean value: 0.41
key: train_recall
value: [0.64705882 0.68627451 0.62745098 0.60784314 0.65384615 0.71153846
0.67307692 0.76470588 0.62745098 0.62745098]
mean value: 0.6626696832579185
key: test_roc_auc
value: [0.5 0.41666667 0.58333333 0.66666667 0.55 0.55
0.45 0.46666667 0.48333333 0.63333333]
mean value: 0.53
key: train_roc_auc
value: [0.73529412 0.74509804 0.70588235 0.71568627 0.71907994 0.74792609
0.72869532 0.77658371 0.72718703 0.70795626]
mean value: 0.7309389140271493
key: test_jcc
value: [0.25 0.125 0.28571429 0.42857143 0.375 0.375
0.25 0.25 0.14285714 0.5 ]
mean value: 0.2982142857142857
key: train_jcc
value: [0.55 0.57377049 0.51612903 0.51666667 0.53968254 0.58730159
0.55555556 0.62903226 0.53333333 0.51612903]
mean value: 0.5517600496923607
MCC on Blind test: 0.27
Accuracy on Blind test: 0.62
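Each "mean value" line is consistent with the arithmetic mean of the ten fold scores printed above it. For the BernoulliNB run, the mean test MCC is close to zero while individual folds range from about -0.19 to 0.35, so a spread measure is worth reading alongside the mean. Below is a minimal sketch using the standard-library `statistics` module. The standard deviation is an addition; the script does not report it.

```python
import statistics

# test_mcc fold scores of the 'Naive Bayes' (BernoulliNB) run, copied from the log above
test_mcc = [0.0, -0.19245009, 0.19245009, 0.35355339, 0.1, 0.1,
            -0.1, -0.06900656, -0.04303315, 0.26666667]

mean_mcc = statistics.mean(test_mcc)  # agrees with the logged "mean value" up to the 8-digit rounding of the folds
sd_mcc = statistics.stdev(test_mcc)   # sample standard deviation, not printed by the script
print(f"mean value: {mean_mcc:.8f}")
print(f"sd: {sd_mcc:.3f}, min: {min(test_mcc)}, max: {max(test_mcc)}")
```

Taken together, the mean and spread show a test MCC that hovers around zero across folds, even though the same model's training MCC averages about 0.47.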
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.07481241 0.22507048 0.04041743 0.04007125 0.04189968 0.04101229
0.04206061 0.04118204 0.04240179 0.04212117]
mean value: 0.06310491561889649
key: score_time
value: [0.01080179 0.01181221 0.01070094 0.01109576 0.01027584 0.01039696
0.01037216 0.01016545 0.01079583 0.01102448]
mean value: 0.010744142532348632
key: test_mcc
value: [0.70710678 0.70710678 1. 0.84515425 0.83333333 0.69006556
0.44854261 0.83333333 0.83333333 0.67082039]
mean value: 0.7568796383266433
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.83333333 0.83333333 1. 0.91666667 0.90909091 0.81818182
0.72727273 0.90909091 0.90909091 0.81818182]
mean value: 0.8674242424242424
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.85714286 0.85714286 1. 0.92307692 0.90909091 0.83333333
0.66666667 0.90909091 0.90909091 0.85714286]
mean value: 0.8721778221778221
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.75 0.75 1. 0.85714286 0.83333333 0.71428571
0.75 1. 1. 0.75 ]
mean value: 0.8404761904761905
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1.
0.6 0.83333333 0.83333333 1. ]
mean value: 0.9266666666666666
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.83333333 0.83333333 1. 0.91666667 0.91666667 0.83333333
0.71666667 0.91666667 0.91666667 0.8 ]
mean value: 0.8683333333333334
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.75 0.75 1. 0.85714286 0.83333333 0.71428571
0.5 0.83333333 0.83333333 0.75 ]
mean value: 0.7821428571428571
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.89
Accuracy on Blind test: 0.95
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.02431273 0.04793501 0.04520869 0.01918387 0.01961684 0.05043674
0.04669476 0.01964712 0.01939678 0.06448698]
mean value: 0.0356919527053833
key: score_time
value: [0.021065 0.02142334 0.02149987 0.01160932 0.01161528 0.02253747
0.02240157 0.01179361 0.01169348 0.02211642]
mean value: 0.017775535583496094
key: test_mcc
value: [ 0. 0.16903085 0.16903085 0.66666667 -0.1 0.55901699
-0.46666667 -0.44854261 0.3105295 0.43033148]
mean value: 0.12893970673098185
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.5 0.58333333 0.58333333 0.83333333 0.45454545 0.72727273
0.27272727 0.27272727 0.63636364 0.63636364]
mean value: 0.55
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.57142857 0.61538462 0.61538462 0.83333333 0.4 0.76923077
0.2 0.2 0.6 0.5 ]
mean value: 0.5304761904761904
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.5 0.57142857 0.57142857 0.83333333 0.4 0.625
0.2 0.25 0.75 1. ]
mean value: 0.5701190476190476
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 0.66666667 0.66666667 0.83333333 0.4 1.
0.2 0.16666667 0.5 0.33333333]
mean value: 0.5433333333333333
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.5 0.58333333 0.58333333 0.83333333 0.45 0.75
0.26666667 0.28333333 0.65 0.66666667]
mean value: 0.5566666666666666
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.4 0.44444444 0.44444444 0.71428571 0.25 0.625
0.11111111 0.11111111 0.42857143 0.33333333]
mean value: 0.3862301587301587
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.27
Accuracy on Blind test: 0.62
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01184154 0.01136208 0.00888276 0.00873661 0.00852537 0.00840068
0.00836301 0.0085299 0.00847936 0.00849724]
mean value: 0.009161853790283203
key: score_time
value: [0.01137066 0.00879383 0.00873899 0.00863862 0.00825405 0.00836372
0.00826645 0.00838542 0.00832129 0.00829244]
mean value: 0.008742547035217286
key: test_mcc
value: [ 0.16903085 0.16903085 0. 0.35355339 0.55901699 0.63333333
0.46666667 -0.1 -0.06900656 0.06900656]
mean value: 0.22506320868596277
key: train_mcc
value: [0.39223227 0.47067872 0.41464421 0.41176471 0.35941135 0.43702866
0.43702866 0.42093969 0.45625943 0.45639893]
mean value: 0.42563866339517525
key: test_accuracy
value: [0.58333333 0.58333333 0.5 0.66666667 0.72727273 0.81818182
0.72727273 0.45454545 0.45454545 0.54545455]
mean value: 0.6060606060606061
key: train_accuracy
value: [0.69607843 0.73529412 0.70588235 0.70588235 0.67961165 0.7184466
0.7184466 0.70873786 0.72815534 0.72815534]
mean value: 0.7124690652960214
key: test_fscore
value: [0.61538462 0.61538462 0.5 0.71428571 0.76923077 0.8
0.72727273 0.5 0.4 0.61538462]
mean value: 0.6256943056943057
key: train_fscore
value: [0.69306931 0.73267327 0.72222222 0.70588235 0.69158879 0.7184466
0.7184466 0.72222222 0.7254902 0.72 ]
mean value: 0.7150041556651703
key: test_precision
value: [0.57142857 0.57142857 0.5 0.625 0.625 0.8
0.66666667 0.5 0.5 0.57142857]
mean value: 0.5930952380952381
key: train_precision
value: [0.7 0.74 0.68421053 0.70588235 0.67272727 0.7254902
0.7254902 0.68421053 0.7254902 0.73469388]
mean value: 0.7098195144086342
key: test_recall
value: [0.66666667 0.66666667 0.5 0.83333333 1. 0.8
0.8 0.5 0.33333333 0.66666667]
mean value: 0.6766666666666666
key: train_recall
value: [0.68627451 0.7254902 0.76470588 0.70588235 0.71153846 0.71153846
0.71153846 0.76470588 0.7254902 0.70588235]
mean value: 0.7213046757164404
key: test_roc_auc
value: [0.58333333 0.58333333 0.5 0.66666667 0.75 0.81666667
0.73333333 0.45 0.46666667 0.53333333]
mean value: 0.6083333333333333
key: train_roc_auc
value: [0.69607843 0.73529412 0.70588235 0.70588235 0.67929864 0.71851433
0.71851433 0.70927602 0.72812971 0.72794118]
mean value: 0.7124811463046757
key: test_jcc
value: [0.44444444 0.44444444 0.33333333 0.55555556 0.625 0.66666667
0.57142857 0.33333333 0.25 0.44444444]
mean value: 0.46686507936507937
key: train_jcc
value: [0.53030303 0.578125 0.56521739 0.54545455 0.52857143 0.56060606
0.56060606 0.56521739 0.56923077 0.5625 ]
mean value: 0.556583167738059
MCC on Blind test: 0.22
Accuracy on Blind test: 0.62
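Every pipeline in this log applies `MinMaxScaler` to the 166 numeric columns before the model. This keeps the scaled training-fold inputs to `MultinomialNB` non-negative, which scikit-learn's MultinomialNB requires when fitting. Because the scaler is fitted on each training fold only, values in the held-out fold can fall outside [0, 1]. Below is a plain-Python sketch of the transform. It assumes scikit-learn's defaults, feature_range=(0, 1) and clip=False. The helper names are hypothetical, and the numbers are toy values rather than data from this run.

```python
def minmax_fit(train_column):
    """Learn the column minimum and range from the training fold only."""
    lo, hi = min(train_column), max(train_column)
    span = (hi - lo) or 1.0  # a constant column keeps a unit scale instead of dividing by zero
    return lo, span

def minmax_transform(column, lo, span):
    """Map x to (x - lo) / span; no clipping, so unseen values may leave [0, 1]."""
    return [(x - lo) / span for x in column]

train_fold = [2.0, 4.0, 6.0]     # toy values
held_out_fold = [1.0, 5.0, 8.0]  # toy values

lo, span = minmax_fit(train_fold)
print(minmax_transform(train_fold, lo, span))     # [0.0, 0.5, 1.0]
print(minmax_transform(held_out_fold, lo, span))  # [-0.25, 0.75, 1.5]
```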
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01127481 0.01294661 0.01390338 0.01304626 0.0125351 0.01292324
0.01466441 0.01396894 0.01350164 0.01370597]
mean value: 0.013247036933898925
key: score_time
value: [0.01279116 0.01113725 0.01108098 0.01119018 0.01119089 0.0111618
0.01162934 0.01145244 0.01148176 0.01136208]
mean value: 0.011447787284851074
key: test_mcc
value: [ 0.84515425 0.16903085 0.84515425 0.35355339 0.69006556 0.2608746
-0.26666667 0.43033148 0.46666667 0.3105295 ]
mean value: 0.41046938923293347
key: train_mcc
value: [0.85370578 0.85370578 0.88507941 0.56980288 0.72812971 0.71696975
0.90305552 0.78824078 0.7730823 0.83786936]
mean value: 0.7909641286354275
key: test_accuracy
value: [0.91666667 0.58333333 0.91666667 0.66666667 0.81818182 0.63636364
0.36363636 0.63636364 0.72727273 0.63636364]
mean value: 0.6901515151515152
key: train_accuracy
value: [0.92156863 0.92156863 0.94117647 0.74509804 0.86407767 0.84466019
0.95145631 0.88349515 0.87378641 0.91262136]
mean value: 0.8859508852084523
key: test_fscore
value: [0.90909091 0.61538462 0.92307692 0.6 0.83333333 0.5
0.36363636 0.5 0.72727273 0.6 ]
mean value: 0.6571794871794872
key: train_fscore
value: [0.91489362 0.91489362 0.94339623 0.65789474 0.86538462 0.82222222
0.95238095 0.86666667 0.88695652 0.90322581]
mean value: 0.8727914982144953
key: test_precision
value: [1. 0.57142857 0.85714286 0.75 0.71428571 0.66666667
0.33333333 1. 0.8 0.75 ]
mean value: 0.7442857142857143
key: train_precision
value: [1. 1. 0.90909091 1. 0.86538462 0.97368421
0.94339623 1. 0.796875 1. ]
mean value: 0.9488430961416935
key: test_recall
value: [0.83333333 0.66666667 1. 0.5 1. 0.4
0.4 0.33333333 0.66666667 0.5 ]
mean value: 0.63
key: train_recall
value: [0.84313725 0.84313725 0.98039216 0.49019608 0.86538462 0.71153846
0.96153846 0.76470588 1. 0.82352941]
mean value: 0.8283559577677224
key: test_roc_auc
value: [0.91666667 0.58333333 0.91666667 0.66666667 0.83333333 0.61666667
0.36666667 0.66666667 0.73333333 0.65 ]
mean value: 0.6950000000000001
key: train_roc_auc
value: [0.92156863 0.92156863 0.94117647 0.74509804 0.86406486 0.84596531
0.95135747 0.88235294 0.875 0.91176471]
mean value: 0.8859917043740573
key: test_jcc
value: [0.83333333 0.44444444 0.85714286 0.42857143 0.71428571 0.33333333
0.22222222 0.33333333 0.57142857 0.42857143]
mean value: 0.5166666666666666
key: train_jcc
value: [0.84313725 0.84313725 0.89285714 0.49019608 0.76271186 0.69811321
0.90909091 0.76470588 0.796875 0.82352941]
mean value: 0.7824354006254942
MCC on Blind test: 0.77
Accuracy on Blind test: 0.89
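In every fold checked, the logged test_roc_auc coincides with the balanced accuracy implied by that fold's precision, recall and accuracy. For example, fold 6 of the Passive Aggressive run (11 samples, recall 0.4, precision 0.667, accuracy 0.636) implies TP=2, FN=3, FP=1 and TN=5. That gives a balanced accuracy of 0.61666667, exactly the logged test_roc_auc. ROC AUC behaves this way when it is scored on hard 0/1 predictions instead of continuous scores, although the log alone does not show how the script built its scorer. The sketch computes AUC with the pairwise (Mann-Whitney) definition, counting ties as one half, on that inferred fold. The helper names and the reconstructed label vectors are assumptions.

```python
from itertools import product

def auc_pairwise(y_true, scores):
    """ROC AUC as P(score of a random positive > score of a random negative), ties counted as 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))

def balanced_accuracy(y_true, y_pred):
    """Mean of the recall on positives and the recall on negatives."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    return (tp / n_pos + tn / n_neg) / 2

# Labels reconstructed from fold 6 of the Passive Aggressive run: TP=2, FN=3, FP=1, TN=5
y_true = [1] * 5 + [0] * 6
y_pred = [1, 1, 0, 0, 0] + [1, 0, 0, 0, 0, 0]

print(round(auc_pairwise(y_true, y_pred), 8))       # 0.61666667
print(round(balanced_accuracy(y_true, y_pred), 8))  # 0.61666667
```

When the scores are continuous (probabilities or decision-function values), the same pairwise AUC generally differs from balanced accuracy.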
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01307654 0.01308775 0.01293898 0.01283979 0.01317358 0.01239753
0.01288557 0.01263547 0.01240683 0.01329827]
mean value: 0.012874031066894531
key: score_time
value: [0.01016164 0.01159859 0.01144123 0.01127625 0.01130557 0.01118708
0.01118135 0.01122093 0.01121593 0.01126552]
mean value: 0.011185407638549805
key: test_mcc
value: [ 0.35355339 0. 0.30151134 0.70710678 0.46666667 -0.1490712
0. 0.55901699 0.55901699 0.1490712 ]
mean value: 0.29468721717741464
key: train_mcc
value: [0.58489765 0.86692145 0.39886202 0.60246408 0.92304797 0.74927733
0.57166195 0.67789495 0.76763491 0.69330532]
mean value: 0.6835967621723328
key: test_accuracy
value: [0.66666667 0.5 0.58333333 0.83333333 0.72727273 0.45454545
0.45454545 0.72727273 0.72727273 0.54545455]
mean value: 0.621969696969697
key: train_accuracy
value: [0.75490196 0.93137255 0.6372549 0.7745098 0.96116505 0.86407767
0.74757282 0.81553398 0.88349515 0.82524272]
mean value: 0.8195126594327051
key: test_fscore
value: [0.71428571 0.625 0.70588235 0.85714286 0.72727273 0.25
0.625 0.66666667 0.66666667 0.44444444]
mean value: 0.6282361429420252
key: train_fscore
value: [0.80314961 0.93457944 0.73381295 0.81300813 0.96078431 0.84782609
0.8 0.77108434 0.88461538 0.78571429]
mean value: 0.8334574533634218
key: test_precision
value: [0.625 0.5 0.54545455 0.75 0.66666667 0.33333333
0.45454545 1. 1. 0.66666667]
mean value: 0.6541666666666667
key: train_precision
value: [0.67105263 0.89285714 0.57954545 0.69444444 0.98 0.975
0.66666667 1. 0.86792453 1. ]
mean value: 0.8327490868394543
key: test_recall
value: [0.83333333 0.83333333 1. 1. 0.8 0.2
1. 0.5 0.5 0.33333333]
mean value: 0.7
key: train_recall
value: [1. 0.98039216 1. 0.98039216 0.94230769 0.75
1. 0.62745098 0.90196078 0.64705882]
mean value: 0.8829562594268476
key: test_roc_auc
value: [0.66666667 0.5 0.58333333 0.83333333 0.73333333 0.43333333
0.5 0.75 0.75 0.56666667]
mean value: 0.6316666666666667
key: train_roc_auc
value: [0.75490196 0.93137255 0.6372549 0.7745098 0.96134992 0.86519608
0.74509804 0.81372549 0.8836727 0.82352941]
mean value: 0.8190610859728507
key: test_jcc
value: [0.55555556 0.45454545 0.54545455 0.75 0.57142857 0.14285714
0.45454545 0.5 0.5 0.28571429]
mean value: 0.476010101010101
key: train_jcc
value: [0.67105263 0.87719298 0.57954545 0.68493151 0.9245283 0.73584906
0.66666667 0.62745098 0.79310345 0.64705882]
mean value: 0.7207379852784521
MCC on Blind test: 0.71
Accuracy on Blind test: 0.86
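Each model block above prints ten per-fold values and their mean for fit_time, score_time, and a test_/train_ pair for each metric. That layout and key order match the dict that scikit-learn's cross_validate returns when scoring is a dict and return_train_score=True. The sketch below reproduces the layout on synthetic data. The scorer names (mcc, fscore, jcc, ...), the 10-fold StratifiedKFold, and the use of make_scorer for every metric are assumptions and are not taken from ml_data_8020.py. make_scorer(roc_auc_score) scores hard labels by default, so on binary data it equals balanced accuracy. The log's test_roc_auc values track test_accuracy closely on these near-balanced folds, which is consistent with this, but the log does not confirm it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score, make_scorer,
                             matthews_corrcoef, precision_score, recall_score,
                             roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in for the real feature matrix (about 114 rows, judging by fold sizes).
X, y = make_classification(n_samples=114, n_features=20, random_state=42)

# Scorer names chosen so the result keys read test_mcc, train_mcc, test_accuracy, ...
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": make_scorer(accuracy_score),
    "fscore": make_scorer(f1_score),
    "precision": make_scorer(precision_score),
    "recall": make_scorer(recall_score),
    "roc_auc": make_scorer(roc_auc_score),  # scores hard labels, not probabilities
    "jcc": make_scorer(jaccard_score),
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(SGDClassifier(random_state=42), X, y, cv=cv,
                        scoring=scoring, return_train_score=True)

# Print in the same key / value / mean value layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```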
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.09563279 0.08601046 0.08657742 0.08628511 0.08625317 0.08617806
0.08666682 0.08725524 0.08604097 0.08653998]
mean value: 0.08734400272369384
key: score_time
value: [0.01442719 0.01441336 0.01466346 0.01440644 0.01426506 0.01482415
0.01447535 0.01438451 0.01432061 0.01447439]
mean value: 0.01446545124053955
key: test_mcc
value: [0.33333333 0.70710678 0.70710678 0.84515425 0.83333333 0.69006556
0.63333333 0.26666667 0.69006556 0.83333333]
mean value: 0.653949893578632
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.66666667 0.83333333 0.83333333 0.91666667 0.90909091 0.81818182
0.81818182 0.63636364 0.81818182 0.90909091]
mean value: 0.8159090909090909
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.66666667 0.85714286 0.8 0.92307692 0.90909091 0.83333333
0.8 0.66666667 0.8 0.90909091]
mean value: 0.8165068265068265
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.66666667 0.75 1. 0.85714286 0.83333333 0.71428571
0.8 0.66666667 1. 1. ]
mean value: 0.8288095238095238
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 1. 0.66666667 1. 1. 1.
0.8 0.66666667 0.66666667 0.83333333]
mean value: 0.83
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.66666667 0.83333333 0.83333333 0.91666667 0.91666667 0.83333333
0.81666667 0.63333333 0.83333333 0.91666667]
mean value: 0.8200000000000001
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.5 0.75 0.66666667 0.85714286 0.83333333 0.71428571
0.66666667 0.5 0.66666667 0.83333333]
mean value: 0.6988095238095238
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.67
Accuracy on Blind test: 0.84
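The "Running model pipeline" repr shows the preprocessing that every model shares: a ColumnTransformer that min-max scales 166 numeric columns, one-hot encodes seven categorical columns (ss_class, aa_prop_change, ...), passes any remaining columns through, and then feeds the result to the classifier. Below is a minimal sketch of that structure on a toy frame. The three column names are reused from the log, but the values and category labels are made up.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Toy frame: column names borrowed from the log, values and categories invented.
X = pd.DataFrame({
    "ligand_distance": [3.1, 12.4, 7.8, 20.5, 5.0, 15.2],
    "rsa": [0.10, 0.85, 0.40, 0.95, 0.20, 0.70],
    "ss_class": ["helix", "coil", "strand", "coil", "helix", "strand"],
})
y = [1, 0, 1, 0, 1, 0]

# Split columns by dtype into the two Index objects the transformer expects.
num_cols = X.select_dtypes(include="number").columns
cat_cols = X.select_dtypes(exclude="number").columns

pipe = Pipeline(steps=[
    ("prep", ColumnTransformer(
        transformers=[("num", MinMaxScaler(), num_cols),
                      ("cat", OneHotEncoder(), cat_cols)],
        remainder="passthrough")),
    ("model", AdaBoostClassifier(random_state=42)),
])
pipe.fit(X, y)
pred = pipe.predict(X)
```

The log shows that the original passes pandas Index objects, but it does not show how they are derived. Selecting by dtype is one option. With OneHotEncoder's default handle_unknown='error', a category that is missing from a training fold would raise an error at scoring time. Setting handle_unknown='ignore' avoids that.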
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.03288817 0.03888249 0.04622889 0.03841782 0.03640127 0.03987908
0.05036545 0.03022933 0.04997826 0.03656101]
mean value: 0.039983177185058595
key: score_time
value: [0.02168846 0.01711726 0.02786493 0.02710056 0.02722168 0.0274992
0.02268314 0.03377604 0.0342288 0.03278279]
mean value: 0.027196288108825684
key: test_mcc
value: [0.70710678 0.70710678 0.70710678 0.50709255 0.63333333 0.46666667
0.44854261 1. 0.83333333 0.63333333]
mean value: 0.6643622176635949
key: train_mcc
value: [0.96078431 0.98058068 0.96152395 0.96078431 1. 1.
0.98076205 0.98076923 0.98076923 1. ]
mean value: 0.9805973758711093
key: test_accuracy
value: [0.83333333 0.83333333 0.83333333 0.75 0.81818182 0.72727273
0.72727273 1. 0.90909091 0.81818182]
mean value: 0.8250000000000001
key: train_accuracy
value: [0.98039216 0.99019608 0.98039216 0.98039216 1. 1.
0.99029126 0.99029126 0.99029126 1. ]
mean value: 0.9902246335427375
key: test_fscore
value: [0.85714286 0.85714286 0.8 0.72727273 0.8 0.72727273
0.66666667 1. 0.90909091 0.83333333]
mean value: 0.8177922077922077
key: train_fscore
value: [0.98039216 0.99029126 0.98 0.98039216 1. 1.
0.99047619 0.99029126 0.99029126 1. ]
mean value: 0.9902134290609448
key: test_precision
value: [0.75 0.75 1. 0.8 0.8 0.66666667
0.75 1. 1. 0.83333333]
mean value: 0.835
key: train_precision
value: [0.98039216 0.98076923 1. 0.98039216 1. 1.
0.98113208 0.98076923 0.98076923 1. ]
mean value: 0.988422408150488
key: test_recall
value: [1. 1. 0.66666667 0.66666667 0.8 0.8
0.6 1. 0.83333333 0.83333333]
mean value: 0.8200000000000001
key: train_recall
value: [0.98039216 1. 0.96078431 0.98039216 1. 1.
1. 1. 1. 1. ]
mean value: 0.9921568627450981
key: test_roc_auc
value: [0.83333333 0.83333333 0.83333333 0.75 0.81666667 0.73333333
0.71666667 1. 0.91666667 0.81666667]
mean value: 0.8250000000000001
key: train_roc_auc
value: [0.98039216 0.99019608 0.98039216 0.98039216 1. 1.
0.99019608 0.99038462 0.99038462 1. ]
mean value: 0.990233785822021
key: test_jcc
value: [0.75 0.75 0.66666667 0.57142857 0.66666667 0.57142857
0.5 1. 0.83333333 0.71428571]
mean value: 0.7023809523809523
key: train_jcc
value: [0.96153846 0.98076923 0.96078431 0.96153846 1. 1.
0.98113208 0.98076923 0.98076923 1. ]
mean value: 0.9807301004581803
MCC on Blind test: 0.77
Accuracy on Blind test: 0.89
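BaggingClassifier is configured with oob_score=True, so each fitted instance stores an out-of-bag accuracy estimate in oob_score_, but the log does not print it. The sketch below reads that attribute from a fitted pipeline on synthetic data. Setting n_estimators=50 is an assumption: the logged model uses the default of 10, which on about 100 rows can leave a few rows without any out-of-bag prediction.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=114, n_features=20, random_state=42)

pipe = Pipeline(steps=[
    ("prep", MinMaxScaler()),
    ("model", BaggingClassifier(n_estimators=50, oob_score=True, random_state=42)),
])
pipe.fit(X, y)

# Out-of-bag accuracy: each row is scored only by base estimators that did not see it.
oob = pipe.named_steps["model"].oob_score_
print(f"OOB accuracy: {oob:.2f}")
```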
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.01395178 0.01651168 0.0164876 0.01868868 0.0394485 0.0395906
0.01672554 0.03937197 0.03952646 0.03938293]
mean value: 0.027968573570251464
key: score_time
value: [0.01182222 0.01168132 0.01170325 0.02135181 0.02071929 0.01184392
0.0117321 0.02081585 0.02087355 0.02112746]
mean value: 0.016367077827453613
key: test_mcc
value: [ 0.16903085 -0.50709255 0.16903085 0. 0.55901699 0.2608746
-0.1490712 -0.26666667 -0.2608746 0.3105295 ]
mean value: 0.028477777996665087
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.58333333 0.25 0.58333333 0.5 0.72727273 0.63636364
0.45454545 0.36363636 0.36363636 0.63636364]
mean value: 0.5098484848484849
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.54545455 0.18181818 0.61538462 0.5 0.76923077 0.5
0.25 0.36363636 0.22222222 0.6 ]
mean value: 0.45477466977466974
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.6 0.2 0.57142857 0.5 0.625 0.66666667
0.33333333 0.4 0.33333333 0.75 ]
mean value: 0.49797619047619046
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.5 0.16666667 0.66666667 0.5 1. 0.4
0.2 0.33333333 0.16666667 0.5 ]
mean value: 0.44333333333333336
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.58333333 0.25 0.58333333 0.5 0.75 0.61666667
0.43333333 0.36666667 0.38333333 0.65 ]
mean value: 0.5116666666666667
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.375 0.1 0.44444444 0.33333333 0.625 0.33333333
0.14285714 0.22222222 0.125 0.42857143]
mean value: 0.31297619047619046
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.06
Accuracy on Blind test: 0.54
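Gaussian Process scores 1.0 on every training metric but averages only 0.03 test MCC, and 0.06 on the blind test. That is the widest train/test gap in this section. AdaBoost and Gradient Boosting also fit their training folds perfectly, yet they generalise far better. The check below computes the gap from the ten test_mcc values copied from the Gaussian Process block above.

```python
import numpy as np

# Per-fold values copied from the Gaussian Process block above.
test_mcc = np.array([0.16903085, -0.50709255, 0.16903085, 0.0, 0.55901699,
                     0.2608746, -0.1490712, -0.26666667, -0.2608746, 0.3105295])
train_mcc = np.ones(10)  # every train_mcc fold is 1.0 in the log

gap = train_mcc.mean() - test_mcc.mean()
print(f"mean test MCC {test_mcc.mean():.3f}, gap {gap:.3f}, "
      f"fold sd {test_mcc.std(ddof=1):.3f}")
```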
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.23015356 0.21968126 0.21211481 0.2232914 0.22050714 0.21650219
0.21859217 0.22216439 0.21892643 0.21719503]
mean value: 0.21991283893585206
key: score_time
value: [0.00920582 0.00885582 0.0088439 0.00896955 0.00899076 0.00892496
0.00895596 0.00880098 0.00909376 0.0088861 ]
mean value: 0.008952760696411132
key: test_mcc
value: [0.70710678 0.84515425 1. 0.84515425 0.83333333 0.69006556
0.44854261 0.63333333 0.69006556 0.82807867]
mean value: 0.7520834360778311
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.83333333 0.91666667 1. 0.91666667 0.90909091 0.81818182
0.72727273 0.81818182 0.81818182 0.90909091]
mean value: 0.8666666666666667
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.85714286 0.92307692 1. 0.92307692 0.90909091 0.83333333
0.66666667 0.83333333 0.8 0.92307692]
mean value: 0.8668797868797868
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.75 0.85714286 1. 0.85714286 0.83333333 0.71428571
0.75 0.83333333 1. 0.85714286]
mean value: 0.8452380952380952
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1.
0.6 0.83333333 0.66666667 1. ]
mean value: 0.91
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.83333333 0.91666667 1. 0.91666667 0.91666667 0.83333333
0.71666667 0.81666667 0.83333333 0.9 ]
mean value: 0.8683333333333334
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.75 0.85714286 1. 0.85714286 0.83333333 0.71428571
0.5 0.71428571 0.66666667 0.85714286]
mean value: 0.775
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.77
Accuracy on Blind test: 0.89
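To compare models across a log this long, you can extract the "Model_name" and blind-test lines with regular expressions. The sketch below assumes that every model block keeps the exact prefixes seen here and that each blind-test line belongs to the nearest preceding Model_name. The sample text is copied from this section.

```python
import re

LOG_SAMPLE = """\
Model_name: Bagging Classifier
MCC on Blind test: 0.77
Accuracy on Blind test: 0.89
Model_name: Gaussian Process
MCC on Blind test: 0.06
Accuracy on Blind test: 0.54
Model_name: Gradient Boosting
MCC on Blind test: 0.77
Accuracy on Blind test: 0.89
"""


def blind_test_scores(text):
    """Map each Model_name to its (blind MCC, blind accuracy) pair."""
    results = {}
    current = None
    for line in text.splitlines():
        if m := re.match(r"Model_name: (.+)", line):
            current = m.group(1).strip()
            results[current] = [None, None]
        elif current and (m := re.match(r"MCC on Blind test: (-?[\d.]+)", line)):
            results[current][0] = float(m.group(1))
        elif current and (m := re.match(r"Accuracy on Blind test: ([\d.]+)", line)):
            results[current][1] = float(m.group(1))
    return {name: tuple(v) for name, v in results.items()}


scores = blind_test_scores(LOG_SAMPLE)
for name, (mcc, acc) in sorted(scores.items(), key=lambda kv: kv[1][0], reverse=True):
    print(f"{name:20s} MCC={mcc:.2f} acc={acc:.2f}")
```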
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
key: fit_time
value: [0.01520419 0.01495075 0.01517487 0.01506615 0.01498771 0.01498652
0.01938462 0.01636744 0.03074956 0.0168345 ]
mean value: 0.01737062931060791
key: score_time
value: [0.01191139 0.01167727 0.01162887 0.01161718 0.01165557 0.01290107
0.01460361 0.014503 0.02054787 0.01282907]
mean value: 0.013387489318847656
key: test_mcc
value: [ 0. 0.50709255 0.35355339 -0.16903085 0.06900656 0.1490712
0.46666667 0.26666667 -0.1 0.1 ]
mean value: 0.16430261802522353
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.5 0.75 0.66666667 0.41666667 0.54545455 0.54545455
0.72727273 0.63636364 0.45454545 0.54545455]
mean value: 0.5787878787878787
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.5 0.76923077 0.71428571 0.36363636 0.44444444 0.61538462
0.72727273 0.66666667 0.5 0.54545455]
mean value: 0.5846375846375846
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.5 0.71428571 0.625 0.4 0.5 0.5
0.66666667 0.66666667 0.5 0.6 ]
mean value: 0.5672619047619047
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.5 0.83333333 0.83333333 0.33333333 0.4 0.8
0.8 0.66666667 0.5 0.5 ]
mean value: 0.6166666666666667
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.5 0.75 0.66666667 0.41666667 0.53333333 0.56666667
0.73333333 0.63333333 0.45 0.55 ]
mean value: 0.5800000000000001
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.33333333 0.625 0.55555556 0.22222222 0.28571429 0.44444444
0.57142857 0.5 0.33333333 0.375 ]
mean value: 0.4246031746031746
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.15
Accuracy on Blind test: 0.54
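QDA estimates a separate covariance matrix for each class. After centring, a class with n rows spans at most n - 1 dimensions. Judging by the train_accuracy denominators of 102 to 103, each training fold holds roughly 51 rows per class, against 166 scaled numeric columns plus the one-hot columns. Every class covariance is therefore singular, which is what the repeated "Variables are collinear" warning reports. This is also consistent with the perfect training scores and near-chance test MCC. The numpy check below demonstrates the rank bound on random data shaped like one class of one fold. The 180-column width is an assumed stand-in for the 166 numeric columns plus an unknown number of one-hot columns.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols = 51, 180  # assumed: rows of one class in a training fold, total features
X_class = rng.normal(size=(n_rows, n_cols))

# QDA works with the class-centred matrix; centring removes one degree of freedom.
X_centred = X_class - X_class.mean(axis=0)
rank = np.linalg.matrix_rank(X_centred)
print(f"rank {rank} of {n_cols} features -> covariance is singular: {rank < n_cols}")
```

Common ways around this are to reduce the feature count before QDA, for example with feature selection or PCA inside the pipeline, or to prefer a model that does not need a full-rank covariance for each class.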
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.03670239 0.0243206 0.03172588 0.0315814 0.03152347 0.03162026
0.0317049 0.03175974 0.03160596 0.03164554]
mean value: 0.031419014930725096
key: score_time
value: [0.01155329 0.01707244 0.01971054 0.02251983 0.02160668 0.0201211
0.01141977 0.02167892 0.02119327 0.02242517]
mean value: 0.01893010139465332
key: test_mcc
value: [0.16903085 0. 0.84515425 0.66666667 0.69006556 0.26666667
0.26666667 0.55901699 0.69006556 0.26666667]
mean value: 0.44199998854005423
key: train_mcc
value: [0.94135745 1. 0.96152395 0.92227807 0.98076923 0.9229904
0.94193062 0.9229904 0.96116139 0.96187302]
mean value: 0.9516874525588295
key: test_accuracy
value: [0.58333333 0.5 0.91666667 0.83333333 0.81818182 0.63636364
0.63636364 0.72727273 0.81818182 0.63636364]
mean value: 0.7106060606060606
key: train_accuracy
value: [0.97058824 1. 0.98039216 0.96078431 0.99029126 0.96116505
0.97087379 0.96116505 0.98058252 0.98058252]
mean value: 0.975642490005711
key: test_fscore
value: [0.61538462 0.625 0.90909091 0.83333333 0.83333333 0.6
0.6 0.66666667 0.8 0.66666667]
mean value: 0.7149475524475524
key: train_fscore
value: [0.97029703 1. 0.98 0.96 0.99029126 0.96226415
0.97087379 0.96 0.98039216 0.98 ]
mean value: 0.97541183860528
key: test_precision
value: [0.57142857 0.5 1. 0.83333333 0.71428571 0.6
0.6 1. 1. 0.66666667]
mean value: 0.7485714285714286
key: train_precision
value: [0.98 1. 1. 0.97959184 1. 0.94444444
0.98039216 0.97959184 0.98039216 1. ]
mean value: 0.9844412431639322
key: test_recall
value: [0.66666667 0.83333333 0.83333333 0.83333333 1. 0.6
0.6 0.5 0.66666667 0.66666667]
mean value: 0.72
key: train_recall
value: [0.96078431 1. 0.96078431 0.94117647 0.98076923 0.98076923
0.96153846 0.94117647 0.98039216 0.96078431]
mean value: 0.9668174962292609
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_8020.py:168: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_8020.py:171: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: test_roc_auc
value: [0.58333333 0.5 0.91666667 0.83333333 0.83333333 0.63333333
0.63333333 0.75 0.83333333 0.63333333]
mean value: 0.715
key: train_roc_auc
value: [0.97058824 1. 0.98039216 0.96078431 0.99038462 0.96097285
0.97096531 0.96097285 0.98058069 0.98039216]
mean value: 0.975603318250377
key: test_jcc
value: [0.44444444 0.45454545 0.83333333 0.71428571 0.71428571 0.42857143
0.42857143 0.5 0.66666667 0.5 ]
mean value: 0.5684704184704185
key: train_jcc
value: [0.94230769 1. 0.96078431 0.92307692 0.98076923 0.92727273
0.94339623 0.92307692 0.96153846 0.96078431]
mean value: 0.9523006811908032
MCC on Blind test: 0.6
Accuracy on Blind test: 0.81
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.25216579 0.17971325 0.18904495 0.18921661 0.27039409 0.28169656
0.18438935 0.18478012 0.20222712 0.11990452]
mean value: 0.2053532361984253
key: score_time
value: [0.02050495 0.02130389 0.01642084 0.01335454 0.02752233 0.02227807
0.01993704 0.0214355 0.02152538 0.01164627]
mean value: 0.019592881202697754
key: test_mcc
value: [0.16903085 0.35355339 0.50709255 0.50709255 0.63333333 0.26666667
0.3105295 0.43033148 0.69006556 0.3105295 ]
mean value: 0.4178225392875605
key: train_mcc
value: [0.94135745 1. 1. 0.60972137 1. 1.
1. 1. 1. 1. ]
mean value: 0.9551078823340995
key: test_accuracy
value: [0.58333333 0.66666667 0.75 0.75 0.81818182 0.63636364
0.63636364 0.63636364 0.81818182 0.63636364]
mean value: 0.6931818181818182
key: train_accuracy
value: [0.97058824 1. 1. 0.80392157 1. 1.
1. 1. 1. 1. ]
mean value: 0.9774509803921568
key: test_fscore
value: [0.61538462 0.71428571 0.72727273 0.72727273 0.8 0.6
0.66666667 0.5 0.8 0.6 ]
mean value: 0.675088245088245
key: train_fscore
value: [0.97029703 1. 1. 0.79591837 1. 1.
1. 1. 1. 1. ]
mean value: 0.9766215397049909
key: test_precision
value: [0.57142857 0.625 0.8 0.8 0.8 0.6
0.57142857 1. 1. 0.75 ]
mean value: 0.7517857142857143
key: train_precision
value: [0.98 1. 1. 0.82978723 1. 1.
1. 1. 1. 1. ]
mean value: 0.9809787234042553
key: test_recall
value: [0.66666667 0.83333333 0.66666667 0.66666667 0.8 0.6
0.8 0.33333333 0.66666667 0.5 ]
mean value: 0.6533333333333333
key: train_recall
value: [0.96078431 1. 1. 0.76470588 1. 1.
1. 1. 1. 1. ]
mean value: 0.9725490196078431
key: test_roc_auc
value: [0.58333333 0.66666667 0.75 0.75 0.81666667 0.63333333
0.65 0.66666667 0.83333333 0.65 ]
mean value: 0.7
key: train_roc_auc
value: [0.97058824 1. 1. 0.80392157 1. 1.
1. 1. 1. 1. ]
mean value: 0.9774509803921568
key: test_jcc
value: [0.44444444 0.55555556 0.57142857 0.57142857 0.66666667 0.42857143
0.5 0.33333333 0.66666667 0.42857143]
mean value: 0.5166666666666666
key: train_jcc
value: [0.94230769 1. 1. 0.66101695 1. 1.
1. 1. 1. 1. ]
mean value: 0.9603324641460235
MCC on Blind test: 0.6
Accuracy on Blind test: 0.81
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.03044844 0.02831769 0.02996039 0.03043008 0.02798986 0.02752709
0.02288771 0.02915406 0.02952576 0.02516389]
mean value: 0.028140497207641602
key: score_time
value: [0.01249909 0.01163912 0.01143289 0.01147437 0.01142287 0.01163936
0.01132512 0.0114665 0.01140189 0.01849389]
mean value: 0.012279510498046875
key: test_mcc
value: [0.89893315 0.68543653 0.4472136 0.79772404 0.56980288 0.47140452
0.56980288 0.2236068 0.55555556 0.70710678]
mean value: 0.5926586727385535
key: train_mcc
value: [0.8039452 0.84201212 0.84202713 0.84202713 0.82951506 0.83025669
0.81762054 0.89084029 0.80511756 0.85391256]
mean value: 0.8357274278898956
key: test_accuracy
value: [0.94736842 0.84210526 0.72222222 0.88888889 0.77777778 0.72222222
0.77777778 0.61111111 0.77777778 0.83333333]
mean value: 0.7900584795321637
key: train_accuracy
value: [0.90184049 0.9202454 0.92073171 0.92073171 0.91463415 0.91463415
0.90853659 0.94512195 0.90243902 0.92682927]
mean value: 0.9175744426155918
key: test_fscore
value: [0.94117647 0.85714286 0.70588235 0.875 0.8 0.66666667
0.75 0.58823529 0.77777778 0.8 ]
mean value: 0.7761881419234361
key: train_fscore
value: [0.90123457 0.91719745 0.91925466 0.91925466 0.91358025 0.9125
0.9068323 0.94409938 0.90123457 0.92592593]
mean value: 0.9161113754660095
key: test_precision
value: [1. 0.81818182 0.75 1. 0.72727273 0.83333333
0.85714286 0.625 0.77777778 1. ]
mean value: 0.8388708513708514
key: train_precision
value: [0.9125 0.94736842 0.93670886 0.93670886 0.925 0.93589744
0.92405063 0.96202532 0.9125 0.9375 ]
mean value: 0.9330259527836143
key: test_recall
value: [0.88888889 0.9 0.66666667 0.77777778 0.88888889 0.55555556
0.66666667 0.55555556 0.77777778 0.66666667]
mean value: 0.7344444444444445
key: train_recall
value: [0.8902439 0.88888889 0.90243902 0.90243902 0.90243902 0.8902439
0.8902439 0.92682927 0.8902439 0.91463415]
mean value: 0.8998644986449864
key: test_roc_auc
value: [0.94444444 0.83888889 0.72222222 0.88888889 0.77777778 0.72222222
0.77777778 0.61111111 0.77777778 0.83333333]
mean value: 0.7894444444444444
key: train_roc_auc
value: [0.90191207 0.9200542 0.92073171 0.92073171 0.91463415 0.91463415
0.90853659 0.94512195 0.90243902 0.92682927]
mean value: 0.9175624811803673
key: test_jcc
value: [0.88888889 0.75 0.54545455 0.77777778 0.66666667 0.5
0.6 0.41666667 0.63636364 0.66666667]
mean value: 0.6448484848484848
key: train_jcc
value: [0.82022472 0.84705882 0.85057471 0.85057471 0.84090909 0.83908046
0.82954545 0.89411765 0.82022472 0.86206897]
mean value: 0.845437930481974
MCC on Blind test: 0.4
Accuracy on Blind test: 0.73
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.86987448 0.64905834 0.65774155 0.85697532 0.66762519 0.64877272
0.83129621 0.69926929 0.71107221 0.85802889]
mean value: 0.7449714183807373
key: score_time
value: [0.0130856 0.01300454 0.01301813 0.01331544 0.01296258 0.01906753
0.01196313 0.01293635 0.01503968 0.0136528 ]
mean value: 0.01380457878112793
key: test_mcc
value: [0.89893315 0.9 0.56980288 0.89442719 0.89442719 0.67082039
0.56980288 0.47140452 0.70710678 0.79772404]
mean value: 0.7374449026992183
key: train_mcc
value: [1. 1. 1. 1. 0.98787834 1.
1. 1. 1. 1. ]
mean value: 0.9987878339907214
key: test_accuracy
value: [0.94736842 0.94736842 0.77777778 0.94444444 0.94444444 0.83333333
0.77777778 0.72222222 0.83333333 0.88888889]
mean value: 0.8616959064327485
key: train_accuracy
value: [1. 1. 1. 1. 0.99390244 1.
1. 1. 1. 1. ]
mean value: 0.999390243902439
key: test_fscore
value: [0.94117647 0.94736842 0.75 0.94117647 0.94736842 0.82352941
0.75 0.66666667 0.8 0.875 ]
mean value: 0.8442285861713107
key: train_fscore
value: [1. 1. 1. 1. 0.99386503 1.
1. 1. 1. 1. ]
mean value: 0.9993865030674847
key: test_precision
value: [1. 1. 0.85714286 1. 0.9 0.875
0.85714286 0.83333333 1. 1. ]
mean value: 0.9322619047619047
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.88888889 0.9 0.66666667 0.88888889 1. 0.77777778
0.66666667 0.55555556 0.66666667 0.77777778]
mean value: 0.7788888888888889
key: train_recall
value: [1. 1. 1. 1. 0.98780488 1.
1. 1. 1. 1. ]
mean value: 0.998780487804878
key: test_roc_auc
value: [0.94444444 0.95 0.77777778 0.94444444 0.94444444 0.83333333
0.77777778 0.72222222 0.83333333 0.88888889]
mean value: 0.8616666666666666
key: train_roc_auc
value: [1. 1. 1. 1. 0.99390244 1.
1. 1. 1. 1. ]
mean value: 0.999390243902439
key: test_jcc
value: [0.88888889 0.9 0.6 0.88888889 0.9 0.7
0.6 0.5 0.66666667 0.77777778]
mean value: 0.7422222222222222
key: train_jcc
value: [1. 1. 1. 1. 0.98780488 1.
1. 1. 1. 1. ]
mean value: 0.998780487804878
MCC on Blind test: 0.65
Accuracy on Blind test: 0.84
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01251841 0.01035118 0.00904322 0.00870728 0.0086 0.00857806
0.00861526 0.00875258 0.00855613 0.00859094]
mean value: 0.009231305122375489
key: score_time
value: [0.01379108 0.00886035 0.00869012 0.00853968 0.0084424 0.00838447
0.00841713 0.00837684 0.00846767 0.00849962]
mean value: 0.00904693603515625
key: test_mcc
value: [0.28752732 0.01807754 0.3721042 0.24253563 0. 0.4472136
0.23570226 0. 0.55555556 0.53452248]
mean value: 0.26932385786240387
key: train_mcc
value: [0.39666608 0.40983565 0.45749571 0.48276756 0.49111711 0.48245064
0.45222959 0.44501237 0.42597138 0.41007219]
mean value: 0.44536182687198245
key: test_accuracy
value: [0.63157895 0.52631579 0.66666667 0.55555556 0.5 0.66666667
0.61111111 0.5 0.77777778 0.72222222]
mean value: 0.6157894736842106
key: train_accuracy
value: [0.6809816 0.65030675 0.70121951 0.70731707 0.70121951 0.7195122
0.68902439 0.70121951 0.68292683 0.65853659]
mean value: 0.6892263953314379
key: test_fscore
value: [0.66666667 0.66666667 0.72727273 0.69230769 0.64 0.75
0.66666667 0.60869565 0.77777778 0.7826087 ]
mean value: 0.6978662545184284
key: train_fscore
value: [0.73737374 0.73732719 0.75862069 0.76699029 0.76777251 0.76767677
0.75598086 0.75376884 0.74757282 0.74074074]
mean value: 0.7533824448496093
key: test_precision
value: [0.58333333 0.52941176 0.61538462 0.52941176 0.5 0.6
0.58333333 0.5 0.77777778 0.64285714]
mean value: 0.5861509732097967
key: train_precision
value: [0.62931034 0.58823529 0.63636364 0.63709677 0.62790698 0.65517241
0.62204724 0.64102564 0.62096774 0.59701493]
mean value: 0.6255140992468455
key: test_recall
value: [0.77777778 0.9 0.88888889 1. 0.88888889 1.
0.77777778 0.77777778 0.77777778 1. ]
mean value: 0.8788888888888888
key: train_recall
value: [0.8902439 0.98765432 0.93902439 0.96341463 0.98780488 0.92682927
0.96341463 0.91463415 0.93902439 0.97560976]
mean value: 0.9487654320987654
key: test_roc_auc
value: [0.63888889 0.50555556 0.66666667 0.55555556 0.5 0.66666667
0.61111111 0.5 0.77777778 0.72222222]
mean value: 0.6144444444444445
key: train_roc_auc
value: [0.67968985 0.65236375 0.70121951 0.70731707 0.70121951 0.7195122
0.68902439 0.70121951 0.68292683 0.65853659]
mean value: 0.6893029208069859
key: test_jcc
value: [0.5 0.5 0.57142857 0.52941176 0.47058824 0.6
0.5 0.4375 0.63636364 0.64285714]
mean value: 0.538814935064935
key: train_jcc
value: [0.584 0.58394161 0.61111111 0.62204724 0.62307692 0.62295082
0.60769231 0.60483871 0.59689922 0.58823529]
mean value: 0.6044793240087645
MCC on Blind test: 0.27
Accuracy on Blind test: 0.68
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.00905609 0.00897646 0.00951314 0.00907636 0.00915027 0.00914383
0.00920677 0.00918961 0.00925541 0.00928283]
mean value: 0.009185075759887695
key: score_time
value: [0.00866008 0.008641 0.00868464 0.0087235 0.0087688 0.00884247
0.0088706 0.00899959 0.00887132 0.00893378]
mean value: 0.008799576759338379
key: test_mcc
value: [ 0.15555556 0.04494666 0.33333333 -0.11396058 0. 0.11396058
0.34188173 -0.2236068 0.33333333 0.4472136 ]
mean value: 0.14326574068486644
key: train_mcc
value: [0.44782413 0.41266129 0.42711521 0.45125307 0.48838629 0.43915503
0.49147319 0.48838629 0.47564513 0.40270863]
mean value: 0.452460824775168
key: test_accuracy
value: [0.57894737 0.52631579 0.66666667 0.44444444 0.5 0.55555556
0.66666667 0.38888889 0.66666667 0.72222222]
mean value: 0.5716374269005848
key: train_accuracy
value: [0.72392638 0.70552147 0.71341463 0.72560976 0.74390244 0.7195122
0.74390244 0.74390244 0.73780488 0.70121951]
mean value: 0.7258716145443663
key: test_fscore
value: [0.55555556 0.57142857 0.66666667 0.5 0.47058824 0.5
0.625 0.42105263 0.66666667 0.73684211]
mean value: 0.5713800432453683
key: train_fscore
value: [0.72727273 0.68831169 0.70807453 0.72727273 0.75 0.71604938
0.75862069 0.7375 0.73939394 0.70658683]
mean value: 0.72590825151311
key: test_precision
value: [0.55555556 0.54545455 0.66666667 0.45454545 0.5 0.57142857
0.71428571 0.4 0.66666667 0.7 ]
mean value: 0.5774603174603175
key: train_precision
value: [0.72289157 0.7260274 0.72151899 0.72289157 0.73255814 0.725
0.7173913 0.75641026 0.73493976 0.69411765]
mean value: 0.7253746623520101
key: test_recall
value: [0.55555556 0.6 0.66666667 0.55555556 0.44444444 0.44444444
0.55555556 0.44444444 0.66666667 0.77777778]
mean value: 0.5711111111111111
key: train_recall
value: [0.73170732 0.65432099 0.69512195 0.73170732 0.76829268 0.70731707
0.80487805 0.7195122 0.74390244 0.7195122 ]
mean value: 0.7276272207166516
key: test_roc_auc
value: [0.57777778 0.52222222 0.66666667 0.44444444 0.5 0.55555556
0.66666667 0.38888889 0.66666667 0.72222222]
mean value: 0.5711111111111111
key: train_roc_auc
value: [0.72387835 0.70520927 0.71341463 0.72560976 0.74390244 0.7195122
0.74390244 0.74390244 0.73780488 0.70121951]
mean value: 0.7258355916892503
key: test_jcc
value: [0.38461538 0.4 0.5 0.33333333 0.30769231 0.33333333
0.45454545 0.26666667 0.5 0.58333333]
mean value: 0.40635198135198136
key: train_jcc
value: [0.57142857 0.52475248 0.54807692 0.57142857 0.6 0.55769231
0.61111111 0.58415842 0.58653846 0.5462963 ]
mean value: 0.5701483133661351
MCC on Blind test: 0.38
Accuracy on Blind test: 0.7
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00906515 0.00898194 0.00869346 0.00964141 0.00977135 0.0086484
0.00956511 0.01074243 0.00970697 0.00954914]
mean value: 0.009436535835266113
key: score_time
value: [0.01515126 0.01482105 0.01515651 0.01516604 0.01514649 0.01451993
0.01451945 0.01603603 0.01516533 0.01491308]
mean value: 0.015059518814086913
key: test_mcc
value: [ 0.15555556 0.16854997 0.11396058 0. 0.11111111 0.12403473
0.2236068 -0.23570226 0.2236068 0.33333333]
mean value: 0.1218056611769099
key: train_mcc
value: [0.43930081 0.42455778 0.40610963 0.40391344 0.44556639 0.47592838
0.44232587 0.50093211 0.43229648 0.40391344]
mean value: 0.4374844323355661
key: test_accuracy
value: [0.57894737 0.57894737 0.55555556 0.5 0.55555556 0.55555556
0.61111111 0.38888889 0.61111111 0.66666667]
mean value: 0.560233918128655
key: train_accuracy
value: [0.71779141 0.71165644 0.70121951 0.70121951 0.7195122 0.73780488
0.7195122 0.75 0.71341463 0.70121951]
mean value: 0.7173350291785127
key: test_fscore
value: [0.55555556 0.55555556 0.5 0.4 0.55555556 0.42857143
0.58823529 0.47619048 0.58823529 0.66666667]
mean value: 0.5314565826330533
key: train_fscore
value: [0.7012987 0.69677419 0.67973856 0.68789809 0.69333333 0.73291925
0.7012987 0.74213836 0.68874172 0.68789809]
mean value: 0.701203901120714
key: test_precision
value: [0.55555556 0.625 0.57142857 0.5 0.55555556 0.6
0.625 0.41666667 0.625 0.66666667]
mean value: 0.5740873015873016
key: train_precision
value: [0.75 0.72972973 0.73239437 0.72 0.76470588 0.74683544
0.75 0.76623377 0.75362319 0.72 ]
mean value: 0.7433522375957392
key: test_recall
value: [0.55555556 0.5 0.44444444 0.33333333 0.55555556 0.33333333
0.55555556 0.55555556 0.55555556 0.66666667]
mean value: 0.5055555555555555
key: train_recall
value: [0.65853659 0.66666667 0.63414634 0.65853659 0.63414634 0.7195122
0.65853659 0.7195122 0.63414634 0.65853659]
mean value: 0.6642276422764227
key: test_roc_auc
value: [0.57777778 0.58333333 0.55555556 0.5 0.55555556 0.55555556
0.61111111 0.38888889 0.61111111 0.66666667]
mean value: 0.5605555555555556
key: train_roc_auc
value: [0.71815718 0.71138211 0.70121951 0.70121951 0.7195122 0.73780488
0.7195122 0.75 0.71341463 0.70121951]
mean value: 0.7173441734417345
key: test_jcc
value: [0.38461538 0.38461538 0.33333333 0.25 0.38461538 0.27272727
0.41666667 0.3125 0.41666667 0.5 ]
mean value: 0.36557400932400935
key: train_jcc
value: [0.54 0.53465347 0.51485149 0.52427184 0.53061224 0.57843137
0.54 0.59 0.52525253 0.52427184]
mean value: 0.5402344782514942
MCC on Blind test: -0.08
Accuracy on Blind test: 0.49
Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01260471 0.01172137 0.01098704 0.01196337 0.0121212 0.01210618
0.01222682 0.01223993 0.01208854 0.01191831]
mean value: 0.011997747421264648
key: score_time
value: [0.01024818 0.00978136 0.01010489 0.01006174 0.01007962 0.01016092
0.00994873 0.00993824 0.00985742 0.00976133]
mean value: 0.009994244575500489
key: test_mcc
value: [0.36803496 0.68888889 0.33333333 0.55555556 0.47140452 0.
0.55555556 0.11396058 0.55555556 0.47140452]
mean value: 0.4113693471913179
key: train_mcc
value: [0.70567864 0.70551039 0.78141806 0.68313005 0.75812978 0.75699875
0.75632256 0.74440079 0.69517365 0.73192505]
mean value: 0.7318687723980281
key: test_accuracy
value: [0.68421053 0.84210526 0.66666667 0.77777778 0.72222222 0.5
0.77777778 0.55555556 0.77777778 0.72222222]
mean value: 0.7026315789473684
key: train_accuracy
value: [0.85276074 0.85276074 0.8902439 0.84146341 0.87804878 0.87804878
0.87804878 0.87195122 0.84756098 0.86585366]
mean value: 0.865674098458776
key: test_fscore
value: [0.625 0.84210526 0.66666667 0.77777778 0.76190476 0.4
0.77777778 0.5 0.77777778 0.66666667]
mean value: 0.6795676691729323
key: train_fscore
value: [0.85542169 0.85185185 0.8875 0.84337349 0.88235294 0.875
0.87654321 0.8742515 0.84662577 0.86419753]
mean value: 0.8657117978369109
key: test_precision
value: [0.71428571 0.88888889 0.66666667 0.77777778 0.66666667 0.5
0.77777778 0.57142857 0.77777778 0.83333333]
mean value: 0.7174603174603175
key: train_precision
value: [0.8452381 0.85185185 0.91025641 0.83333333 0.85227273 0.8974359
0.8875 0.85882353 0.85185185 0.875 ]
mean value: 0.8663563696651932
key: test_recall
value: [0.55555556 0.8 0.66666667 0.77777778 0.88888889 0.33333333
0.77777778 0.44444444 0.77777778 0.55555556]
mean value: 0.6577777777777778
key: train_recall
value: [0.86585366 0.85185185 0.86585366 0.85365854 0.91463415 0.85365854
0.86585366 0.8902439 0.84146341 0.85365854]
mean value: 0.865672990063234
key: test_roc_auc
value: [0.67777778 0.84444444 0.66666667 0.77777778 0.72222222 0.5
0.77777778 0.55555556 0.77777778 0.72222222]
mean value: 0.7022222222222222
key: train_roc_auc
value: [0.85267992 0.85275519 0.8902439 0.84146341 0.87804878 0.87804878
0.87804878 0.87195122 0.84756098 0.86585366]
mean value: 0.8656654622101776
key: test_jcc
value: [0.45454545 0.72727273 0.5 0.63636364 0.61538462 0.25
0.63636364 0.33333333 0.63636364 0.5 ]
mean value: 0.528962703962704
key: train_jcc
value: [0.74736842 0.74193548 0.79775281 0.72916667 0.78947368 0.77777778
0.78021978 0.77659574 0.73404255 0.76086957]
mean value: 0.7635202485876846
MCC on Blind test: 0.05
Accuracy on Blind test: 0.57
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [0.66131997 0.66211009 0.79937983 0.73952007 0.7480371 0.93877983
0.74273109 0.72177291 0.8511014 0.69477081]
mean value: 0.7559523105621337
key: score_time
value: [0.0125308 0.0154655 0.01416278 0.01435876 0.01442623 0.0205605
0.02195811 0.02220106 0.0232017 0.01383901]
mean value: 0.017270445823669434
key: test_mcc
value: [0.71611487 0.80903983 0.56980288 0.89442719 0.70710678 0.56980288
0.4472136 0.2236068 0.70710678 0.70710678]
mean value: 0.6351328401401198
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.84210526 0.89473684 0.77777778 0.94444444 0.83333333 0.77777778
0.72222222 0.61111111 0.83333333 0.83333333]
mean value: 0.8070175438596492
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.8 0.88888889 0.75 0.94117647 0.85714286 0.75
0.73684211 0.63157895 0.8 0.8 ]
mean value: 0.7955629269251561
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 0.85714286 1. 0.75 0.85714286
0.7 0.6 1. 1. ]
mean value: 0.8764285714285714
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 0.8 0.66666667 0.88888889 1. 0.66666667
0.77777778 0.66666667 0.66666667 0.66666667]
mean value: 0.7466666666666666
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.83333333 0.9 0.77777778 0.94444444 0.83333333 0.77777778
0.72222222 0.61111111 0.83333333 0.83333333]
mean value: 0.8066666666666666
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.66666667 0.8 0.6 0.88888889 0.75 0.6
0.58333333 0.46153846 0.66666667 0.66666667]
mean value: 0.6683760683760683
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.34
Accuracy on Blind test: 0.7
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.0178659 0.01578808 0.01413012 0.0122447 0.01217771 0.01225829
0.01304674 0.01155305 0.01350141 0.01376867]
mean value: 0.013633465766906739
key: score_time
value: [0.01277757 0.01019597 0.00928235 0.00891542 0.00881386 0.00882745
0.00914836 0.00933933 0.00935149 0.00943089]
mean value: 0.009608268737792969
key: test_mcc
value: [0.89893315 1. 0.89442719 0.89442719 0.89442719 0.77777778
0.77777778 0.47140452 0.79772404 1. ]
mean value: 0.840689883451479
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94736842 1. 0.94444444 0.94444444 0.94444444 0.88888889
0.88888889 0.72222222 0.88888889 1. ]
mean value: 0.9169590643274853
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94117647 1. 0.94736842 0.94117647 0.94736842 0.88888889
0.88888889 0.66666667 0.875 1. ]
mean value: 0.9096534227726178
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 0.9 1. 0.9 0.88888889
0.88888889 0.83333333 1. 1. ]
mean value: 0.9411111111111111
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.88888889 1. 1. 0.88888889 1. 0.88888889
0.88888889 0.55555556 0.77777778 1. ]
mean value: 0.8888888888888888
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.94444444 1. 0.94444444 0.94444444 0.94444444 0.88888889
0.88888889 0.72222222 0.88888889 1. ]
mean value: 0.9166666666666666
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.88888889 1. 0.9 0.88888889 0.9 0.8
0.8 0.5 0.77777778 1. ]
mean value: 0.8455555555555556
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.72
Accuracy on Blind test: 0.86
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.09470534 0.09246397 0.09804273 0.09987426 0.09740973 0.09735131
0.09562063 0.0994997 0.09801269 0.10053802]
mean value: 0.09735183715820313
key: score_time
value: [0.017555 0.01803017 0.0188818 0.01894879 0.01877737 0.01888657
0.01826334 0.01904607 0.01873159 0.01754045]
mean value: 0.01846611499786377
key: test_mcc
value: [0.80507649 0.9 0.67082039 1. 0.47140452 0.79772404
0.4472136 0.4472136 1. 0.77777778]
mean value: 0.7317230403935541
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.89473684 0.94736842 0.83333333 1. 0.72222222 0.88888889
0.72222222 0.72222222 1. 0.88888889]
mean value: 0.8619883040935672
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.875 0.94736842 0.82352941 1. 0.76190476 0.875
0.70588235 0.70588235 1. 0.88888889]
mean value: 0.8583456189493341
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 0.875 1. 0.66666667 1.
0.75 0.75 1. 0.88888889]
mean value: 0.8930555555555555
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.77777778 0.9 0.77777778 1. 0.88888889 0.77777778
0.66666667 0.66666667 1. 0.88888889]
mean value: 0.8344444444444444
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.88888889 0.95 0.83333333 1. 0.72222222 0.88888889
0.72222222 0.72222222 1. 0.88888889]
mean value: 0.8616666666666666
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.77777778 0.9 0.7 1. 0.61538462 0.77777778
0.54545455 0.54545455 1. 0.8 ]
mean value: 0.7661849261849262
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.4
Accuracy on Blind test: 0.73
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01011419 0.01173449 0.01065707 0.01012039 0.00895905 0.01001811
0.01012397 0.01015162 0.01015472 0.00998425]
mean value: 0.010201787948608399
key: score_time
value: [0.0099287 0.01066327 0.00916076 0.00908279 0.00873351 0.00936961
0.00939894 0.00948119 0.00942492 0.00959873]
mean value: 0.009484243392944337
key: test_mcc
value: [ 0.26257545 0.59554321 0.4472136 0.62017367 0.34188173 0.47140452
0.2236068 -0.34188173 0.4472136 0.62017367]
mean value: 0.368790452109
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.63157895 0.78947368 0.72222222 0.77777778 0.66666667 0.72222222
0.61111111 0.33333333 0.72222222 0.77777778]
mean value: 0.6754385964912281
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.53333333 0.77777778 0.70588235 0.71428571 0.7 0.66666667
0.63157895 0.25 0.73684211 0.71428571]
mean value: 0.6430652611921962
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.66666667 0.875 0.75 1. 0.63636364 0.83333333
0.6 0.28571429 0.7 1. ]
mean value: 0.7347077922077923
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.44444444 0.7 0.66666667 0.55555556 0.77777778 0.55555556
0.66666667 0.22222222 0.77777778 0.55555556]
mean value: 0.5922222222222222
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.62222222 0.79444444 0.72222222 0.77777778 0.66666667 0.72222222
0.61111111 0.33333333 0.72222222 0.77777778]
mean value: 0.675
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.36363636 0.63636364 0.54545455 0.55555556 0.53846154 0.5
0.46153846 0.14285714 0.58333333 0.55555556]
mean value: 0.4882756132756133
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.08
Accuracy on Blind test: 0.59
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.22927833 1.1960187 1.19522476 1.17464781 1.17565775 1.17176104
1.17601395 1.23562598 1.2031343 1.24709344]
mean value: 1.2004456043243408
key: score_time
value: [0.09242558 0.09341002 0.08967948 0.09073281 0.08897328 0.08774757
0.08941007 0.15643597 0.09522557 0.0925622 ]
mean value: 0.0976602554321289
key: test_mcc
value: [0.89893315 0.9 0.55555556 0.89442719 0.56980288 0.67082039
0.79772404 0.4472136 1. 0.89442719]
mean value: 0.7628903993771927
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94736842 0.94736842 0.77777778 0.94444444 0.77777778 0.83333333
0.88888889 0.72222222 1. 0.94444444]
mean value: 0.8783625730994152
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94117647 0.94736842 0.77777778 0.94117647 0.8 0.82352941
0.875 0.70588235 1. 0.94117647]
mean value: 0.8753087375300997
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 0.77777778 1. 0.72727273 0.875
1. 0.75 1. 1. ]
mean value: 0.9130050505050505
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.88888889 0.9 0.77777778 0.88888889 0.88888889 0.77777778
0.77777778 0.66666667 1. 0.88888889]
mean value: 0.8455555555555555
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.94444444 0.95 0.77777778 0.94444444 0.77777778 0.83333333
0.88888889 0.72222222 1. 0.94444444]
mean value: 0.8783333333333333
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
key: test_jcc
value: [0.88888889 0.9 0.63636364 0.88888889 0.66666667 0.7
0.77777778 0.54545455 1. 0.88888889]
mean value: 0.7892929292929293
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.59
Accuracy on Blind test: 0.81
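The FutureWarning in this log is raised for the 'Random Forest2' configuration, which passes `max_features='auto'`. A minimal sketch of the same settings with the explicit `'sqrt'` value the warning recommends is below; the synthetic data from `make_classification` is an assumption used only so the example runs on its own.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Same parameters as the logged 'Random Forest2' entry, except that the
# deprecated max_features='auto' is replaced by 'sqrt', which the warning
# says keeps the previous behaviour for classifiers.
rf2 = RandomForestClassifier(
    max_features="sqrt",
    min_samples_leaf=5,
    n_estimators=1000,
    n_jobs=10,
    oob_score=True,
    random_state=42,
)

# Synthetic stand-in data (assumption), just to show the model fits.
X, y = make_classification(n_samples=200, random_state=42)
rf2.fit(X, y)
print("OOB accuracy:", rf2.oob_score_)
```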
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.8364284 0.92184711 0.85616851 0.87351251 0.90050054 0.85522032
0.92096114 0.90089631 0.93183279 0.90460157]
mean value: 0.8901969194412231
key: score_time
value: [0.25036001 0.22233701 0.20914292 0.19844246 0.23667526 0.23699355
0.20669532 0.17988372 0.18153071 0.18519092]
mean value: 0.21072518825531006
key: test_mcc
value: [0.78888889 0.78888889 0.33333333 0.89442719 0.67082039 0.67082039
0.77777778 0.4472136 0.67082039 0.79772404]
mean value: 0.6840714890356039
key: train_mcc
value: [0.93927103 0.95121218 0.92682927 0.96348628 0.97590007 0.93909422
0.96348628 0.96348628 0.92793395 0.95150257]
mean value: 0.9502202141599985
key: test_accuracy
value: [0.89473684 0.89473684 0.66666667 0.94444444 0.83333333 0.83333333
0.88888889 0.72222222 0.83333333 0.88888889]
mean value: 0.8400584795321637
key: train_accuracy
value: [0.96932515 0.97546012 0.96341463 0.98170732 0.98780488 0.9695122
0.98170732 0.98170732 0.96341463 0.97560976]
mean value: 0.9749663324854108
key: test_fscore
value: [0.88888889 0.9 0.66666667 0.94117647 0.84210526 0.82352941
0.88888889 0.70588235 0.84210526 0.875 ]
mean value: 0.8374243206054351
key: train_fscore
value: [0.97005988 0.97560976 0.96341463 0.98181818 0.98795181 0.96969697
0.98181818 0.98181818 0.96428571 0.97590361]
mean value: 0.97523769216074
key: test_precision
value: [0.88888889 0.9 0.66666667 1. 0.8 0.875
0.88888889 0.75 0.8 1. ]
mean value: 0.8569444444444444
key: train_precision
value: [0.95294118 0.96385542 0.96341463 0.97590361 0.97619048 0.96385542
0.97590361 0.97590361 0.94186047 0.96428571]
mean value: 0.9654114152956387
key: test_recall
value: [0.88888889 0.9 0.66666667 0.88888889 0.88888889 0.77777778
0.88888889 0.66666667 0.88888889 0.77777778]
mean value: 0.8233333333333333
key: train_recall
value: [0.98780488 0.98765432 0.96341463 0.98780488 1. 0.97560976
0.98780488 0.98780488 0.98780488 0.98780488]
mean value: 0.985350797952424
key: test_roc_auc
value: [0.89444444 0.89444444 0.66666667 0.94444444 0.83333333 0.83333333
0.88888889 0.72222222 0.83333333 0.88888889]
mean value: 0.84
key: train_roc_auc
value: [0.96921108 0.97553448 0.96341463 0.98170732 0.98780488 0.9695122
0.98170732 0.98170732 0.96341463 0.97560976]
mean value: 0.9749623607347184
key: test_jcc
value: [0.8 0.81818182 0.5 0.88888889 0.72727273 0.7
0.8 0.54545455 0.72727273 0.77777778]
mean value: 0.7284848484848485
key: train_jcc
value: [0.94186047 0.95238095 0.92941176 0.96428571 0.97619048 0.94117647
0.96428571 0.96428571 0.93103448 0.95294118]
mean value: 0.9517852931068177
MCC on Blind test: 0.65
Accuracy on Blind test: 0.84
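The `key:` / `value:` / `mean value:` triples above have the shape returned by scikit-learn's `cross_validate` with a scoring dictionary and `return_train_score=True` (10 folds, so 10 values per key). The sketch below reproduces that layout on synthetic data; the scorer names (`mcc`, `fscore`, `jcc`) are chosen to match the logged keys and are an assumption about how the script defines them. The logged `test_roc_auc` values equal balanced accuracy on each fold (for example (8/9 + 10/10)/2 ≈ 0.944 for the first Random Forest fold), which suggests ROC AUC was computed from hard class predictions, so the sketch uses `make_scorer(roc_auc_score)` rather than the probability-based `"roc_auc"` scorer.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Scorer names picked to mirror the log's keys (assumption about the script).
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": make_scorer(roc_auc_score),  # scored on predicted labels, not probabilities
    "jcc": "jaccard",
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(LogisticRegression(max_iter=1000), X, y,
                        cv=cv, scoring=scoring, return_train_score=True)

# Print in the same key / value / mean value layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```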
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01126671 0.00995231 0.01127005 0.01492596 0.01139402 0.01019573
0.01002526 0.00952506 0.01013112 0.01003289]
mean value: 0.01087191104888916
key: score_time
value: [0.01013088 0.00956964 0.01013112 0.01141644 0.00992489 0.00955415
0.00914097 0.00919056 0.00934291 0.00942731]
mean value: 0.009782886505126953
key: test_mcc
value: [ 0.15555556 0.04494666 0.33333333 -0.11396058 0. 0.11396058
0.34188173 -0.2236068 0.33333333 0.4472136 ]
mean value: 0.14326574068486644
key: train_mcc
value: [0.44782413 0.41266129 0.42711521 0.45125307 0.48838629 0.43915503
0.49147319 0.48838629 0.47564513 0.40270863]
mean value: 0.452460824775168
key: test_accuracy
value: [0.57894737 0.52631579 0.66666667 0.44444444 0.5 0.55555556
0.66666667 0.38888889 0.66666667 0.72222222]
mean value: 0.5716374269005848
key: train_accuracy
value: [0.72392638 0.70552147 0.71341463 0.72560976 0.74390244 0.7195122
0.74390244 0.74390244 0.73780488 0.70121951]
mean value: 0.7258716145443663
key: test_fscore
value: [0.55555556 0.57142857 0.66666667 0.5 0.47058824 0.5
0.625 0.42105263 0.66666667 0.73684211]
mean value: 0.5713800432453683
key: train_fscore
value: [0.72727273 0.68831169 0.70807453 0.72727273 0.75 0.71604938
0.75862069 0.7375 0.73939394 0.70658683]
mean value: 0.72590825151311
key: test_precision
value: [0.55555556 0.54545455 0.66666667 0.45454545 0.5 0.57142857
0.71428571 0.4 0.66666667 0.7 ]
mean value: 0.5774603174603175
key: train_precision
value: [0.72289157 0.7260274 0.72151899 0.72289157 0.73255814 0.725
0.7173913 0.75641026 0.73493976 0.69411765]
mean value: 0.7253746623520101
key: test_recall
value: [0.55555556 0.6 0.66666667 0.55555556 0.44444444 0.44444444
0.55555556 0.44444444 0.66666667 0.77777778]
mean value: 0.5711111111111111
key: train_recall
value: [0.73170732 0.65432099 0.69512195 0.73170732 0.76829268 0.70731707
0.80487805 0.7195122 0.74390244 0.7195122 ]
mean value: 0.7276272207166516
key: test_roc_auc
value: [0.57777778 0.52222222 0.66666667 0.44444444 0.5 0.55555556
0.66666667 0.38888889 0.66666667 0.72222222]
mean value: 0.5711111111111111
key: train_roc_auc
value: [0.72387835 0.70520927 0.71341463 0.72560976 0.74390244 0.7195122
0.74390244 0.74390244 0.73780488 0.70121951]
mean value: 0.7258355916892503
key: test_jcc
value: [0.38461538 0.4 0.5 0.33333333 0.30769231 0.33333333
0.45454545 0.26666667 0.5 0.58333333]
mean value: 0.40635198135198136
key: train_jcc
value: [0.57142857 0.52475248 0.54807692 0.57142857 0.6 0.55769231
0.61111111 0.58415842 0.58653846 0.5462963 ]
mean value: 0.5701483133661351
MCC on Blind test: 0.38
Accuracy on Blind test: 0.7
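As a worked check of how `test_mcc` relates to the other per-fold metrics, the first BernoulliNB fold above (19 samples, accuracy 0.579, precision and recall 0.556) is consistent with TP = 5, FN = 4, FP = 4, TN = 6. These counts are derived from the logged metrics, not read from the data. The sketch computes MCC by hand and confirms it with scikit-learn.

```python
import math
from sklearn.metrics import balanced_accuracy_score, matthews_corrcoef, roc_auc_score

# Counts inferred from the logged metrics of the first BernoulliNB fold.
tp, fn, fp, tn = 5, 4, 4, 6

# MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
print(round(mcc, 8))  # 0.15555556, i.e. 14/90

# Expand the counts into label vectors so sklearn can confirm the hand calculation.
y_true = [1] * (tp + fn) + [0] * (fp + tn)
y_pred = [1] * tp + [0] * fn + [1] * fp + [0] * tn
print(round(matthews_corrcoef(y_true, y_pred), 8))

# ROC AUC on hard labels reduces to balanced accuracy: (5/9 + 6/10) / 2
print(round(roc_auc_score(y_true, y_pred), 8), round(balanced_accuracy_score(y_true, y_pred), 8))
```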
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.12701035 0.04905677 0.05640721 0.05744123 0.0649879 0.05551696
0.09176779 0.0588882 0.04906058 0.05181313]
mean value: 0.06619501113891602
key: score_time
value: [0.01352024 0.01060653 0.01075244 0.01053977 0.0114274 0.01368093
0.01104403 0.01173997 0.01017952 0.01028299]
mean value: 0.011377382278442382
key: test_mcc
value: [1. 1. 0.89442719 0.89442719 1. 0.77777778
0.77777778 0.67082039 1. 1. ]
mean value: 0.9015230330805324
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 1. 0.94444444 0.94444444 1. 0.88888889
0.88888889 0.83333333 1. 1. ]
mean value: 0.95
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 1. 0.94736842 0.94117647 1. 0.88888889
0.88888889 0.82352941 1. 1. ]
mean value: 0.9489852081183351
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 0.9 1. 1. 0.88888889
0.88888889 0.875 1. 1. ]
mean value: 0.9552777777777778
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 0.88888889 1. 0.88888889
0.88888889 0.77777778 1. 1. ]
mean value: 0.9444444444444444
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 1. 0.94444444 0.94444444 1. 0.88888889
0.88888889 0.83333333 1. 1. ]
mean value: 0.95
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 1. 0.9 0.88888889 1. 0.8
0.8 0.7 1. 1. ]
mean value: 0.9088888888888889
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.83
Accuracy on Blind test: 0.92
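In the 'List of models' printed before the XGBoost run, the XGBClassifier entry shows unset parameters (`base_score=None`, `learning_rate=None`); in every list printed after it, the same entry shows populated values (`base_score=0.5`, `learning_rate=0.300000012`). Since `cross_validate` clones the estimator for each fold, one plausible explanation is that a later fit (for example the blind-test fit) trains the object stored in the list in place. If that is unwanted, fitting a `sklearn.base.clone` leaves the list entry untouched. The sketch below uses LogisticRegression as a stand-in, and the `models` list is a hypothetical mirror of the script's registry.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LogisticRegression
from sklearn.utils.validation import check_is_fitted

X, y = make_classification(n_samples=100, random_state=42)

# Hypothetical registry of (name, estimator) pairs, like the logged list.
models = [("Logistic Regression", LogisticRegression(random_state=42))]

fitted = {}
for name, model in models:
    # clone() copies the parameters into a new, unfitted estimator,
    # so fitting it does not modify the object kept in `models`.
    fitted[name] = clone(model).fit(X, y)

try:
    check_is_fitted(models[0][1])
    print("list entry was fitted in place")
except NotFittedError:
    print("list entry is still unfitted")
```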
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.02910137 0.04690957 0.02439976 0.02427816 0.05466056 0.05417943
0.02336264 0.02361846 0.02964282 0.03530502]
mean value: 0.03454577922821045
key: score_time
value: [0.01998067 0.01176548 0.01173735 0.01181555 0.02185011 0.02338123
0.0118885 0.01190233 0.02033043 0.01176739]
mean value: 0.0156419038772583
key: test_mcc
value: [ 0.80903983 0.68888889 0.89442719 0.67082039 0.56980288 0.70710678
0.56980288 -0.12403473 0. 0.70710678]
mean value: 0.5492960900474898
key: train_mcc
value: [1. 0.98780488 0.97590007 0.98787834 0.98787834 0.98787834
1. 1. 0.98787834 0.97560976]
mean value: 0.9890828066723727
key: test_accuracy
value: [0.89473684 0.84210526 0.94444444 0.83333333 0.77777778 0.83333333
0.77777778 0.44444444 0.5 0.83333333]
mean value: 0.7681286549707602
key: train_accuracy
value: [1. 0.99386503 0.98780488 0.99390244 0.99390244 0.99390244
1. 1. 0.99390244 0.98780488]
mean value: 0.9945084542869969
key: test_fscore
value: [0.9 0.84210526 0.94117647 0.84210526 0.75 0.8
0.75 0.28571429 0.52631579 0.8 ]
mean value: 0.7437417072091995
key: train_fscore
value: [1. 0.99386503 0.98765432 0.99393939 0.99386503 0.99386503
1. 1. 0.99393939 0.98780488]
mean value: 0.9944933078939763
key: test_precision
value: [0.81818182 0.88888889 1. 0.8 0.85714286 1.
0.85714286 0.4 0.5 1. ]
mean value: 0.8121356421356422
key: train_precision
value: [1. 0.98780488 1. 0.98795181 1. 1.
1. 1. 0.98795181 0.98780488]
mean value: 0.9951513370555393
key: test_recall
value: [1. 0.8 0.88888889 0.88888889 0.66666667 0.66666667
0.66666667 0.22222222 0.55555556 0.66666667]
mean value: 0.7022222222222222
key: train_recall
value: [1. 1. 0.97560976 1. 0.98780488 0.98780488
1. 1. 1. 0.98780488]
mean value: 0.9939024390243902
key: test_roc_auc
value: [0.9 0.84444444 0.94444444 0.83333333 0.77777778 0.83333333
0.77777778 0.44444444 0.5 0.83333333]
mean value: 0.7688888888888888
key: train_roc_auc
value: [1. 0.99390244 0.98780488 0.99390244 0.99390244 0.99390244
1. 1. 0.99390244 0.98780488]
mean value: 0.9945121951219512
key: test_jcc
value: [0.81818182 0.72727273 0.88888889 0.72727273 0.6 0.66666667
0.6 0.16666667 0.35714286 0.66666667]
mean value: 0.6218759018759019
key: train_jcc
value: [1. 0.98780488 0.97560976 0.98795181 0.98780488 0.98780488
1. 1. 0.98795181 0.97590361]
mean value: 0.9890831619159565
MCC on Blind test: 0.27
Accuracy on Blind test: 0.62
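Every run in this log wraps the classifier in the same preprocessing step: a `ColumnTransformer` that applies `MinMaxScaler` to 166 numeric columns, `OneHotEncoder` to 7 categorical columns, and passes any remaining columns through. A minimal sketch of that structure follows; it uses a random toy table with two numeric and two categorical column names taken from the logged lists, and the category labels are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Toy stand-in for the real feature table (random values, hypothetical labels).
rng = np.random.default_rng(42)
n = 60
X = pd.DataFrame({
    "ligand_distance": rng.uniform(2, 30, n),
    "rsa": rng.uniform(0, 1, n),
    "ss_class": rng.choice(["helix", "sheet", "loop"], n),
    "active_site": rng.choice(["yes", "no"], n),
})
y = rng.integers(0, 2, n)

num_cols = ["ligand_distance", "rsa"]
cat_cols = ["ss_class", "active_site"]

prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), num_cols),   # rescale numeric features to [0, 1]
        ("cat", OneHotEncoder(), cat_cols),  # one indicator column per category
    ],
    remainder="passthrough",                 # any other column is kept unchanged
)
pipe = Pipeline(steps=[("prep", prep),
                       ("model", RandomForestClassifier(n_estimators=100, random_state=42))])
pipe.fit(X, y)

Xt = pipe.named_steps["prep"].transform(X)
print(Xt.shape)  # 2 scaled + 3 + 2 one-hot columns = 7 columns
```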
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.02055335 0.00936055 0.00876904 0.00855136 0.0086956 0.00913787
0.01055503 0.01051044 0.00956655 0.0091722 ]
mean value: 0.010487198829650879
key: score_time
value: [0.00954032 0.00869036 0.00835061 0.00829196 0.00838614 0.00836682
0.01073599 0.00942254 0.00935078 0.00847292]
mean value: 0.008960843086242676
key: test_mcc
value: [0.15555556 0.36666667 0.47140452 0.23570226 0.56980288 0.12403473
0.2236068 0.11111111 0.4472136 0.55555556]
mean value: 0.32606536802127717
key: train_mcc
value: [0.46648209 0.43801421 0.50303909 0.45533504 0.48170179 0.46563593
0.46396698 0.50858153 0.45287265 0.45287265]
mean value: 0.46885019391864274
key: test_accuracy
value: [0.57894737 0.68421053 0.72222222 0.61111111 0.77777778 0.55555556
0.61111111 0.55555556 0.72222222 0.77777778]
mean value: 0.6596491228070176
key: train_accuracy
value: [0.73006135 0.71779141 0.75 0.72560976 0.73780488 0.73170732
0.73170732 0.75 0.72560976 0.72560976]
mean value: 0.7325901541224001
key: test_fscore
value: [0.55555556 0.7 0.76190476 0.66666667 0.8 0.63636364
0.63157895 0.55555556 0.70588235 0.77777778]
mean value: 0.6791285254133551
key: train_fscore
value: [0.75280899 0.72941176 0.76300578 0.74285714 0.75706215 0.74418605
0.73809524 0.77094972 0.73684211 0.73684211]
mean value: 0.7472061039370119
key: test_precision
value: [0.55555556 0.7 0.66666667 0.58333333 0.72727273 0.53846154
0.6 0.55555556 0.75 0.77777778]
mean value: 0.6454623154623155
key: train_precision
value: [0.69791667 0.69662921 0.72527473 0.69892473 0.70526316 0.71111111
0.72093023 0.71134021 0.70786517 0.70786517]
mean value: 0.7083120381435539
key: test_recall
value: [0.55555556 0.7 0.88888889 0.77777778 0.88888889 0.77777778
0.66666667 0.55555556 0.66666667 0.77777778]
mean value: 0.7255555555555555
key: train_recall
value: [0.81707317 0.7654321 0.80487805 0.79268293 0.81707317 0.7804878
0.75609756 0.84146341 0.76829268 0.76829268]
mean value: 0.7911773562180067
key: test_roc_auc
value: [0.57777778 0.68333333 0.72222222 0.61111111 0.77777778 0.55555556
0.61111111 0.55555556 0.72222222 0.77777778]
mean value: 0.6594444444444445
key: train_roc_auc
value: [0.72952424 0.7180819 0.75 0.72560976 0.73780488 0.73170732
0.73170732 0.75 0.72560976 0.72560976]
mean value: 0.7325654923215899
key: test_jcc
value: [0.38461538 0.53846154 0.61538462 0.5 0.66666667 0.46666667
0.46153846 0.38461538 0.54545455 0.63636364]
mean value: 0.51997668997669
key: train_jcc
value: [0.6036036 0.57407407 0.61682243 0.59090909 0.60909091 0.59259259
0.58490566 0.62727273 0.58333333 0.58333333]
mean value: 0.5965937754493564
MCC on Blind test: 0.13
Accuracy on Blind test: 0.59
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01109362 0.01412797 0.01536965 0.01582646 0.01812601 0.01766753
0.01889634 0.0176363 0.0157218 0.02554536]
mean value: 0.0170011043548584
key: score_time
value: [0.00872016 0.01174641 0.01328182 0.0126729 0.01187205 0.01244974
0.01293111 0.01268005 0.01434588 0.04189372]
mean value: 0.015259385108947754
key: test_mcc
value: [0.59554321 0.4719399 0.3721042 0.70710678 0.67082039 0.67082039
0.79772404 0.12403473 0.53452248 0.62017367]
mean value: 0.5564789813598412
key: train_mcc
value: [0.77808895 0.84213003 0.72987004 0.78978629 0.7200823 0.97590007
0.84162541 0.77964295 0.78978629 0.89565496]
mean value: 0.8142567289846006
key: test_accuracy
value: [0.78947368 0.73684211 0.66666667 0.83333333 0.83333333 0.83333333
0.88888889 0.55555556 0.72222222 0.77777778]
mean value: 0.7637426900584795
key: train_accuracy
value: [0.87730061 0.9202454 0.84756098 0.88414634 0.84146341 0.98780488
0.91463415 0.87804878 0.88414634 0.94512195]
mean value: 0.8980472841538232
key: test_fscore
value: [0.8 0.76190476 0.57142857 0.8 0.82352941 0.82352941
0.9 0.63636364 0.61538462 0.71428571]
mean value: 0.7446426122896711
key: train_fscore
value: [0.89130435 0.92215569 0.82014388 0.86896552 0.8115942 0.98765432
0.92134831 0.89130435 0.86896552 0.94193548]
mean value: 0.8925371626013687
key: test_precision
value: [0.72727273 0.72727273 0.8 1. 0.875 0.875
0.81818182 0.53846154 1. 1. ]
mean value: 0.8361188811188811
key: train_precision
value: [0.80392157 0.89534884 1. 1. 1. 1.
0.85416667 0.80392157 1. 1. ]
mean value: 0.9357358641130871
key: test_recall
value: [0.88888889 0.8 0.44444444 0.66666667 0.77777778 0.77777778
1. 0.77777778 0.44444444 0.55555556]
mean value: 0.7133333333333334
key: train_recall
value: [1. 0.95061728 0.69512195 0.76829268 0.68292683 0.97560976
1. 1. 0.76829268 0.8902439 ]
mean value: 0.8731105088828666
key: test_roc_auc
value: [0.79444444 0.73333333 0.66666667 0.83333333 0.83333333 0.83333333
0.88888889 0.55555556 0.72222222 0.77777778]
mean value: 0.7638888888888888
key: train_roc_auc
value: [0.87654321 0.92043059 0.84756098 0.88414634 0.84146341 0.98780488
0.91463415 0.87804878 0.88414634 0.94512195]
mean value: 0.8979900632339657
key: test_jcc
value: [0.66666667 0.61538462 0.4 0.66666667 0.7 0.7
0.81818182 0.46666667 0.44444444 0.55555556]
mean value: 0.6033566433566433
key: train_jcc
value: [0.80392157 0.85555556 0.69512195 0.76829268 0.68292683 0.97560976
0.85416667 0.80392157 0.76829268 0.8902439 ]
mean value: 0.8098053164355173
MCC on Blind test: 0.16
Accuracy on Blind test: 0.57
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01493216 0.01496792 0.01336694 0.01421976 0.01367188 0.01381302
0.01411867 0.01407051 0.01452351 0.01461649]
mean value: 0.014230084419250489
key: score_time
value: [0.00984478 0.01131344 0.01132298 0.01149678 0.01126051 0.01136065
0.01129603 0.01129508 0.01159763 0.01174164]
mean value: 0.011252951622009278
key: test_mcc
value: [0.68888889 0.80903983 0.56980288 0.55555556 0.56980288 0.34188173
0.67082039 0. 0.79772404 0.62017367]
mean value: 0.5623689874789073
key: train_mcc
value: [0.87291501 0.8299473 0.86007808 0.91524688 0.82065181 0.91798509
0.97590007 0.85224163 0.93909422 0.86294893]
mean value: 0.8847009016543498
key: test_accuracy
value: [0.84210526 0.89473684 0.77777778 0.77777778 0.77777778 0.66666667
0.83333333 0.5 0.88888889 0.77777778]
mean value: 0.7736842105263158
key: train_accuracy
value: [0.93251534 0.90797546 0.92682927 0.95731707 0.90243902 0.95731707
0.98780488 0.92073171 0.9695122 0.92682927]
mean value: 0.9389271285350891
key: test_fscore
value: [0.84210526 0.88888889 0.75 0.77777778 0.75 0.625
0.82352941 0.57142857 0.875 0.71428571]
mean value: 0.7618015627303554
key: train_fscore
value: [0.93714286 0.89795918 0.92207792 0.95808383 0.89189189 0.95541401
0.98765432 0.92655367 0.96969697 0.92105263]
mean value: 0.9367527294440279
key: test_precision
value: [0.8 1. 0.85714286 0.77777778 0.85714286 0.71428571
0.875 0.5 1. 1. ]
mean value: 0.8381349206349207
key: train_precision
value: [0.88172043 1. 0.98611111 0.94117647 1. 1.
1. 0.86315789 0.96385542 1. ]
mean value: 0.9636021328230462
key: test_recall
value: [0.88888889 0.8 0.66666667 0.77777778 0.66666667 0.55555556
0.77777778 0.66666667 0.77777778 0.55555556]
mean value: 0.7133333333333334
key: train_recall
value: [1. 0.81481481 0.86585366 0.97560976 0.80487805 0.91463415
0.97560976 1. 0.97560976 0.85365854]
mean value: 0.91806684733514
key: test_roc_auc
value: [0.84444444 0.9 0.77777778 0.77777778 0.77777778 0.66666667
0.83333333 0.5 0.88888889 0.77777778]
mean value: 0.7744444444444444
key: train_roc_auc
value: [0.93209877 0.90740741 0.92682927 0.95731707 0.90243902 0.95731707
0.98780488 0.92073171 0.9695122 0.92682927]
mean value: 0.9388286660644384
key: test_jcc
value: [0.72727273 0.8 0.6 0.63636364 0.6 0.45454545
0.7 0.4 0.77777778 0.55555556]
mean value: 0.6251515151515151
key: train_jcc
value: [0.88172043 0.81481481 0.85542169 0.91954023 0.80487805 0.91463415
0.97560976 0.86315789 0.94117647 0.85365854]
mean value: 0.8824612014684342
MCC on Blind test: 0.48
Accuracy on Blind test: 0.76
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.11141515 0.1019187 0.10138392 0.10054922 0.10579777 0.10397267
0.10807872 0.1059761 0.10747981 0.10852766]
mean value: 0.10550997257232667
key: score_time
value: [0.01460505 0.01543188 0.01478577 0.01445913 0.01539803 0.01554084
0.01605749 0.01555371 0.01592231 0.01543474]
mean value: 0.015318894386291504
key: test_mcc
value: [1. 0.9 0.89442719 1. 0.89442719 0.67082039
0.89442719 0.77777778 0.89442719 0.89442719]
mean value: 0.8820734126027294
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.94736842 0.94444444 1. 0.94444444 0.83333333
0.94444444 0.88888889 0.94444444 0.94444444]
mean value: 0.9391812865497076
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.94736842 0.94736842 1. 0.94736842 0.82352941
0.94736842 0.88888889 0.94117647 0.94117647]
mean value: 0.9384244926040592
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 0.9 1. 0.9 0.875
0.9 0.88888889 1. 1. ]
mean value: 0.946388888888889
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.9 1. 1. 1. 0.77777778
1. 0.88888889 0.88888889 0.88888889]
mean value: 0.9344444444444444
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.95 0.94444444 1. 0.94444444 0.83333333
0.94444444 0.88888889 0.94444444 0.94444444]
mean value: 0.9394444444444444
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.9 0.9 1. 0.9 0.7
0.9 0.8 0.88888889 0.88888889]
mean value: 0.8877777777777778
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.77
Accuracy on Blind test: 0.89
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.0429635 0.03572798 0.04963231 0.03834414 0.04578018 0.05369258
0.03833675 0.03687501 0.03249407 0.03297234]
mean value: 0.040681886672973636
key: score_time
value: [0.0182848 0.02209377 0.02278614 0.02101231 0.02734518 0.02648211
0.02281046 0.02151656 0.01736498 0.02486873]
mean value: 0.02245650291442871
key: test_mcc
value: [0.89893315 1. 0.89442719 0.89442719 1. 0.89442719
0.77777778 0.56980288 0.89442719 1. ]
mean value: 0.882422257402662
key: train_mcc
value: [0.98780488 0.98780305 0.98787834 0.97560976 0.98787834 0.98787834
0.98787834 1. 0.96348628 1. ]
mean value: 0.9866217328816356
key: test_accuracy
value: [0.94736842 1. 0.94444444 0.94444444 1. 0.94444444
0.88888889 0.77777778 0.94444444 1. ]
mean value: 0.9391812865497076
key: train_accuracy
value: [0.99386503 0.99386503 0.99390244 0.98780488 0.99390244 0.99390244
0.99390244 1. 0.98170732 1. ]
mean value: 0.9932852012569205
key: test_fscore
value: [0.94117647 1. 0.94117647 0.94117647 1. 0.94736842
0.88888889 0.75 0.94736842 1. ]
mean value: 0.9357155142758858
key: train_fscore
value: [0.99386503 0.99378882 0.99386503 0.98780488 0.99386503 0.99393939
0.99386503 1. 0.98181818 1. ]
mean value: 0.9932811396381519
key: test_precision
value: [1. 1. 1. 1. 1. 0.9
0.88888889 0.85714286 0.9 1. ]
mean value: 0.9546031746031746
key: train_precision
value: [1. 1. 1. 0.98780488 1. 0.98795181
1. 1. 0.97590361 1. ]
mean value: 0.9951660299735527
key: test_recall
value: [0.88888889 1. 0.88888889 0.88888889 1. 1.
0.88888889 0.66666667 1. 1. ]
mean value: 0.9222222222222222
key: train_recall
value: [0.98780488 0.98765432 0.98780488 0.98780488 0.98780488 1.
0.98780488 1. 0.98780488 1. ]
mean value: 0.9914483589280337
key: test_roc_auc
value: [0.94444444 1. 0.94444444 0.94444444 1. 0.94444444
0.88888889 0.77777778 0.94444444 1. ]
mean value: 0.9388888888888889
key: train_roc_auc
value: [0.99390244 0.99382716 0.99390244 0.98780488 0.99390244 0.99390244
0.99390244 1. 0.98170732 1. ]
mean value: 0.9932851550737729
key: test_jcc
value: [0.88888889 1. 0.88888889 0.88888889 1. 0.9
0.8 0.6 0.9 1. ]
mean value: 0.8866666666666667
key: train_jcc
value: [0.98780488 0.98765432 0.98780488 0.97590361 0.98780488 0.98795181
0.98780488 1. 0.96428571 1. ]
mean value: 0.9867014969155238
MCC on Blind test: 0.73
Accuracy on Blind test: 0.86
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.04006433 0.06962156 0.07541966 0.06054902 0.05782652 0.06331706
0.05930591 0.05185676 0.05739498 0.05739808]
mean value: 0.05927538871765137
key: score_time
value: [0.0228579 0.0328567 0.01235533 0.02372789 0.02360535 0.02234149
0.02110028 0.0226531 0.02321887 0.02362919]
mean value: 0.02283461093902588
key: test_mcc
value: [ 0.71611487 0.72456884 0.56980288 0.89442719 0.34188173 0.4472136
0.55555556 -0.11396058 0.77777778 0.70710678]
mean value: 0.5620488647586125
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.84210526 0.84210526 0.77777778 0.94444444 0.66666667 0.66666667
0.77777778 0.44444444 0.88888889 0.83333333]
mean value: 0.7684210526315789
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.8 0.82352941 0.75 0.94117647 0.7 0.5
0.77777778 0.5 0.88888889 0.8 ]
mean value: 0.7481372549019608
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 0.85714286 1. 0.63636364 1.
0.77777778 0.45454545 0.88888889 1. ]
mean value: 0.8614718614718615
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 0.7 0.66666667 0.88888889 0.77777778 0.33333333
0.77777778 0.55555556 0.88888889 0.66666667]
mean value: 0.6922222222222222
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.83333333 0.85 0.77777778 0.94444444 0.66666667 0.66666667
0.77777778 0.44444444 0.88888889 0.83333333]
mean value: 0.7683333333333333
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.66666667 0.7 0.6 0.88888889 0.53846154 0.33333333
0.63636364 0.33333333 0.8 0.66666667]
mean value: 0.6163714063714063
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.08
Accuracy on Blind test: 0.57
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.30276346 0.29489088 0.294976 0.30200529 0.29472876 0.29524827
0.29835391 0.29391432 0.29789948 0.30171204]
mean value: 0.2976492404937744
key: score_time
value: [0.00935555 0.00900364 0.00975752 0.00927353 0.00938869 0.00927615
0.00972366 0.00915432 0.0101068 0.00940204]
mean value: 0.009444189071655274
key: test_mcc
value: [1. 1. 0.89442719 0.89442719 0.89442719 0.89442719
0.77777778 0.77777778 1. 1. ]
mean value: 0.9133264319555219
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 1. 0.94444444 0.94444444 0.94444444 0.94444444
0.88888889 0.88888889 1. 1. ]
mean value: 0.9555555555555555
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 1. 0.94736842 0.94117647 0.94736842 0.94736842
0.88888889 0.88888889 1. 1. ]
mean value: 0.9561059511523908
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 0.9 1. 0.9 0.9
0.88888889 0.88888889 1. 1. ]
mean value: 0.9477777777777778
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 0.88888889 1. 1.
0.88888889 0.88888889 1. 1. ]
mean value: 0.9666666666666667
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 1. 0.94444444 0.94444444 0.94444444 0.94444444
0.88888889 0.88888889 1. 1. ]
mean value: 0.9555555555555555
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 1. 0.9 0.88888889 0.9 0.9
0.8 0.8 1. 1. ]
mean value: 0.9188888888888889
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.77
Accuracy on Blind test: 0.89
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.01833081 0.02146482 0.01923108 0.01972938 0.0196569 0.01911902
0.01890945 0.01898503 0.01887536 0.01911259]
mean value: 0.019341444969177245
key: score_time
value: [0.01230168 0.01215196 0.01264477 0.01321173 0.01266956 0.01343489
0.01673889 0.01325631 0.01959014 0.01313806]
mean value: 0.013913798332214355
key: test_mcc
value: [0.72456884 0.89893315 0.79772404 0.79772404 0.53452248 0.70710678
0.53452248 0.35355339 0.53452248 0.79772404]
mean value: 0.6680901716167226
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.84210526 0.94736842 0.88888889 0.88888889 0.72222222 0.83333333
0.72222222 0.61111111 0.72222222 0.88888889]
mean value: 0.8067251461988304
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.85714286 0.95238095 0.9 0.9 0.7826087 0.85714286
0.7826087 0.72 0.7826087 0.9 ]
mean value: 0.8434492753623188
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.75 0.90909091 0.81818182 0.81818182 0.64285714 0.75
0.64285714 0.5625 0.64285714 0.81818182]
mean value: 0.7354707792207793
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.85 0.94444444 0.88888889 0.88888889 0.72222222 0.83333333
0.72222222 0.61111111 0.72222222 0.88888889]
mean value: 0.8072222222222222
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.75 0.90909091 0.81818182 0.81818182 0.64285714 0.75
0.64285714 0.5625 0.64285714 0.81818182]
mean value: 0.7354707792207793
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
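In the QDA block above, test_recall is 1.0 in every fold, and test_jcc is identical to test_precision. This follows from the definitions:

- Precision is TP/(TP+FP).
- The Jaccard index is TP/(TP+FP+FN).

With no false negatives (recall 1), the two coincide. The counts in this sketch were inferred so that they reproduce the first fold's logged values: 9 positives, 10 negatives and 3 false positives. They are not read from the log.

```python
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score,
                             precision_score, recall_score)

# Inferred fold: 9 positives, all predicted positive (FN = 0);
# 10 negatives, 3 of them wrongly predicted positive (FP = 3, TN = 7).
y_true = [1] * 9 + [0] * 10
y_pred = [1] * 9 + [1] * 3 + [0] * 7

precision = precision_score(y_true, y_pred)  # TP / (TP + FP)      = 9 / 12
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)      = 9 / 9
jaccard = jaccard_score(y_true, y_pred)      # TP / (TP + FP + FN) = 9 / 12
print(precision, recall, jaccard)            # 0.75 1.0 0.75

# Accuracy 16/19 and F1 18/21 match the first fold's test_accuracy and test_fscore.
print(accuracy_score(y_true, y_pred), f1_score(y_true, y_pred))
```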
MCC on Blind test: 0.0
Accuracy on Blind test: 0.62
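QDA's blind-test MCC of exactly 0.0 contrasts with its cross-validated mean of about 0.67. One common way to get an MCC of exactly 0.0 is a model that predicts the same class for every sample. scikit-learn's `matthews_corrcoef` returns 0.0 when the MCC denominator is zero, while accuracy then equals the share of the predicted class. Whether that is what happened here cannot be read off the log. The labels below are hypothetical, with 62% in one class.

```python
from sklearn.metrics import accuracy_score, matthews_corrcoef

# Hypothetical blind-test labels: 62 negatives and 38 positives.
y_true = [0] * 62 + [1] * 38
# A degenerate model that predicts the majority class for every sample.
y_pred = [0] * 100

print(matthews_corrcoef(y_true, y_pred))  # 0.0 (zero denominator is mapped to 0)
print(accuracy_score(y_true, y_pred))     # 0.62
```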
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
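The printed pipeline pairs a ColumnTransformer with the classifier:

- MinMaxScaler on the 166 numeric columns.
- OneHotEncoder on the 7 categorical columns.
- Everything else passed through.

Assuming the whole pipeline is what gets cross-validated, scaling and encoding are re-fitted inside each training fold. The following is a minimal runnable version of the same structure. The column names are borrowed from the log, but the six rows of values are made up.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import RidgeClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Made-up stand-in table: two numeric columns and one categorical column.
X = pd.DataFrame({
    "ligand_distance": [3.1, 12.4, 7.8, 20.0, 5.5, 15.2],
    "rsa": [0.10, 0.85, 0.40, 0.95, 0.20, 0.70],
    "ss_class": ["helix", "coil", "strand", "coil", "helix", "strand"],
})
y = [1, 0, 1, 0, 1, 0]

num_cols = X.select_dtypes(include="number").columns
cat_cols = X.select_dtypes(exclude="number").columns

prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), num_cols),   # rescale each numeric column to [0, 1]
        ("cat", OneHotEncoder(), cat_cols),  # one indicator column per category level
    ],
    remainder="passthrough",                 # any unlisted column is kept unchanged
)
pipe = Pipeline(steps=[("prep", prep), ("model", RidgeClassifier(random_state=42))])
pipe.fit(X, y)

Xt = pipe.named_steps["prep"].transform(X)
print(Xt.shape)  # (6, 5): 2 scaled numeric columns + 3 one-hot columns
```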
key: fit_time
value: [0.04197288 0.03873825 0.03924918 0.03602433 0.03344655 0.03352237
0.0336082 0.03359556 0.03350544 0.03394341]
mean value: 0.03576061725616455
key: score_time
value: [0.02062082 0.02285862 0.02000999 0.02112794 0.02082849 0.01994658
0.02333951 0.02297258 0.02233124 0.02153254]
mean value: 0.021556830406188963
key: test_mcc
value: [0.78888889 0.68543653 0.56980288 0.79772404 0.89442719 0.56980288
0.67082039 0.4472136 0.67082039 0.70710678]
mean value: 0.6802043569726659
key: train_mcc
value: [0.93871406 0.93872328 0.96348628 0.95150257 0.96348628 0.98787834
0.95150257 0.95121951 0.93909422 0.95150257]
mean value: 0.9537109689949451
key: test_accuracy
value: [0.89473684 0.84210526 0.77777778 0.88888889 0.94444444 0.77777778
0.83333333 0.72222222 0.83333333 0.83333333]
mean value: 0.8347953216374269
key: train_accuracy
value: [0.96932515 0.96932515 0.98170732 0.97560976 0.98170732 0.99390244
0.97560976 0.97560976 0.9695122 0.97560976]
mean value: 0.9767918599431393
key: test_fscore
value: [0.88888889 0.85714286 0.75 0.875 0.94736842 0.75
0.82352941 0.70588235 0.82352941 0.8 ]
mean value: 0.8221341343554966
key: train_fscore
value: [0.96969697 0.96932515 0.98159509 0.97530864 0.98181818 0.99386503
0.97530864 0.97560976 0.96969697 0.97530864]
mean value: 0.9767533079309227
key: test_precision
value: [0.88888889 0.81818182 0.85714286 1. 0.9 0.85714286
0.875 0.75 0.875 1. ]
mean value: 0.8821356421356421
key: train_precision
value: [0.96385542 0.96341463 0.98765432 0.9875 0.97590361 1.
0.9875 0.97560976 0.96385542 0.9875 ]
mean value: 0.9792793169062882
key: test_recall
value: [0.88888889 0.9 0.66666667 0.77777778 1. 0.66666667
0.77777778 0.66666667 0.77777778 0.66666667]
mean value: 0.7788888888888889
key: train_recall
value: [0.97560976 0.97530864 0.97560976 0.96341463 0.98780488 0.98780488
0.96341463 0.97560976 0.97560976 0.96341463]
mean value: 0.9743601324902138
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_8020.py:188: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_8020.py:191: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: test_roc_auc
value: [0.89444444 0.83888889 0.77777778 0.88888889 0.94444444 0.77777778
0.83333333 0.72222222 0.83333333 0.83333333]
mean value: 0.8344444444444443
key: train_roc_auc
value: [0.96928636 0.96936164 0.98170732 0.97560976 0.98170732 0.99390244
0.97560976 0.97560976 0.9695122 0.97560976]
mean value: 0.9767916290274014
key: test_jcc
value: [0.8 0.75 0.6 0.77777778 0.9 0.6
0.7 0.54545455 0.7 0.66666667]
mean value: 0.7039898989898989
key: train_jcc
value: [0.94117647 0.94047619 0.96385542 0.95180723 0.96428571 0.98780488
0.95180723 0.95238095 0.94117647 0.95180723]
mean value: 0.9546577784801843
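The key / value / mean value blocks have the shape of scikit-learn's `cross_validate` output with `return_train_score=True`:

- `fit_time` and `score_time` come first.
- Then comes a `test_<name>` and a `train_<name>` array for each scorer, with one entry per fold.

The sketch below assumes the scorer names mcc, accuracy, fscore, precision, recall, roc_auc and jcc, chosen to match the keys, and it uses synthetic data.

The logged roc_auc values equal the balanced accuracy of hard predictions. For the first Ridge Classifier fold, (8/9 + 9/10)/2 = 0.8944. That is what `make_scorer(roc_auc_score)` yields, because by default it scores the output of `predict()`. This is an inference, not something the log states.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_validate

X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Scorer names are assumptions chosen to reproduce the keys seen in the log.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": make_scorer(roc_auc_score),  # scores hard predict() output
    "jcc": make_scorer(jaccard_score),
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(RidgeClassifier(random_state=42), X, y, cv=cv,
                        scoring=scoring, return_train_score=True)

# Print in the same key / value / mean value layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```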
MCC on Blind test: 0.54
Accuracy on Blind test: 0.78
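The SettingWithCopyWarning entries in this log point at lines 188 and 191 of pnca_8020.py, where rouC_CT and rouC_BT are sorted with `inplace=True`. How those frames are built is not shown here. Assuming they are selections from a larger results table, two changes avoid the warning: take an explicit `.copy()` of the selection, and rebind the name to the sorted result instead of sorting in place. The table below is hypothetical, apart from the MCC values, which are copied from this log.

```python
import pandas as pd

# Hypothetical results table; test_mcc is the CV mean and bts_mcc the
# blind-test MCC reported in this log for three models.
scores = pd.DataFrame({
    "model": ["QDA", "Ridge Classifier", "Ridge ClassifierCV"],
    "test_mcc": [0.668, 0.680, 0.713],
    "bts_mcc": [0.00, 0.54, 0.54],
})

# Pattern that can trigger the warning: sorting a filtered selection in place.
#   subset = scores[scores["test_mcc"] > 0.5]
#   subset.sort_values(by=["test_mcc"], ascending=False, inplace=True)

# Safer: copy the selection explicitly, then rebind to the sorted result.
cv_table = scores.loc[scores["test_mcc"] > 0.5, ["model", "test_mcc"]].copy()
cv_table = cv_table.sort_values(by=["test_mcc"], ascending=False)

bt_table = scores[["model", "bts_mcc"]].sort_values(by=["bts_mcc"], ascending=False)
print(cv_table["model"].tolist())
```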
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
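`RidgeClassifierCV(cv=10)` runs its own 10-fold search over the penalty `alpha` inside every fit, so within the outer cross-validation this is a nested search. That likely accounts for its fit_time of about 0.21 s per fold, against about 0.036 s for the plain Ridge Classifier above. After fitting, `alpha_` shows which candidate was selected, and `best_score_` gives its mean inner-CV score. The data below is synthetic and for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifierCV

X, y = make_classification(n_samples=200, n_features=20, random_state=42)

# With cv=10, every candidate in the default alphas=(0.1, 1.0, 10.0) is scored
# by an internal 10-fold cross-validation during fit().
clf = RidgeClassifierCV(cv=10).fit(X, y)
print(clf.alpha_)       # the selected penalty strength
print(clf.best_score_)  # its mean inner-CV score (accuracy by default)
```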
key: fit_time
value: [0.23004889 0.22601652 0.21089005 0.23468184 0.22533727 0.1308012
0.14139199 0.2118597 0.13306785 0.30921268]
mean value: 0.20533080101013185
key: score_time
value: [0.02214789 0.02047777 0.0222075 0.02512264 0.02310586 0.01235127
0.01761007 0.01181221 0.02391386 0.02258921]
mean value: 0.020133829116821288
key: test_mcc
value: [0.78888889 0.68888889 0.89442719 0.79772404 0.89442719 0.56980288
0.67082039 0.4472136 0.67082039 0.70710678]
mean value: 0.7130120240479644
key: train_mcc
value: [0.93871406 0.96326408 0.97590007 0.95150257 0.96348628 0.98787834
0.95150257 0.95121951 0.93909422 0.97560976]
mean value: 0.9598171466702564
key: test_accuracy
value: [0.89473684 0.84210526 0.94444444 0.88888889 0.94444444 0.77777778
0.83333333 0.72222222 0.83333333 0.83333333]
mean value: 0.8514619883040936
key: train_accuracy
value: [0.96932515 0.98159509 0.98780488 0.97560976 0.98170732 0.99390244
0.97560976 0.97560976 0.9695122 0.98780488]
mean value: 0.9798481221008529
key: test_fscore
value: [0.88888889 0.84210526 0.94117647 0.875 0.94736842 0.75
0.82352941 0.70588235 0.82352941 0.8 ]
mean value: 0.8397480220158239
key: train_fscore
value: [0.96969697 0.98159509 0.98765432 0.97530864 0.98181818 0.99386503
0.97530864 0.97560976 0.96969697 0.98780488]
mean value: 0.9798358482996121
key: test_precision
value: [0.88888889 0.88888889 1. 1. 0.9 0.85714286
0.875 0.75 0.875 1. ]
mean value: 0.9034920634920635
key: train_precision
value: [0.96385542 0.97560976 1. 0.9875 0.97590361 1.
0.9875 0.97560976 0.96385542 0.98780488]
mean value: 0.9817638848075227
key: test_recall
value: [0.88888889 0.8 0.88888889 0.77777778 1. 0.66666667
0.77777778 0.66666667 0.77777778 0.66666667]
mean value: 0.7911111111111111
key: train_recall
value: [0.97560976 0.98765432 0.97560976 0.96341463 0.98780488 0.98780488
0.96341463 0.97560976 0.97560976 0.98780488]
mean value: 0.9780337247816923
key: test_roc_auc
value: [0.89444444 0.84444444 0.94444444 0.88888889 0.94444444 0.77777778
0.83333333 0.72222222 0.83333333 0.83333333]
mean value: 0.8516666666666666
key: train_roc_auc
value: [0.96928636 0.98163204 0.98780488 0.97560976 0.98170732 0.99390244
0.97560976 0.97560976 0.9695122 0.98780488]
mean value: 0.9798479373682626
key: test_jcc
value: [0.8 0.72727273 0.88888889 0.77777778 0.9 0.6
0.7 0.54545455 0.7 0.66666667]
mean value: 0.7306060606060606
key: train_jcc
value: [0.94117647 0.96385542 0.97560976 0.95180723 0.96428571 0.98780488
0.95180723 0.95238095 0.94117647 0.97590361]
mean value: 0.9605807735965383
MCC on Blind test: 0.54
Accuracy on Blind test: 0.78
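For Ridge ClassifierCV, the mean cross-validated MCC (0.71) is noticeably higher than the blind-test MCC (0.54), so the blind-test figure is the more conservative of the two. Assume the "8020" in the script name refers to an 80/20 split between training data and the blind test set; the log does not say how the split was made. A stratified split of that kind could be sketched as follows, with synthetic data and hypothetical variable names.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic, mildly imbalanced data standing in for the real feature table.
X, y = make_classification(n_samples=200, weights=[0.6, 0.4], random_state=42)

# Hold out 20% as a blind test set; stratify keeps the class ratio in both parts.
X_train, X_bts, y_train, y_bts = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(len(X_train), len(X_bts))  # 160 40
```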