LSHTM_analysis/scripts/ml/log_pnca_rt.txt

18962 lines
913 KiB
Text

/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data_rt.py:550: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
from pandas import MultiIndex, Int64Index
1.22.4
1.4.1
aaindex_df contains non-numerical data
Total no. of non-numerial columns: 2
Selecting numerical data only
PASS: successfully selected numerical columns only for aaindex_df
Now checking for NA in the remaining aaindex_cols
Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127
Revised df ncols: 123
Checking NA in revised df...
PASS: cols with NA successfully dropped from aaindex_df
Proceeding with combining aa_df with other features_df
PASS: ncols match
Expected ncols: 123
Got: 123
Total no. of columns in clean aa_df: 123
Proceeding to merge, expected nrows in merged_df: 424
PASS: my_features_df and aa_df successfully combined
nrows: 424
ncols: 265
count of NULL values before imputation
or_mychisq 102
log10_or_mychisq 102
dtype: int64
count of NULL values AFTER imputation
mutationinformation 0
or_rawI 0
logorI 0
dtype: int64
PASS: OR values imputed, data ready for ML
Total no. of features for aaindex: 123
No. of numerical features: 166
No. of categorical features: 7
index: 0
ind: 1
Mask count check: True
Original Data
Counter({0: 120, 1: 119}) Data dim: (239, 173)
-------------------------------------------------------------
Successfully split data: REVERSE training
imputed values: training set
actual values: blind test set
Train data size: (239, 173)
Test data size: (185, 173)
y_train numbers: Counter({0: 120, 1: 119})
y_train ratio: 1.0084033613445378
y_test_numbers: Counter({1: 114, 0: 71})
y_test ratio: 0.6228070175438597
-------------------------------------------------------------
Simple Random OverSampling
Counter({0: 120, 1: 120})
(240, 173)
Simple Random UnderSampling
Counter({0: 119, 1: 119})
(238, 173)
Simple Combined Over and UnderSampling
Counter({0: 120, 1: 120})
(240, 173)
SMOTE_NC OverSampling
Counter({0: 120, 1: 120})
(240, 173)
#####################################################################
Running ML analysis: REVERSE training
Gene name: pncA
Drug name: pyrazinamide
Output directory: /home/tanu/git/Data/pyrazinamide/output/ml/tts_rt/
Sanity checks:
Total input features: 173
Training data size: (239, 173)
Test data size: (185, 173)
Target feature numbers (training data): Counter({0: 120, 1: 119})
Target features ratio (training data: 1.0084033613445378
Target feature numbers (test data): Counter({1: 114, 0: 71})
Target features ratio (test data): 0.6228070175438597
#####################################################################
================================================================
Strucutral features (n): 34
These are:
Common stablity features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================
AAindex features (n): 123
These are:
['ALTS910101', 'AZAE970101', 'AZAE970102', 'BASU010101', 'BENS940101', 'BENS940102', 'BENS940103', 'BENS940104', 'BETM990101', 'BLAJ010101', 'BONM030101', 'BONM030102', 'BONM030103', 'BONM030104', 'BONM030105', 'BONM030106', 'BRYS930101', 'CROG050101', 'CSEM940101', 'DAYM780301', 'DAYM780302', 'DOSZ010101', 'DOSZ010102', 'DOSZ010103', 'DOSZ010104', 'FEND850101', 'FITW660101', 'GEOD900101', 'GIAG010101', 'GONG920101', 'GRAR740104', 'HENS920101', 'HENS920102', 'HENS920103', 'HENS920104', 'JOHM930101', 'JOND920103', 'JOND940101', 'KANM000101', 'KAPO950101', 'KESO980101', 'KESO980102', 'KOLA920101', 'KOLA930101', 'KOSJ950100_RSA_SST', 'KOSJ950100_SST', 'KOSJ950110_RSA', 'KOSJ950115', 'LEVJ860101', 'LINK010101', 'LIWA970101', 'LUTR910101', 'LUTR910102', 'LUTR910103', 'LUTR910104', 'LUTR910105', 'LUTR910106', 'LUTR910107', 'LUTR910108', 'LUTR910109', 'MCLA710101', 'MCLA720101', 'MEHP950102', 'MICC010101', 'MIRL960101', 'MIYS850102', 'MIYS850103', 'MIYS930101', 'MIYS960101', 'MIYS960102', 'MIYS960103', 'MIYS990106', 'MIYS990107', 'MIYT790101', 'MOHR870101', 'MOOG990101', 'MUET010101', 'MUET020101', 'MUET020102', 'NAOD960101', 'NGPC000101', 'NIEK910101', 'NIEK910102', 'OGAK980101', 'OVEJ920100_RSA', 'OVEJ920101', 'OVEJ920102', 'OVEJ920103', 'PRLA000101', 'PRLA000102', 'QUIB020101', 'QU_C930101', 'QU_C930102', 'QU_C930103', 'RIER950101', 'RISJ880101', 'RUSR970101', 'RUSR970102', 'RUSR970103', 'SIMK990101', 'SIMK990102', 'SIMK990103', 'SIMK990104', 'SIMK990105', 'SKOJ000101', 'SKOJ000102', 'SKOJ970101', 'TANS760101', 'TANS760102', 'THOP960101', 'TOBD000101', 'TOBD000102', 'TUDE900101', 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106']
================================================================
Evolutionary features (n): 3
These are:
['consurf_score', 'snap2_score', 'provean_score']
================================================================
Genomic features (n): 6
These are:
['maf', 'logorI']
['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================
Categorical features (n): 7
These are:
['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================
Pass: No. of features match
#####################################################################
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.03394842 0.02895188 0.03221607 0.03293228 0.04543996 0.03000593
0.03111315 0.03325677 0.03014874 0.03099895]
mean value: 0.032901215553283694
key: score_time
value: [0.01245475 0.01199245 0.01205087 0.01204062 0.01212907 0.01196837
0.01194596 0.01205087 0.01186895 0.01187897]
mean value: 0.012038087844848633
key: test_mcc
value: [0.58536941 0.75261781 0.64168895 0.58536941 0.60246408 0.53033009
0.58536941 0.6761234 0.83333333 0.56490196]
mean value: 0.6357567832334812
key: train_mcc
value: [0.77022946 0.81454556 0.79593084 0.78693949 0.82418184 0.80641659
0.80556067 0.78889274 0.78777764 0.80642024]
mean value: 0.7986895083179898
key: test_accuracy
value: [0.79166667 0.875 0.79166667 0.79166667 0.79166667 0.75
0.79166667 0.83333333 0.91666667 0.7826087 ]
mean value: 0.8115942028985507
key: train_accuracy
value: [0.88372093 0.90697674 0.89767442 0.89302326 0.91162791 0.90232558
0.90232558 0.89302326 0.89302326 0.90277778]
mean value: 0.8986498708010335
key: test_fscore
value: [0.7826087 0.86956522 0.82758621 0.8 0.81481481 0.7
0.8 0.84615385 0.91666667 0.76190476]
mean value: 0.8119300209480119
key: train_fscore
value: [0.88789238 0.90825688 0.89908257 0.89497717 0.91324201 0.90497738
0.90410959 0.89686099 0.8959276 0.90497738]
mean value: 0.9010303932834448
key: test_precision
value: [0.81818182 0.90909091 0.70588235 0.76923077 0.73333333 0.875
0.76923077 0.78571429 0.91666667 0.8 ]
mean value: 0.8082330904389728
key: train_precision
value: [0.85344828 0.89189189 0.88288288 0.875 0.89285714 0.87719298
0.88392857 0.86206897 0.86842105 0.88495575]
mean value: 0.8772647517739908
key: test_recall
value: [0.75 0.83333333 1. 0.83333333 0.91666667 0.58333333
0.83333333 0.91666667 0.91666667 0.72727273]
mean value: 0.831060606060606
key: train_recall
value: [0.92523364 0.92523364 0.91588785 0.91588785 0.93457944 0.93457944
0.92523364 0.93457944 0.92523364 0.92592593]
mean value: 0.9262374524056767
key: test_roc_auc
value: [0.79166667 0.875 0.79166667 0.79166667 0.79166667 0.75
0.79166667 0.83333333 0.91666667 0.78030303]
mean value: 0.8113636363636364
key: train_roc_auc
value: [0.88391312 0.90706127 0.89775874 0.89312911 0.91173416 0.9024749
0.90243164 0.89321565 0.89317238 0.90277778]
mean value: 0.8987668743509866
key: test_jcc
value: [0.64285714 0.76923077 0.70588235 0.66666667 0.6875 0.53846154
0.66666667 0.73333333 0.84615385 0.61538462]
mean value: 0.6872136931695755
key: train_jcc
value: [0.7983871 0.83193277 0.81666667 0.80991736 0.84033613 0.82644628
0.825 0.81300813 0.81147541 0.82644628]
mean value: 0.8199616128276623
MCC on Blind test: 0.36
Accuracy on Blind test: 0.69
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.86901093 0.75784445 0.80631161 0.83643413 0.7593751 0.88770938
0.73867297 0.7577157 0.95151854 0.74281764]
mean value: 0.8107410430908203
key: score_time
value: [0.01209021 0.01205754 0.01216173 0.01274061 0.01207447 0.01208067
0.01207066 0.01211143 0.01211905 0.01203394]
mean value: 0.012154030799865722
key: test_mcc
value: [0.58536941 0.75261781 0.64168895 0.58536941 0.53033009 0.53033009
0.58536941 0.6761234 0.6761234 0.65151515]
mean value: 0.621483710880162
key: train_mcc
value: [0.73420542 0.7780095 0.78777764 0.7587014 0.74008668 0.7802162
0.76913868 0.76913868 0.75938024 0.75158034]
mean value: 0.7628234782601825
key: test_accuracy
value: [0.79166667 0.875 0.79166667 0.79166667 0.75 0.75
0.79166667 0.83333333 0.83333333 0.82608696]
mean value: 0.8034420289855072
key: train_accuracy
value: [0.86511628 0.88837209 0.89302326 0.87906977 0.86976744 0.88837209
0.88372093 0.88372093 0.87906977 0.875 ]
mean value: 0.8805232558139535
key: test_fscore
value: [0.7826087 0.86956522 0.82758621 0.8 0.78571429 0.7
0.8 0.84615385 0.84615385 0.81818182]
mean value: 0.8075963916143827
key: train_fscore
value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[0.87111111 0.89090909 0.8959276 0.88073394 0.87155963 0.89285714
0.88687783 0.88687783 0.88181818 0.87892377]
mean value: 0.8837596129411874
key: test_precision
value: [0.81818182 0.90909091 0.70588235 0.76923077 0.6875 0.875
0.76923077 0.78571429 0.78571429 0.81818182]
mean value: 0.7923727008285832
key: train_precision
value: [0.83050847 0.86725664 0.86842105 0.86486486 0.85585586 0.85470085
0.85964912 0.85964912 0.85840708 0.85217391]
mean value: 0.8571486978101098
key: test_recall
value: [0.75 0.83333333 1. 0.83333333 0.91666667 0.58333333
0.83333333 0.91666667 0.91666667 0.81818182]
mean value: 0.8401515151515152
key: train_recall
value: [0.91588785 0.91588785 0.92523364 0.89719626 0.88785047 0.93457944
0.91588785 0.91588785 0.90654206 0.90740741]
mean value: 0.9122360678435445
key: test_roc_auc
value: [0.79166667 0.875 0.79166667 0.79166667 0.75 0.75
0.79166667 0.83333333 0.83333333 0.82575758]
mean value: 0.803409090909091
key: train_roc_auc
value: [0.86535133 0.88849948 0.89317238 0.87915369 0.86985116 0.88858602
0.88386985 0.88386985 0.87919695 0.875 ]
mean value: 0.8806550709588092
key: test_jcc
value: [0.64285714 0.76923077 0.70588235 0.66666667 0.64705882 0.53846154
0.66666667 0.73333333 0.73333333 0.69230769]
mean value: 0.679579831932773
key: train_jcc
value: [0.77165354 0.80327869 0.81147541 0.78688525 0.77235772 0.80645161
0.79674797 0.79674797 0.78861789 0.784 ]
mean value: 0.7918216045188055
MCC on Blind test: 0.37
Accuracy on Blind test: 0.69
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01307511 0.01253748 0.00938439 0.00916386 0.00911689 0.01020527
0.01011682 0.01027584 0.01017213 0.01063657]
mean value: 0.010468435287475587
key: score_time
value: [0.01179338 0.01028872 0.0090909 0.00874138 0.0089705 0.00970244
0.00955105 0.00958204 0.00964665 0.00971246]
mean value: 0.009707951545715332
key: test_mcc
value: [ 0.38490018 0.43033148 -0.2236068 0.35355339 0.64168895 0.60246408
0.1767767 0.70710678 0.60246408 0.58930667]
mean value: 0.426498549835703
key: train_mcc
value: [0.51291722 0.51210342 0.54948685 0.50693341 0.49306533 0.50903165
0.4861266 0.48500475 0.4889469 0.50251891]
mean value: 0.5046135029902341
key: test_accuracy
value: [0.66666667 0.70833333 0.41666667 0.66666667 0.79166667 0.79166667
0.58333333 0.83333333 0.79166667 0.7826087 ]
mean value: 0.7032608695652174
key: train_accuracy
value: [0.73953488 0.72093023 0.7627907 0.74883721 0.73023256 0.73488372
0.73023256 0.7255814 0.7255814 0.73148148]
mean value: 0.7350086132644272
key: test_fscore
value: [0.73333333 0.74074074 0.5625 0.71428571 0.82758621 0.81481481
0.64285714 0.85714286 0.81481481 0.8 ]
mean value: 0.750807562488597
key: train_fscore
value: [0.77777778 0.7761194 0.79183673 0.76923077 0.76984127 0.77647059
0.76612903 0.76679842 0.76862745 0.7751938 ]
mean value: 0.7738025243424465
key: test_precision
value: [0.61111111 0.66666667 0.45 0.625 0.70588235 0.73333333
0.5625 0.75 0.73333333 0.71428571]
mean value: 0.6552112511671335
key: train_precision
value: [0.67586207 0.64596273 0.70289855 0.70866142 0.66896552 0.66891892
0.67375887 0.66438356 0.66216216 0.66666667]
mean value: 0.6738240461813434
key: test_recall
value: [0.91666667 0.83333333 0.75 0.83333333 1. 0.91666667
0.75 1. 0.91666667 0.90909091]
mean value: 0.8825757575757576
key: train_recall
value: [0.91588785 0.97196262 0.90654206 0.8411215 0.90654206 0.92523364
0.88785047 0.90654206 0.91588785 0.92592593]
mean value: 0.910349601938387
key: test_roc_auc
value: [0.66666667 0.70833333 0.41666667 0.66666667 0.79166667 0.79166667
0.58333333 0.83333333 0.79166667 0.78787879]
mean value: 0.7037878787878789
key: train_roc_auc
value: [0.74035133 0.72209242 0.76345621 0.74926445 0.73104881 0.73576497
0.73096227 0.72641918 0.72646244 0.73148148]
mean value: 0.735730356524749
key: test_jcc
value: [0.57894737 0.58823529 0.39130435 0.55555556 0.70588235 0.6875
0.47368421 0.75 0.6875 0.66666667]
mean value: 0.6085275796054501
key: train_jcc
value: [0.63636364 0.63414634 0.65540541 0.625 0.62580645 0.63461538
0.62091503 0.62179487 0.62420382 0.63291139]
mean value: 0.6311162337996469
MCC on Blind test: 0.21
Accuracy on Blind test: 0.64
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01122594 0.01032996 0.0103538 0.01023841 0.01048398 0.01038909
0.01046562 0.0103941 0.00965381 0.00983214]
mean value: 0.010336685180664062
key: score_time
value: [0.00994563 0.00961161 0.00965428 0.00908828 0.00969028 0.00971985
0.00960493 0.00939488 0.00967574 0.00929523]
mean value: 0.009568071365356446
key: test_mcc
value: [0.58536941 0.35355339 0. 0.50709255 0.33333333 0.58536941
0.25819889 0.66666667 0.84515425 0.82575758]
mean value: 0.49604954776735644
key: train_mcc
value: [0.59116891 0.60929387 0.63724472 0.64842315 0.66084467 0.61888689
0.67521245 0.6335132 0.62092317 0.6049981 ]
mean value: 0.6300509127566093
key: test_accuracy
value: [0.79166667 0.66666667 0.5 0.75 0.66666667 0.79166667
0.625 0.83333333 0.91666667 0.91304348]
mean value: 0.7454710144927537
key: train_accuracy
value: [0.79534884 0.80465116 0.81860465 0.82325581 0.82790698 0.80930233
0.8372093 0.81395349 0.80930233 0.80092593]
mean value: 0.8140460809646857
key: test_fscore
value: [0.7826087 0.6 0.53846154 0.76923077 0.66666667 0.8
0.66666667 0.83333333 0.92307692 0.90909091]
mean value: 0.748913550217898
key: train_fscore
value: [0.79816514 0.80373832 0.81860465 0.82882883 0.83700441 0.81105991
0.84018265 0.8245614 0.8161435 0.81057269]
mean value: 0.8188861485376868
key: test_precision
value: [0.81818182 0.75 0.5 0.71428571 0.66666667 0.76923077
0.6 0.83333333 0.85714286 0.90909091]
mean value: 0.7417932067932068
key: train_precision
value: [0.78378378 0.80373832 0.81481481 0.8 0.79166667 0.8
0.82142857 0.7768595 0.78448276 0.77310924]
mean value: 0.7949883660901246
key: test_recall
value: [0.75 0.5 0.58333333 0.83333333 0.66666667 0.83333333
0.75 0.83333333 1. 0.90909091]
mean value: 0.7659090909090909
key: train_recall
value: [0.81308411 0.80373832 0.82242991 0.85981308 0.88785047 0.82242991
0.85981308 0.87850467 0.85046729 0.85185185]
mean value: 0.8449982692973347
key: test_roc_auc
value: [0.79166667 0.66666667 0.5 0.75 0.66666667 0.79166667
0.625 0.83333333 0.91666667 0.91287879]
mean value: 0.7454545454545455
key: train_roc_auc
value: [0.79543094 0.80464694 0.81862236 0.82342506 0.82818449 0.8093631
0.83731395 0.81425234 0.8094929 0.80092593]
mean value: 0.814165801315334
key: test_jcc
value: [0.64285714 0.42857143 0.36842105 0.625 0.5 0.66666667
0.5 0.71428571 0.85714286 0.83333333]
mean value: 0.6136278195488721
key: train_jcc
value: [0.66412214 0.671875 0.69291339 0.70769231 0.71969697 0.68217054
0.72440945 0.70149254 0.68939394 0.68148148]
mean value: 0.693524775026404
MCC on Blind test: 0.22
Accuracy on Blind test: 0.62
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.01054907 0.01217103 0.00988889 0.00933909 0.00983024 0.00909495
0.00927186 0.00978088 0.00950527 0.00981116]
mean value: 0.00992424488067627
key: score_time
value: [0.05331373 0.02406383 0.01151419 0.01075935 0.01080298 0.01084328
0.01075506 0.01048541 0.01077724 0.01094627]
mean value: 0.01642613410949707
key: test_mcc
value: [0.58536941 0.58536941 0.09166985 0.16903085 0.3380617 0.91986621
0.1767767 0.3380617 0.58536941 0.48075018]
mean value: 0.42703254071232083
key: train_mcc
value: [0.61642079 0.60464892 0.60225989 0.65942846 0.59461381 0.60464892
0.6335132 0.59600656 0.62203998 0.61491869]
mean value: 0.6148499207645965
key: test_accuracy
value: [0.79166667 0.79166667 0.54166667 0.58333333 0.66666667 0.95833333
0.58333333 0.66666667 0.79166667 0.73913043]
mean value: 0.7114130434782608
key: train_accuracy
value: [0.80465116 0.8 0.8 0.82790698 0.79534884 0.8
0.81395349 0.79534884 0.80930233 0.80555556]
mean value: 0.8052067183462532
key: test_fscore
value: [0.7826087 0.8 0.62068966 0.61538462 0.69230769 0.95652174
0.64285714 0.69230769 0.8 0.7 ]
mean value: 0.7302677232812166
key: train_fscore
value: [0.8173913 0.81057269 0.80717489 0.83555556 0.80530973 0.81057269
0.8245614 0.80701754 0.81777778 0.81578947]
mean value: 0.8151723055588781
key: test_precision
value: [0.81818182 0.76923077 0.52941176 0.57142857 0.64285714 1.
0.5625 0.64285714 0.76923077 0.77777778]
mean value: 0.7083475756269875
key: train_precision
value: [0.76422764 0.76666667 0.77586207 0.79661017 0.76470588 0.76666667
0.7768595 0.76033058 0.77966102 0.775 ]
mean value: 0.7726590196013521
key: test_recall
value: [0.75 0.83333333 0.75 0.66666667 0.75 0.91666667
0.75 0.75 0.83333333 0.63636364]
mean value: 0.7636363636363637
key: train_recall
value: [0.87850467 0.85981308 0.8411215 0.87850467 0.85046729 0.85981308
0.87850467 0.85981308 0.85981308 0.86111111]
mean value: 0.8627466251298027
key: test_roc_auc
value: [0.79166667 0.79166667 0.54166667 0.58333333 0.66666667 0.95833333
0.58333333 0.66666667 0.79166667 0.73484848]
mean value: 0.7109848484848484
key: train_roc_auc
value: [0.80499308 0.80027691 0.80019038 0.82814123 0.79560402 0.80027691
0.81425234 0.79564728 0.80953617 0.80555556]
mean value: 0.8054473866389754
key: test_jcc
value: [0.64285714 0.66666667 0.45 0.44444444 0.52941176 0.91666667
0.47368421 0.52941176 0.66666667 0.53846154]
mean value: 0.5858270865701206
key: train_jcc
value: [0.69117647 0.68148148 0.67669173 0.71755725 0.67407407 0.68148148
0.70149254 0.67647059 0.69172932 0.68888889]
mean value: 0.6881043826602864
MCC on Blind test: 0.07
Accuracy on Blind test: 0.55
Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01442719 0.01194906 0.0118742 0.01261497 0.0117321 0.01312518
0.01234365 0.011832 0.01180959 0.01206326]
mean value: 0.012377119064331055
key: score_time
value: [0.00988269 0.00954556 0.00970721 0.01022983 0.00949931 0.01051331
0.00969958 0.00951052 0.00961781 0.00963497]
mean value: 0.009784078598022461
key: test_mcc
value: [0.83333333 0.66666667 0.45834925 0.41812101 0.43033148 0.6761234
0.50709255 0.6761234 0.6761234 0.74242424]
mean value: 0.6084688743039378
key: train_mcc
value: [0.7751614 0.76916509 0.80755603 0.78777764 0.83492613 0.80044837
0.77323619 0.75638496 0.79889412 0.7741473 ]
mean value: 0.7877697215690955
key: test_accuracy
value: [0.91666667 0.83333333 0.70833333 0.70833333 0.70833333 0.83333333
0.75 0.83333333 0.83333333 0.86956522]
mean value: 0.7994565217391305
key: train_accuracy
value: [0.88372093 0.87906977 0.90232558 0.89302326 0.91627907 0.89767442
0.88372093 0.8744186 0.89767442 0.88425926]
mean value: 0.8912166236003445
key: test_fscore
value: [0.91666667 0.83333333 0.75862069 0.72 0.74074074 0.81818182
0.76923077 0.84615385 0.84615385 0.86956522]
mean value: 0.8118646927507497
key: train_fscore
value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
[0.89082969 0.88793103 0.9058296 0.8959276 0.91891892 0.90265487
0.88986784 0.88209607 0.90178571 0.89082969]
mean value: 0.8966671033091514
key: test_precision
value: [0.91666667 0.83333333 0.64705882 0.69230769 0.66666667 0.9
0.71428571 0.78571429 0.78571429 0.83333333]
mean value: 0.777508080155139
key: train_precision
value: [0.83606557 0.824 0.87068966 0.86842105 0.88695652 0.85714286
0.84166667 0.82786885 0.86324786 0.84297521]
mean value: 0.8519034249441588
key: test_recall
value: [0.91666667 0.83333333 0.91666667 0.75 0.83333333 0.75
0.83333333 0.91666667 0.91666667 0.90909091]
mean value: 0.8575757575757575
key: train_recall
value: [0.95327103 0.96261682 0.94392523 0.92523364 0.95327103 0.95327103
0.94392523 0.94392523 0.94392523 0.94444444]
mean value: 0.9467808930425753
key: test_roc_auc
value: [0.91666667 0.83333333 0.70833333 0.70833333 0.70833333 0.83333333
0.75 0.83333333 0.83333333 0.87121212]
mean value: 0.7996212121212122
key: train_roc_auc
value: [0.88404292 0.87945656 0.90251817 0.89317238 0.91645033 0.89793181
0.88399965 0.87474039 0.89788854 0.88425926]
mean value: 0.8914460020768432
key: test_jcc
value: [0.84615385 0.71428571 0.61111111 0.5625 0.58823529 0.69230769
0.625 0.73333333 0.73333333 0.76923077]
mean value: 0.6875491093873447
key: train_jcc
value: [0.80314961 0.79844961 0.82786885 0.81147541 0.85 0.82258065
0.8015873 0.7890625 0.82113821 0.80314961]
mean value: 0.8128461745427313
MCC on Blind test: 0.3
Accuracy on Blind test: 0.67
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.16637278 0.92700911 0.56351781 0.95615482 0.75952268 0.51315856
0.87010455 0.76887608 0.95761991 1.15136218]
mean value: 0.8633698463439942
key: score_time
value: [0.01427674 0.01241374 0.01237249 0.01237464 0.01255679 0.01212168
0.0123601 0.01237512 0.0270009 0.01376033]
mean value: 0.014161252975463867
key: test_mcc
value: [0.66666667 0.53033009 0.64168895 0.66666667 0.50709255 0.6761234
0.60246408 0.6761234 0.84515425 0.48075018]
mean value: 0.6293060233723498
key: train_mcc
value: [0.94484861 0.95386483 0.85115957 0.92557979 0.82827515 0.87038973
0.83961263 0.94418484 0.91953574 0.95407186]
mean value: 0.9031522737284948
key: test_accuracy
value: [0.83333333 0.75 0.79166667 0.83333333 0.75 0.83333333
0.79166667 0.83333333 0.91666667 0.73913043]
mean value: 0.8072463768115942
key: train_accuracy
value: [0.97209302 0.97674419 0.9255814 0.9627907 0.90697674 0.93488372
0.91627907 0.97209302 0.95813953 0.97685185]
mean value: 0.9502433247200689
key: test_fscore
value: [0.83333333 0.7 0.82758621 0.83333333 0.72727273 0.81818182
0.76190476 0.84615385 0.92307692 0.7 ]
mean value: 0.7970842950153295
key: train_fscore
value: [0.97247706 0.97695853 0.92523364 0.96261682 0.89690722 0.93577982
0.91 0.97196262 0.95964126 0.97716895]
mean value: 0.9488745912063633
key: test_precision
value: [0.83333333 0.875 0.70588235 0.83333333 0.8 0.9
0.88888889 0.78571429 0.85714286 0.77777778]
mean value: 0.8257072829131653
key: train_precision
value: [0.95495495 0.96363636 0.92523364 0.96261682 1. 0.91891892
0.97849462 0.97196262 0.92241379 0.96396396]
mean value: 0.9562195702345714
key: test_recall
value: [0.83333333 0.58333333 1. 0.83333333 0.66666667 0.75
0.66666667 0.91666667 1. 0.63636364]
mean value: 0.7886363636363636
key: train_recall
value: [0.99065421 0.99065421 0.92523364 0.96261682 0.81308411 0.95327103
0.85046729 0.97196262 1. 0.99074074]
mean value: 0.9448684665974385
key: test_roc_auc
value: [0.83333333 0.75 0.79166667 0.83333333 0.75 0.83333333
0.79166667 0.83333333 0.91666667 0.73484848]
mean value: 0.8068181818181819
key: train_roc_auc
value: [0.97217895 0.97680858 0.92557979 0.96278989 0.90654206 0.93496885
0.91597439 0.97209242 0.95833333 0.97685185]
mean value: 0.9502120110764971
key: test_jcc
value: [0.71428571 0.53846154 0.70588235 0.71428571 0.57142857 0.69230769
0.61538462 0.73333333 0.85714286 0.53846154]
mean value: 0.6680973928032752
key: train_jcc
value: [0.94642857 0.95495495 0.86086957 0.92792793 0.81308411 0.87931034
0.83486239 0.94545455 0.92241379 0.95535714]
mean value: 0.9040663343242202
MCC on Blind test: 0.33
Accuracy on Blind test: 0.68
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.02159524 0.01609659 0.01578259 0.01616478 0.01528406 0.01571155
0.01467395 0.01572943 0.01593328 0.01627374]
mean value: 0.016324520111083984
key: score_time
value: [0.01183963 0.00899029 0.00861549 0.00864005 0.00864673 0.00881767
0.00853395 0.00855303 0.00865746 0.00861073]
mean value: 0.00899050235748291
key: test_mcc
value: [ 0.58536941 0.45834925 -0.0836242 0.50709255 0.1767767 0.3380617
0.58536941 0.25819889 0.50709255 0.56490196]
mean value: 0.3897588209443773
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.79166667 0.70833333 0.45833333 0.75 0.58333333 0.66666667
0.79166667 0.625 0.75 0.7826087 ]
mean value: 0.6907608695652174
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.7826087 0.63157895 0.48 0.76923077 0.64285714 0.63636364
0.8 0.66666667 0.76923077 0.76190476]
mean value: 0.6940441389274341
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.81818182 0.85714286 0.46153846 0.71428571 0.5625 0.7
0.76923077 0.6 0.71428571 0.8 ]
mean value: 0.6997165334665335
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 0.5 0.5 0.83333333 0.75 0.58333333
0.83333333 0.75 0.83333333 0.72727273]
mean value: 0.706060606060606
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.79166667 0.70833333 0.45833333 0.75 0.58333333 0.66666667
0.79166667 0.625 0.75 0.78030303]
mean value: 0.690530303030303
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.64285714 0.46153846 0.31578947 0.625 0.47368421 0.46666667
0.66666667 0.5 0.625 0.61538462]
mean value: 0.5392587237324079
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.28
Accuracy on Blind test: 0.67
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.10004592 0.09774756 0.0969007 0.0985167 0.09694242 0.09730864
0.09703588 0.09811378 0.09755826 0.09992361]
mean value: 0.09800934791564941
key: score_time
value: [0.0174191 0.01727986 0.01732039 0.01732492 0.01718807 0.0171814
0.01742411 0.01737833 0.01726508 0.01750612]
mean value: 0.017328739166259766
key: test_mcc
value: [0.75261781 0.50709255 0.35355339 0.50709255 0.41812101 0.60246408
0.43033148 0.6761234 0.75261781 0.39393939]
mean value: 0.5393953475994654
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.875 0.75 0.66666667 0.75 0.70833333 0.79166667
0.70833333 0.83333333 0.875 0.69565217]
mean value: 0.7653985507246377
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.86956522 0.72727273 0.71428571 0.76923077 0.72 0.76190476
0.74074074 0.84615385 0.88 0.69565217]
mean value: 0.7724805950892907
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.90909091 0.8 0.625 0.71428571 0.69230769 0.88888889
0.66666667 0.78571429 0.84615385 0.66666667]
mean value: 0.759477466977467
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.83333333 0.66666667 0.83333333 0.83333333 0.75 0.66666667
0.83333333 0.91666667 0.91666667 0.72727273]
mean value: 0.7977272727272727
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.875 0.75 0.66666667 0.75 0.70833333 0.79166667
0.70833333 0.83333333 0.875 0.6969697 ]
mean value: 0.7655303030303031
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.76923077 0.57142857 0.55555556 0.625 0.5625 0.61538462
0.58823529 0.73333333 0.78571429 0.53333333]
mean value: 0.6339715758098111
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.23
Accuracy on Blind test: 0.62
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.00928187 0.00908732 0.0090332 0.00918388 0.01034474 0.00947237
0.0090425 0.00902772 0.00928903 0.00906014]
mean value: 0.009282279014587402
key: score_time
value: [0.0086329 0.00856829 0.00864267 0.00907159 0.00937819 0.00861835
0.00854588 0.00861311 0.00856185 0.00865269]
mean value: 0.008728551864624023
key: test_mcc
value: [ 0.3380617 0.25819889 0.0836242 0.33333333 0. 0.3380617
-0.16903085 0.33333333 0.41812101 0.21452908]
mean value: 0.21482323978354503
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.66666667 0.625 0.54166667 0.66666667 0.5 0.66666667
0.41666667 0.66666667 0.70833333 0.60869565]
mean value: 0.6067028985507247
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.63636364 0.57142857 0.56 0.66666667 0.53846154 0.63636364
0.36363636 0.66666667 0.72 0.52631579]
mean value: 0.5885902869060764
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.7 0.66666667 0.53846154 0.66666667 0.5 0.7
0.4 0.66666667 0.69230769 0.625 ]
mean value: 0.615576923076923
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.58333333 0.5 0.58333333 0.66666667 0.58333333 0.58333333
0.33333333 0.66666667 0.75 0.45454545]
mean value: 0.5704545454545454
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.66666667 0.625 0.54166667 0.66666667 0.5 0.66666667
0.41666667 0.66666667 0.70833333 0.60227273]
mean value: 0.6060606060606061
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.46666667 0.4 0.38888889 0.5 0.36842105 0.46666667
0.22222222 0.5 0.5625 0.35714286]
mean value: 0.42325083542188807
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.12
Accuracy on Blind test: 0.57
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.30548859 1.33230543 1.29387045 1.30024052 1.30000615 1.29675961
1.29720092 1.37435007 1.34458041 1.36174321]
mean value: 1.320654535293579
key: score_time
value: [0.08945274 0.08969522 0.08928704 0.14881968 0.08952141 0.08939528
0.08970332 0.09751368 0.09607148 0.09696031]
mean value: 0.09764201641082763
key: test_mcc
value: [0.6761234 0.57735027 0.60246408 0.66666667 0.6761234 0.64168895
0.41812101 0.53033009 0.75261781 0.38932432]
mean value: 0.5930809987870407
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.83333333 0.75 0.79166667 0.83333333 0.83333333 0.79166667
0.70833333 0.75 0.875 0.69565217]
mean value: 0.7862318840579711
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.81818182 0.66666667 0.81481481 0.83333333 0.84615385 0.73684211
0.72 0.78571429 0.88 0.66666667]
mean value: 0.7768373536794589
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.9 1. 0.73333333 0.83333333 0.78571429 1.
0.69230769 0.6875 0.84615385 0.7 ]
mean value: 0.8178342490842491
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 0.5 0.91666667 0.83333333 0.91666667 0.58333333
0.75 0.91666667 0.91666667 0.63636364]
mean value: 0.771969696969697
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.83333333 0.75 0.79166667 0.83333333 0.83333333 0.79166667
0.70833333 0.75 0.875 0.69318182]
mean value: 0.7859848484848485
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.69230769 0.5 0.6875 0.71428571 0.73333333 0.58333333
0.5625 0.64705882 0.78571429 0.5 ]
mean value: 0.6406033182503771
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.32
Accuracy on Blind test: 0.66
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
key: fit_time
value: [1.73757911 0.88143897 0.87742066 0.89499593 1.00112581 0.90873289
0.8691864 0.89679146 0.90489745 0.89780354]
mean value: 0.9869972229003906
key: score_time
value: [0.25257754 0.21717262 0.16878605 0.22824645 0.13930917 0.22713327
0.25927711 0.24127674 0.2348907 0.26814222]
mean value: 0.2236811876296997
key: test_mcc
value: [0.6761234 0.45834925 0.60246408 0.75261781 0.6761234 0.64168895
0.41812101 0.60246408 0.66666667 0.47727273]
mean value: 0.597189136410222
key: train_mcc
value: [0.92574643 0.89803517 0.90713977 0.89803517 0.92623389 0.89803517
0.89803517 0.91632053 0.90713977 0.91702052]
mean value: 0.9091741604771293
key: test_accuracy
value: [0.83333333 0.70833333 0.79166667 0.875 0.83333333 0.79166667
0.70833333 0.79166667 0.83333333 0.73913043]
mean value: 0.7905797101449276
key: train_accuracy
value: [0.9627907 0.94883721 0.95348837 0.94883721 0.9627907 0.94883721
0.94883721 0.95813953 0.95348837 0.95833333]
mean value: 0.954437984496124
key: test_fscore
value: [0.81818182 0.63157895 0.81481481 0.88 0.84615385 0.73684211
0.69565217 0.81481481 0.83333333 0.72727273]
mean value: 0.7798644581115977
key: train_fscore
value: [0.96296296 0.94930876 0.9537037 0.94930876 0.96330275 0.94930876
0.94930876 0.95813953 0.9537037 0.95890411]
mean value: 0.9547951790178185
key: test_precision
value: [0.9 0.85714286 0.73333333 0.84615385 0.78571429 1.
0.72727273 0.73333333 0.83333333 0.72727273]
mean value: 0.8143556443556443
key: train_precision
value: [0.95412844 0.93636364 0.94495413 0.93636364 0.94594595 0.93636364
0.93636364 0.9537037 0.94495413 0.94594595]
mean value: 0.9435086838297848
key: test_recall
value: [0.75 0.5 0.91666667 0.91666667 0.91666667 0.58333333
0.66666667 0.91666667 0.83333333 0.72727273]
mean value: 0.7727272727272727
key: train_recall
value: [0.97196262 0.96261682 0.96261682 0.96261682 0.98130841 0.96261682
0.96261682 0.96261682 0.96261682 0.97222222]
mean value: 0.9663811007268951
key: test_roc_auc
value: [0.83333333 0.70833333 0.79166667 0.875 0.83333333 0.79166667
0.70833333 0.79166667 0.83333333 0.73863636]
mean value: 0.790530303030303
key: train_roc_auc
value: [0.96283316 0.948901 0.95353063 0.948901 0.96287643 0.948901
0.948901 0.95816026 0.95353063 0.95833333]
mean value: 0.9544868466597439
key: test_jcc
value: [0.69230769 0.46153846 0.6875 0.78571429 0.73333333 0.58333333
0.53333333 0.6875 0.71428571 0.57142857]
mean value: 0.6450274725274725
key: train_jcc
value: [0.92857143 0.90350877 0.91150442 0.90350877 0.92920354 0.90350877
0.90350877 0.91964286 0.91150442 0.92105263]
mean value: 0.9135514394393063
MCC on Blind test: 0.34
Accuracy on Blind test: 0.68
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02658057 0.01026082 0.01046276 0.01046801 0.01056957 0.01058936
0.00954437 0.00941682 0.00935864 0.00949931]
mean value: 0.011675024032592773
key: score_time
value: [0.01053667 0.00951457 0.00990701 0.00974226 0.0097394 0.00976729
0.00901318 0.00893641 0.008919 0.00859046]
mean value: 0.00946662425994873
key: test_mcc
value: [0.58536941 0.35355339 0. 0.50709255 0.33333333 0.58536941
0.25819889 0.66666667 0.84515425 0.82575758]
mean value: 0.49604954776735644
key: train_mcc
value: [0.59116891 0.60929387 0.63724472 0.64842315 0.66084467 0.61888689
0.67521245 0.6335132 0.62092317 0.6049981 ]
mean value: 0.6300509127566093
key: test_accuracy
value: [0.79166667 0.66666667 0.5 0.75 0.66666667 0.79166667
0.625 0.83333333 0.91666667 0.91304348]
mean value: 0.7454710144927537
key: train_accuracy
value: [0.79534884 0.80465116 0.81860465 0.82325581 0.82790698 0.80930233
0.8372093 0.81395349 0.80930233 0.80092593]
mean value: 0.8140460809646857
key: test_fscore
value: [0.7826087 0.6 0.53846154 0.76923077 0.66666667 0.8
0.66666667 0.83333333 0.92307692 0.90909091]
mean value: 0.748913550217898
key: train_fscore
value: [0.79816514 0.80373832 0.81860465 0.82882883 0.83700441 0.81105991
0.84018265 0.8245614 0.8161435 0.81057269]
mean value: 0.8188861485376868
key: test_precision
value: [0.81818182 0.75 0.5 0.71428571 0.66666667 0.76923077
0.6 0.83333333 0.85714286 0.90909091]
mean value: 0.7417932067932068
key: train_precision
value: [0.78378378 0.80373832 0.81481481 0.8 0.79166667 0.8
0.82142857 0.7768595 0.78448276 0.77310924]
mean value: 0.7949883660901246
key: test_recall
value: [0.75 0.5 0.58333333 0.83333333 0.66666667 0.83333333
0.75 0.83333333 1. 0.90909091]
mean value: 0.7659090909090909
key: train_recall
value: [0.81308411 0.80373832 0.82242991 0.85981308 0.88785047 0.82242991
0.85981308 0.87850467 0.85046729 0.85185185]
mean value: 0.8449982692973347
key: test_roc_auc
value: [0.79166667 0.66666667 0.5 0.75 0.66666667 0.79166667
0.625 0.83333333 0.91666667 0.91287879]
mean value: 0.7454545454545455
key: train_roc_auc
value: [0.79543094 0.80464694 0.81862236 0.82342506 0.82818449 0.8093631
0.83731395 0.81425234 0.8094929 0.80092593]
mean value: 0.814165801315334
key: test_jcc
value: [0.64285714 0.42857143 0.36842105 0.625 0.5 0.66666667
0.5 0.71428571 0.85714286 0.83333333]
mean value: 0.6136278195488721
key: train_jcc
value: [0.66412214 0.671875 0.69291339 0.70769231 0.71969697 0.68217054
0.72440945 0.70149254 0.68939394 0.68148148]
mean value: 0.693524775026404
MCC on Blind test: 0.22
Accuracy on Blind test: 0.62
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.10902405 0.1528337 0.10040998 0.20752549 0.06414986 0.0619688
0.0615015 0.06558824 0.06363559 0.06329346]
mean value: 0.09499306678771972
key: score_time
value: [0.01049137 0.01351738 0.01741481 0.01112556 0.01131034 0.0103085
0.01125455 0.01034284 0.01033878 0.01028371]
mean value: 0.011638784408569336
key: test_mcc
value: [0.75261781 0.45834925 0.50709255 0.58536941 0.58536941 0.64168895
0.58536941 0.43033148 0.83333333 0.38932432]
mean value: 0.5768845916089698
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.875 0.70833333 0.75 0.79166667 0.79166667 0.79166667
0.79166667 0.70833333 0.91666667 0.69565217]
mean value: 0.7820652173913043
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.86956522 0.63157895 0.76923077 0.8 0.7826087 0.73684211
0.8 0.74074074 0.91666667 0.66666667]
mean value: 0.77138998089799
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.90909091 0.85714286 0.71428571 0.76923077 0.81818182 1.
0.76923077 0.66666667 0.91666667 0.7 ]
mean value: 0.8120496170496171
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.83333333 0.5 0.83333333 0.83333333 0.75 0.58333333
0.83333333 0.83333333 0.91666667 0.63636364]
mean value: 0.7553030303030304
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.875 0.70833333 0.75 0.79166667 0.79166667 0.79166667
0.79166667 0.70833333 0.91666667 0.69318182]
mean value: 0.7818181818181819
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.76923077 0.46153846 0.625 0.66666667 0.64285714 0.58333333
0.66666667 0.58823529 0.84615385 0.5 ]
mean value: 0.6349682180564533
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.43
Accuracy on Blind test: 0.73
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.04819727 0.02819967 0.06439757 0.04967928 0.05256891 0.05795836
0.04971886 0.06058121 0.06324649 0.04557896]
mean value: 0.05201265811920166
key: score_time
value: [0.012532 0.01256657 0.01874089 0.02269316 0.01954985 0.02217078
0.01249123 0.02141213 0.02277303 0.01738524]
mean value: 0.01823148727416992
key: test_mcc
value: [0.66666667 0.30779351 0.16903085 0.43033148 0.16666667 0.66666667
0.50709255 0.58536941 0.83333333 0.21969697]
mean value: 0.45526481023555615
key: train_mcc
value: [0.92574643 0.96278989 0.96295976 0.95352662 0.9443531 0.94418484
0.95352662 0.96295976 0.94418484 0.97259753]
mean value: 0.9526829380391524
key: test_accuracy
value: [0.83333333 0.625 0.58333333 0.70833333 0.58333333 0.83333333
0.75 0.79166667 0.91666667 0.60869565]
mean value: 0.7233695652173913
key: train_accuracy
value: [0.9627907 0.98139535 0.98139535 0.97674419 0.97209302 0.97209302
0.97674419 0.98139535 0.97209302 0.98611111]
mean value: 0.9762855297157622
key: test_fscore
value: [0.83333333 0.70967742 0.61538462 0.66666667 0.58333333 0.83333333
0.76923077 0.7826087 0.91666667 0.60869565]
mean value: 0.7318930485129643
key: train_fscore
value: [0.96296296 0.98130841 0.98148148 0.97652582 0.97222222 0.97196262
0.97652582 0.98148148 0.97196262 0.98630137]
mean value: 0.9762734806063463
key: test_precision
value: [0.83333333 0.57894737 0.57142857 0.77777778 0.58333333 0.83333333
0.71428571 0.81818182 0.91666667 0.58333333]
mean value: 0.7210621250094934
key: train_precision
value: [0.95412844 0.98130841 0.97247706 0.98113208 0.96330275 0.97196262
0.98113208 0.97247706 0.97196262 0.97297297]
mean value: 0.97228560898771
key: test_recall
value: [0.83333333 0.91666667 0.66666667 0.58333333 0.58333333 0.83333333
0.83333333 0.75 0.91666667 0.63636364]
mean value: 0.7553030303030304
key: train_recall
value: [0.97196262 0.98130841 0.99065421 0.97196262 0.98130841 0.97196262
0.97196262 0.99065421 0.97196262 1. ]
mean value: 0.9803738317757009
key: test_roc_auc
value: [0.83333333 0.625 0.58333333 0.70833333 0.58333333 0.83333333
0.75 0.79166667 0.91666667 0.60984848]
mean value: 0.7234848484848485
key: train_roc_auc
value: [0.96283316 0.98139495 0.98143821 0.97672205 0.97213569 0.97209242
0.97672205 0.98143821 0.97209242 0.98611111]
mean value: 0.9762980269989616
key: test_jcc
value: [0.71428571 0.55 0.44444444 0.5 0.41176471 0.71428571
0.625 0.64285714 0.84615385 0.4375 ]
mean value: 0.5886291567909215
key: train_jcc
value: [0.92857143 0.96330275 0.96363636 0.95412844 0.94594595 0.94545455
0.95412844 0.96363636 0.94545455 0.97297297]
mean value: 0.9537231798699689
MCC on Blind test: 0.4
Accuracy on Blind test: 0.7
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01826453 0.01130295 0.01051307 0.01082659 0.01079297 0.0103395
0.00921464 0.01058745 0.00952053 0.01054502]
mean value: 0.01119072437286377
key: score_time
value: [0.01071835 0.01040387 0.00927067 0.00956964 0.00958848 0.00956655
0.00894737 0.00974202 0.00914693 0.00972438]
mean value: 0.00966782569885254
key: test_mcc
value: [0.43033148 0.41812101 0.0860663 0.35355339 0.60246408 0.41812101
0.33333333 0.6761234 0.41812101 0.65151515]
mean value: 0.43877501498062393
key: train_mcc
value: [0.48481158 0.49495629 0.48481158 0.4922574 0.47473832 0.49351477
0.49351477 0.44598003 0.49351477 0.48685383]
mean value: 0.48449533333574646
key: test_accuracy
value: [0.70833333 0.70833333 0.54166667 0.66666667 0.79166667 0.70833333
0.66666667 0.83333333 0.70833333 0.82608696]
mean value: 0.7159420289855073
key: train_accuracy
value: [0.73953488 0.74418605 0.73953488 0.74418605 0.73488372 0.74418605
0.74418605 0.72093023 0.74418605 0.74074074]
mean value: 0.7396554694229113
key: test_fscore
value: [0.74074074 0.72 0.59259259 0.71428571 0.81481481 0.72
0.66666667 0.84615385 0.69565217 0.81818182]
mean value: 0.7329088367349237
key: train_fscore
value: [0.75652174 0.76190476 0.75652174 0.75770925 0.7510917 0.75982533
0.75982533 0.73684211 0.75982533 0.75862069]
mean value: 0.7558687971774802
key: test_precision
value: [0.66666667 0.69230769 0.53333333 0.625 0.73333333 0.69230769
0.66666667 0.78571429 0.72727273 0.81818182]
mean value: 0.6940784215784216
key: train_precision
value: [0.70731707 0.70967742 0.70731707 0.71666667 0.70491803 0.71311475
0.71311475 0.69421488 0.71311475 0.70967742]
mean value: 0.7089132822832833
key: test_recall
value: [0.83333333 0.75 0.66666667 0.83333333 0.91666667 0.75
0.66666667 0.91666667 0.66666667 0.81818182]
mean value: 0.7818181818181819
key: train_recall
value: [0.81308411 0.82242991 0.81308411 0.80373832 0.80373832 0.81308411
0.81308411 0.78504673 0.81308411 0.81481481]
mean value: 0.8095188646590515
key: test_roc_auc
value: [0.70833333 0.70833333 0.54166667 0.66666667 0.79166667 0.70833333
0.66666667 0.83333333 0.70833333 0.82575758]
mean value: 0.7159090909090909
key: train_roc_auc
value: [0.73987539 0.74454829 0.73987539 0.74446175 0.73520249 0.74450502
0.74450502 0.72122707 0.74450502 0.74074074]
mean value: 0.739944617514711
key: test_jcc
value: [0.58823529 0.5625 0.42105263 0.55555556 0.6875 0.5625
0.5 0.73333333 0.53333333 0.69230769]
mean value: 0.5836317840226509
key: train_jcc
value: [0.60839161 0.61538462 0.60839161 0.60992908 0.6013986 0.61267606
0.61267606 0.58333333 0.61267606 0.61111111]
mean value: 0.6075968125039147
MCC on Blind test: 0.28
Accuracy on Blind test: 0.66
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01256156 0.01546574 0.01470971 0.01508951 0.01480317 0.01682806
0.01639032 0.01579952 0.01583982 0.01432538]
mean value: 0.015181279182434082
key: score_time
value: [0.00943828 0.01121879 0.01107645 0.01168895 0.01164103 0.01182342
0.01168585 0.01208973 0.01164865 0.01169538]
mean value: 0.011400651931762696
key: test_mcc
value: [0.64168895 0.58536941 0.30151134 0.77459667 0.50709255 0.45834925
0.43033148 0.58536941 0.77459667 0.6992059 ]
mean value: 0.5758111628033658
key: train_mcc
value: [0.69402575 0.76240636 0.62644734 0.82329526 0.84188663 0.81614982
0.81270771 0.61063847 0.74777648 0.66695469]
mean value: 0.7402288516692308
key: test_accuracy
value: [0.79166667 0.79166667 0.58333333 0.875 0.75 0.70833333
0.70833333 0.79166667 0.875 0.82608696]
mean value: 0.7701086956521739
key: train_accuracy
value: [0.82790698 0.86976744 0.78139535 0.91162791 0.92093023 0.90697674
0.89767442 0.77209302 0.86511628 0.81944444]
mean value: 0.8572932816537467
key: test_fscore
value: [0.82758621 0.8 0.70588235 0.85714286 0.76923077 0.63157895
0.74074074 0.7826087 0.88888889 0.84615385]
mean value: 0.7849813305015425
key: train_fscore
value: [0.85140562 0.88333333 0.81992337 0.91162791 0.92018779 0.90291262
0.90677966 0.7030303 0.87763713 0.84210526]
mean value: 0.8618943007240835
key: test_precision
value: [0.70588235 0.76923077 0.54545455 1. 0.71428571 0.85714286
0.66666667 0.81818182 0.8 0.73333333]
mean value: 0.761017805723688
key: train_precision
value: [0.74647887 0.79699248 0.69480519 0.90740741 0.9245283 0.93939394
0.82945736 1. 0.8 0.74820144]
mean value: 0.8387265001125784
key: test_recall
value: [1. 0.83333333 1. 0.75 0.83333333 0.5
0.83333333 0.75 1. 1. ]
mean value: 0.85
key: train_recall
value: [0.99065421 0.99065421 1. 0.91588785 0.91588785 0.86915888
1. 0.54205607 0.97196262 0.96296296]
mean value: 0.9159224645205953
key: test_roc_auc
value: [0.79166667 0.79166667 0.58333333 0.875 0.75 0.70833333
0.70833333 0.79166667 0.875 0.83333333]
mean value: 0.7708333333333334
key: train_roc_auc
value: [0.82866044 0.8703271 0.78240741 0.91164763 0.92090689 0.90680166
0.89814815 0.77102804 0.86561094 0.81944444]
mean value: 0.8574982692973347
key: test_jcc
value: [0.70588235 0.66666667 0.54545455 0.75 0.625 0.46153846
0.58823529 0.64285714 0.8 0.73333333]
mean value: 0.6518967796908973
key: train_jcc
value: [0.74125874 0.79104478 0.69480519 0.83760684 0.85217391 0.82300885
0.82945736 0.54205607 0.78195489 0.72727273]
mean value: 0.762063936598939
MCC on Blind test: 0.3
Accuracy on Blind test: 0.69
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01563263 0.01531029 0.01476598 0.01530075 0.01501727 0.01689005
0.01432395 0.01691675 0.01566124 0.01474738]
mean value: 0.015456628799438477
key: score_time
value: [0.01183009 0.0117557 0.01172447 0.01171923 0.01170135 0.01168537
0.01175404 0.011832 0.01171136 0.01169014]
mean value: 0.011740374565124511
key: test_mcc
value: [0.60246408 0.45834925 0.57735027 0.57735027 0.53033009 0.60246408
0.58536941 0.6761234 0.57735027 0.3030303 ]
mean value: 0.5490181407944392
key: train_mcc
value: [0.75414397 0.80417261 0.85658506 0.56745034 0.77988595 0.87115575
0.8191606 0.88004537 0.54792743 0.80125769]
mean value: 0.7681784775947282
key: test_accuracy
value: [0.79166667 0.70833333 0.75 0.75 0.75 0.79166667
0.79166667 0.83333333 0.75 0.65217391]
mean value: 0.7568840579710145
key: train_accuracy
value: [0.86511628 0.89767442 0.9255814 0.74418605 0.88372093 0.93488372
0.90697674 0.93953488 0.73023256 0.89814815]
mean value: 0.8726055124892333
key: test_fscore
value: [0.76190476 0.63157895 0.8 0.66666667 0.78571429 0.76190476
0.8 0.84615385 0.8 0.63636364]
mean value: 0.749028690607638
key: train_fscore
value: [0.84491979 0.88888889 0.92920354 0.65408805 0.89270386 0.93636364
0.91150442 0.94063927 0.78676471 0.90350877]
mean value: 0.8688584936144532
key: test_precision
value: [0.88888889 0.85714286 0.66666667 1. 0.6875 0.88888889
0.76923077 0.78571429 0.66666667 0.63636364]
mean value: 0.7847062659562659
key: train_precision
value: [0.9875 0.96703297 0.88235294 1. 0.82539683 0.91150442
0.86554622 0.91964286 0.64848485 0.85833333]
mean value: 0.8865794415833458
key: test_recall
value: [0.66666667 0.5 1. 0.5 0.91666667 0.66666667
0.83333333 0.91666667 1. 0.63636364]
mean value: 0.7636363636363637
key: train_recall
value: [0.73831776 0.82242991 0.98130841 0.48598131 0.97196262 0.96261682
0.96261682 0.96261682 1. 0.9537037 ]
mean value: 0.8841554170993423
key: test_roc_auc
value: [0.79166667 0.70833333 0.75 0.75 0.75 0.79166667
0.79166667 0.83333333 0.75 0.65151515]
mean value: 0.7568181818181818
key: train_roc_auc
value: [0.86452925 0.89732606 0.92583939 0.74299065 0.88412946 0.93501211
0.90723434 0.93964174 0.73148148 0.89814815]
mean value: 0.8726332641052268
key: test_jcc
value: [0.61538462 0.46153846 0.66666667 0.5 0.64705882 0.61538462
0.66666667 0.73333333 0.66666667 0.46666667]
mean value: 0.6039366515837103
key: train_jcc
value: [0.73148148 0.8 0.8677686 0.48598131 0.80620155 0.88034188
0.83739837 0.88793103 0.64848485 0.824 ]
mean value: 0.7769589072614843
MCC on Blind test: 0.35
Accuracy on Blind test: 0.7
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.14447331 0.12586784 0.1261003 0.11577964 0.11632538 0.11602783
0.11646795 0.11628914 0.11623144 0.11667013]
mean value: 0.12102329730987549
key: score_time
value: [0.01626277 0.01627779 0.0151825 0.0147202 0.01485586 0.01480722
0.01476502 0.01491356 0.01466155 0.01459432]
mean value: 0.015104079246520996
key: test_mcc
value: [0.66666667 0.64168895 0.38490018 0.60246408 0.2508726 0.50709255
0.5 0.6761234 0.91986621 0.31298622]
mean value: 0.546266085923586
key: train_mcc
value: [1. 1. 1. 1. 1. 1.
1. 1. 1. 0.99078321]
mean value: 0.999078321349667
key: test_accuracy
value: [0.83333333 0.79166667 0.66666667 0.79166667 0.625 0.75
0.75 0.83333333 0.95833333 0.65217391]
mean value: 0.7652173913043478
key: train_accuracy
value: [1. 1. 1. 1. 1. 1.
1. 1. 1. 0.99537037]
mean value: 0.999537037037037
key: test_fscore
value: [0.83333333 0.73684211 0.73333333 0.81481481 0.60869565 0.72727273
0.75 0.81818182 0.96 0.66666667]
mean value: 0.7649140451039764
key: train_fscore
value: [1. 1. 1. 1. 1. 1.
1. 1. 1. 0.99539171]
mean value: 0.9995391705069124
key: test_precision
value: [0.83333333 1. 0.61111111 0.73333333 0.63636364 0.8
0.75 0.9 0.92307692 0.61538462]
mean value: 0.7802602952602953
key: train_precision
value: [1. 1. 1. 1. 1. 1.
1. 1. 1. 0.99082569]
mean value: 0.9990825688073395
key: test_recall
value: [0.83333333 0.58333333 0.91666667 0.91666667 0.58333333 0.66666667
0.75 0.75 1. 0.72727273]
mean value: 0.7727272727272727
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.83333333 0.79166667 0.66666667 0.79166667 0.625 0.75
0.75 0.83333333 0.95833333 0.65530303]
mean value: 0.765530303030303
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1.
1. 1. 1. 0.99537037]
mean value: 0.999537037037037
key: test_jcc
value: [0.71428571 0.58333333 0.57894737 0.6875 0.4375 0.57142857
0.6 0.69230769 0.92307692 0.5 ]
mean value: 0.6288379602853287
key: train_jcc
value: [1. 1. 1. 1. 1. 1.
1. 1. 1. 0.99082569]
mean value: 0.9990825688073395
MCC on Blind test: 0.35
Accuracy on Blind test: 0.69
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.04997015 0.04298496 0.06290412 0.04491043 0.05685115 0.05192327
0.04238701 0.04378319 0.06164503 0.05723691]
mean value: 0.05145962238311767
key: score_time
value: [0.01879859 0.02429152 0.02979851 0.01755548 0.02789521 0.01707387
0.0175848 0.01788163 0.03390718 0.02229738]
mean value: 0.022708415985107422
key: test_mcc
value: [0.75261781 0.51298918 0.41812101 0.50709255 0.60246408 0.70710678
0.50709255 0.50709255 0.83333333 0.38932432]
mean value: 0.5737234159712618
key: train_mcc
value: [0.96345091 0.94564526 0.96345091 0.95451081 0.99073994 0.97213328
0.96345091 0.99073994 0.96278989 0.96362411]
mean value: 0.967053596512949
key: test_accuracy
value: [0.875 0.70833333 0.70833333 0.75 0.79166667 0.83333333
0.75 0.75 0.91666667 0.69565217]
mean value: 0.7778985507246376
key: train_accuracy
value: [0.98139535 0.97209302 0.98139535 0.97674419 0.99534884 0.98604651
0.98139535 0.99534884 0.98139535 0.98148148]
mean value: 0.9832644272179156
key: test_fscore
value: [0.86956522 0.58823529 0.72 0.72727273 0.76190476 0.8
0.72727273 0.76923077 0.91666667 0.66666667]
mean value: 0.754681483052327
key: train_fscore
value: [0.98095238 0.97115385 0.98095238 0.97607656 0.99530516 0.98591549
0.98095238 0.99530516 0.98130841 0.98113208]
mean value: 0.9829053852317808
key: test_precision
value: [0.90909091 1. 0.69230769 0.8 0.88888889 1.
0.8 0.71428571 0.91666667 0.7 ]
mean value: 0.8421239871239872
key: train_precision
value: [1. 1. 1. 1. 1. 0.99056604
1. 1. 0.98130841 1. ]
mean value: 0.9971874448950803
key: test_recall
value: [0.83333333 0.41666667 0.75 0.66666667 0.66666667 0.66666667
0.66666667 0.83333333 0.91666667 0.63636364]
mean value: 0.7053030303030303
key: train_recall
value: [0.96261682 0.94392523 0.96261682 0.95327103 0.99065421 0.98130841
0.96261682 0.99065421 0.98130841 0.96296296]
mean value: 0.9691934925579785
key: test_roc_auc
value: [0.875 0.70833333 0.70833333 0.75 0.79166667 0.83333333
0.75 0.75 0.91666667 0.69318182]
mean value: 0.7776515151515151
key: train_roc_auc
value: [0.98130841 0.97196262 0.98130841 0.97663551 0.9953271 0.98602458
0.98130841 0.9953271 0.98139495 0.98148148]
mean value: 0.9832078573901003
key: test_jcc
value: [0.76923077 0.41666667 0.5625 0.57142857 0.61538462 0.66666667
0.57142857 0.625 0.84615385 0.5 ]
mean value: 0.6144459706959707
key: train_jcc
value: [0.96261682 0.94392523 0.96261682 0.95327103 0.99065421 0.97222222
0.96261682 0.99065421 0.96330275 0.96296296]
mean value: 0.9664843077665679
MCC on Blind test: 0.45
Accuracy on Blind test: 0.72
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.04568768 0.09330058 0.08510971 0.06851339 0.06498671 0.05895758
0.06189704 0.06270814 0.06319594 0.06500459]
mean value: 0.06693613529205322
key: score_time
value: [0.0236721 0.02396679 0.02092028 0.02220488 0.02225089 0.02138257
0.02423358 0.01493979 0.021734 0.02368546]
mean value: 0.021899032592773437
key: test_mcc
value: [0.66666667 0.3380617 0. 0.41812101 0.3380617 0.66666667
0.35355339 0.43033148 0.6761234 0.47727273]
mean value: 0.4364858746680441
key: train_mcc
value: [0.99074074 0.99074074 0.99074074 0.99074074 0.99074074 0.99074074
0.99074074 0.99074074 0.98156643 0.99078321]
mean value: 0.9898275564560949
key: test_accuracy
value: [0.83333333 0.66666667 0.5 0.70833333 0.66666667 0.83333333
0.66666667 0.70833333 0.83333333 0.73913043]
mean value: 0.7155797101449275
key: train_accuracy
value: [0.99534884 0.99534884 0.99534884 0.99534884 0.99534884 0.99534884
0.99534884 0.99534884 0.99069767 0.99537037]
mean value: 0.9948858742463393
key: test_fscore
value: [0.83333333 0.63636364 0.57142857 0.69565217 0.69230769 0.83333333
0.71428571 0.74074074 0.84615385 0.72727273]
mean value: 0.7290871769132639
key: train_fscore
value: [0.99534884 0.99534884 0.99534884 0.99534884 0.99534884 0.99534884
0.99534884 0.99534884 0.99074074 0.99539171]
mean value: 0.9948923143484284
key: test_precision
value: [0.83333333 0.7 0.5 0.72727273 0.64285714 0.83333333
0.625 0.66666667 0.78571429 0.72727273]
mean value: 0.7041450216450217
key: train_precision
value: [0.99074074 0.99074074 0.99074074 0.99074074 0.99074074 0.99074074
0.99074074 0.99074074 0.98165138 0.99082569]
mean value: 0.9898402990146109
key: test_recall
value: [0.83333333 0.58333333 0.66666667 0.66666667 0.75 0.83333333
0.83333333 0.83333333 0.91666667 0.72727273]
mean value: 0.7643939393939394
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.83333333 0.66666667 0.5 0.70833333 0.66666667 0.83333333
0.66666667 0.70833333 0.83333333 0.73863636]
mean value: 0.7155303030303031
key: train_roc_auc
value: [0.99537037 0.99537037 0.99537037 0.99537037 0.99537037 0.99537037
0.99537037 0.99537037 0.99074074 0.99537037]
mean value: 0.9949074074074074
key: test_jcc
value: [0.71428571 0.46666667 0.4 0.53333333 0.52941176 0.71428571
0.55555556 0.58823529 0.73333333 0.57142857]
mean value: 0.5806535947712418
key: train_jcc
value: [0.99074074 0.99074074 0.99074074 0.99074074 0.99074074 0.99074074
0.99074074 0.99074074 0.98165138 0.99082569]
mean value: 0.9898402990146109
MCC on Blind test: 0.15
Accuracy on Blind test: 0.58
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.39761639 0.39170408 0.38538814 0.38379908 0.39135814 0.39177537
0.38714075 0.38791275 0.39024019 0.39440322]
mean value: 0.39013381004333497
key: score_time
value: [0.00975585 0.00946522 0.00950503 0.00953794 0.00963545 0.00959516
0.00949287 0.00942302 0.0098877 0.00990605]
mean value: 0.009620428085327148
key: test_mcc
value: [0.58536941 0.45834925 0.35355339 0.83333333 0.6761234 0.60246408
0.66666667 0.58536941 0.75261781 0.38932432]
mean value: 0.5903171062535404
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.79166667 0.70833333 0.66666667 0.91666667 0.83333333 0.79166667
0.83333333 0.79166667 0.875 0.69565217]
mean value: 0.7903985507246377
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.7826087 0.63157895 0.71428571 0.91666667 0.84615385 0.76190476
0.83333333 0.7826087 0.88 0.66666667]
mean value: 0.7815807327683758
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.81818182 0.85714286 0.625 0.91666667 0.78571429 0.88888889
0.83333333 0.81818182 0.84615385 0.7 ]
mean value: 0.8089263514263514
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 0.5 0.83333333 0.91666667 0.91666667 0.66666667
0.83333333 0.75 0.91666667 0.63636364]
mean value: 0.771969696969697
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.79166667 0.70833333 0.66666667 0.91666667 0.83333333 0.79166667
0.83333333 0.79166667 0.875 0.69318182]
mean value: 0.7901515151515152
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.64285714 0.46153846 0.55555556 0.84615385 0.73333333 0.61538462
0.71428571 0.64285714 0.78571429 0.5 ]
mean value: 0.6497680097680097
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.41
Accuracy on Blind test: 0.71
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.01991081 0.0208385 0.02130866 0.02156377 0.02402329 0.02146912
0.02118611 0.03131819 0.02325082 0.02378273]
mean value: 0.02286520004272461
key: score_time
value: [0.01243114 0.01261592 0.0170927 0.01372766 0.01431441 0.01454306
0.01456213 0.02570295 0.01408601 0.01437736]
mean value: 0.015345335006713867
key: test_mcc
value: [0.66666667 0.5 0.0860663 0.3380617 0.2508726 0.41812101
0.27500955 0.41812101 0.3380617 0.13740858]
mean value: 0.34283891128738514
key: train_mcc
value: [1. 1. 0.99074074 0.98156643 0.87730631 1.
0.96346333 1. 0.9109617 0.91132238]
mean value: 0.9635360885574163
key: test_accuracy
value: [0.83333333 0.75 0.54166667 0.66666667 0.625 0.70833333
0.625 0.70833333 0.66666667 0.56521739]
mean value: 0.6690217391304347
key: train_accuracy
value: [1. 1. 0.99534884 0.99069767 0.93488372 1.
0.98139535 1. 0.95348837 0.9537037 ]
mean value: 0.9809517657192076
key: test_fscore
value: [0.83333333 0.75 0.59259259 0.69230769 0.60869565 0.72
0.68965517 0.72 0.69230769 0.58333333]
mean value: 0.688222546846235
key: train_fscore
value: [1. 1. 0.99534884 0.99074074 0.93859649 1.
0.98165138 1. 0.95535714 0.95575221]
mean value: 0.9817446800571425
key: test_precision
value: [0.83333333 0.75 0.53333333 0.64285714 0.63636364 0.69230769
0.58823529 0.69230769 0.64285714 0.53846154]
mean value: 0.655005680593916
key: train_precision
value: [1. 1. 0.99074074 0.98165138 0.88429752 1.
0.96396396 1. 0.91452991 0.91525424]
mean value: 0.9650437753330701
key: test_recall
value: [0.83333333 0.75 0.66666667 0.75 0.58333333 0.75
0.83333333 0.75 0.75 0.63636364]
mean value: 0.7303030303030303
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.83333333 0.75 0.54166667 0.66666667 0.625 0.70833333
0.625 0.70833333 0.66666667 0.56818182]
mean value: 0.6693181818181818
key: train_roc_auc
value: [1. 1. 0.99537037 0.99074074 0.93518519 1.
0.98148148 1. 0.9537037 0.9537037 ]
mean value: 0.9810185185185185
key: test_jcc
value: [0.71428571 0.6 0.42105263 0.52941176 0.4375 0.5625
0.52631579 0.5625 0.52941176 0.41176471]
mean value: 0.5294742370632464
key: train_jcc
value: [1. 1. 0.99074074 0.98165138 0.88429752 1.
0.96396396 1. 0.91452991 0.91525424]
mean value: 0.9650437753330701
MCC on Blind test: 0.08
Accuracy on Blind test: 0.55
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02385974 0.03455305 0.03563571 0.03445697 0.03979993 0.04786658
0.05260968 0.03588843 0.04903007 0.03504395]
mean value: 0.03887441158294678
key: score_time
value: [0.02322555 0.02342844 0.02274847 0.02379513 0.02408171 0.0219028
0.0235405 0.0222435 0.02363682 0.02212644]
mean value: 0.0230729341506958
key: test_mcc
value: [0.6761234 0.60246408 0.35355339 0.41812101 0.41812101 0.3380617
0.41812101 0.58536941 0.91986621 0.56490196]
mean value: 0.5294703160678628
key: train_mcc
value: [0.84220552 0.87942529 0.91632053 0.88853311 0.88853311 0.89866654
0.89803517 0.889785 0.8608154 0.90803041]
mean value: 0.8870350069235775
key: test_accuracy
value: [0.83333333 0.79166667 0.66666667 0.70833333 0.70833333 0.66666667
0.70833333 0.79166667 0.95833333 0.7826087 ]
mean value: 0.7615942028985507
key: train_accuracy
value: [0.92093023 0.93953488 0.95813953 0.94418605 0.94418605 0.94883721
0.94883721 0.94418605 0.93023256 0.9537037 ]
mean value: 0.9432773471145564
key: test_fscore
value: [0.81818182 0.76190476 0.71428571 0.72 0.72 0.63636364
0.72 0.8 0.96 0.76190476]
mean value: 0.7612640692640693
key: train_fscore
value: [0.92165899 0.94009217 0.95813953 0.94444444 0.94444444 0.94977169
0.94930876 0.94545455 0.93087558 0.95454545]
mean value: 0.9438735597141295
key: test_precision
value: [0.9 0.88888889 0.625 0.69230769 0.69230769 0.7
0.69230769 0.76923077 0.92307692 0.8 ]
mean value: 0.7683119658119658
key: train_precision
value: [0.90909091 0.92727273 0.9537037 0.93577982 0.93577982 0.92857143
0.93636364 0.92035398 0.91818182 0.9375 ]
mean value: 0.9302597838512632
key: test_recall
value: [0.75 0.66666667 0.83333333 0.75 0.75 0.58333333
0.75 0.83333333 1. 0.72727273]
mean value: 0.7643939393939394
key: train_recall
value: [0.93457944 0.95327103 0.96261682 0.95327103 0.95327103 0.97196262
0.96261682 0.97196262 0.94392523 0.97222222]
mean value: 0.9579698857736241
key: test_roc_auc
value: [0.83333333 0.79166667 0.66666667 0.70833333 0.70833333 0.66666667
0.70833333 0.79166667 0.95833333 0.78030303]
mean value: 0.7613636363636364
key: train_roc_auc
value: [0.92099342 0.93959848 0.95816026 0.94422811 0.94422811 0.94894427
0.948901 0.94431464 0.93029595 0.9537037 ]
mean value: 0.9433367947386639
key: test_jcc
value: [0.69230769 0.61538462 0.55555556 0.5625 0.5625 0.46666667
0.5625 0.66666667 0.92307692 0.61538462]
mean value: 0.6222542735042735
key: train_jcc
value: [0.85470085 0.88695652 0.91964286 0.89473684 0.89473684 0.90434783
0.90350877 0.89655172 0.87068966 0.91304348]
mean value: 0.8938915373381364
MCC on Blind test: 0.42
Accuracy on Blind test: 0.72
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.22778225 0.23624635 0.22904706 0.23772502 0.23651695 0.23989558
0.2544477 0.28176785 0.23043919 0.23300552]
mean value: 0.24068734645843506
key: score_time
value: [0.02331424 0.02260542 0.02097154 0.02278924 0.02169299 0.02110124
0.02430058 0.02264762 0.02297473 0.02027822]
mean value: 0.022267580032348633
key: test_mcc
value: [0.58536941 0.66666667 0.64168895 0.41812101 0.60246408 0.53033009
0.58536941 0.6761234 0.75261781 0.65151515]
mean value: 0.6110265959870853
key: train_mcc
value: [0.74603309 0.78889274 0.7802162 0.75049973 0.76032494 0.77022946
0.77897523 0.76032494 0.76032494 0.77992042]
mean value: 0.767574167245235
key: test_accuracy
value: [0.79166667 0.83333333 0.79166667 0.70833333 0.79166667 0.75
0.79166667 0.83333333 0.875 0.82608696]
mean value: 0.7992753623188406
key: train_accuracy
value: [0.86976744 0.89302326 0.88837209 0.8744186 0.87906977 0.88372093
0.88837209 0.87906977 0.87906977 0.88888889]
mean value: 0.8823772609819122
key: test_fscore
value: [0.7826087 0.83333333 0.82758621 0.72 0.81481481 0.7
0.8 0.84615385 0.88 0.81818182]
mean value: 0.8022678715032538
key: train_fscore
value: /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_rt.py:114: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_rt.py:117: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[0.87719298 0.89686099 0.89285714 0.87782805 0.88288288 0.88789238
0.89189189 0.88288288 0.88288288 0.89285714]
mean value: 0.8866029226238309
key: test_precision
value: [0.81818182 0.83333333 0.70588235 0.69230769 0.73333333 0.875
0.76923077 0.78571429 0.84615385 0.81818182]
mean value: 0.7877319249378073
key: train_precision
value: [0.82644628 0.86206897 0.85470085 0.85087719 0.85217391 0.85344828
0.86086957 0.85217391 0.85217391 0.86206897]
mean value: 0.8527001839919424
key: test_recall
value: [0.75 0.83333333 1. 0.75 0.91666667 0.58333333
0.83333333 0.91666667 0.91666667 0.81818182]
mean value: 0.8318181818181818
key: train_recall
value: [0.93457944 0.93457944 0.93457944 0.90654206 0.91588785 0.92523364
0.92523364 0.91588785 0.91588785 0.92592593]
mean value: 0.9234337140879196
key: test_roc_auc
value: [0.79166667 0.83333333 0.79166667 0.70833333 0.79166667 0.75
0.79166667 0.83333333 0.875 0.82575758]
mean value: 0.7992424242424243
key: train_roc_auc
value: [0.8700675 0.89321565 0.88858602 0.87456732 0.87924022 0.88391312
0.88854275 0.87924022 0.87924022 0.88888889]
mean value: 0.8825501903772932
key: test_jcc
value: [0.64285714 0.71428571 0.70588235 0.5625 0.6875 0.53846154
0.66666667 0.73333333 0.78571429 0.69230769]
mean value: 0.672950872656755
key: train_jcc
value: [0.78125 0.81300813 0.80645161 0.78225806 0.79032258 0.7983871
0.80487805 0.79032258 0.79032258 0.80645161]
mean value: 0.7963652307894047
MCC on Blind test: 0.37
Accuracy on Blind test: 0.7
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.03132892 0.03188396 0.03114724 0.02774787 0.0322659 0.0272336
0.02989817 0.0314157 0.03231025 0.03273916]
mean value: 0.030797076225280762
key: score_time
value: [0.01346755 0.01181531 0.01179171 0.01184106 0.01183009 0.01181269
0.01179075 0.01178718 0.01394463 0.01460361]
mean value: 0.012468457221984863
key: test_mcc
value: [0.58536941 0.6761234 0.64168895 0.6761234 0.60246408 0.60246408
0.5 0.58536941 0.83333333 0.58536941]
mean value: 0.6288305461987017
key: train_mcc
value: [0.75261781 0.82495863 0.79684302 0.78788184 0.82495863 0.77898084
0.81537425 0.79973188 0.77898084 0.80642024]
mean value: 0.7966747979587403
key: test_accuracy
value: [0.79166667 0.83333333 0.79166667 0.83333333 0.79166667 0.79166667
0.75 0.79166667 0.91666667 0.79166667]
mean value: 0.8083333333333333
key: train_accuracy
value: [0.875 0.91203704 0.89814815 0.89351852 0.91203704 0.88888889
0.90740741 0.89814815 0.88888889 0.90277778]
mean value: 0.8976851851851851
key: test_fscore
value: [0.7826087 0.81818182 0.82758621 0.84615385 0.81481481 0.76190476
0.75 0.8 0.91666667 0.7826087 ]
mean value: 0.8100525505922808
key: train_fscore
value: [0.88 0.91402715 0.9 0.8959276 0.91402715 0.89189189
0.90909091 0.90265487 0.89189189 0.90497738]
mean value: 0.9004488836149429
key: test_precision
value: [0.81818182 0.9 0.70588235 0.78571429 0.73333333 0.88888889
0.75 0.76923077 0.91666667 0.81818182]
mean value: 0.8086079933138757
key: train_precision
value: [0.84615385 0.89380531 0.88392857 0.87610619 0.89380531 0.86842105
0.89285714 0.86440678 0.86842105 0.88495575]
mean value: 0.8772861011735417
key: test_recall
value: [0.75 0.75 1. 0.91666667 0.91666667 0.66666667
0.75 0.83333333 0.91666667 0.75 ]
mean value: 0.825
key: train_recall
value: [0.91666667 0.93518519 0.91666667 0.91666667 0.93518519 0.91666667
0.92592593 0.94444444 0.91666667 0.92592593]
mean value: 0.925
key: test_roc_auc
value: [0.79166667 0.83333333 0.79166667 0.83333333 0.79166667 0.79166667
0.75 0.79166667 0.91666667 0.79166667]
mean value: 0.8083333333333333
key: train_roc_auc
value: [0.875 0.91203704 0.89814815 0.89351852 0.91203704 0.88888889
0.90740741 0.89814815 0.88888889 0.90277778]
mean value: 0.8976851851851851
key: test_jcc
value: [0.64285714 0.69230769 0.70588235 0.73333333 0.6875 0.61538462
0.6 0.66666667 0.84615385 0.64285714]
mean value: 0.6832942792501616
key: train_jcc
value: [0.78571429 0.84166667 0.81818182 0.81147541 0.84166667 0.80487805
0.83333333 0.82258065 0.80487805 0.82644628]
mean value: 0.8190821204112838
MCC on Blind test: 0.36
Accuracy on Blind test: 0.69
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.87871289 0.72661591 0.72426248 0.82515478 0.74664068 0.73621082
1.05823541 0.73661423 0.84614754 0.7338798 ]
mean value: 0.8012474536895752
key: score_time
value: [0.01202941 0.01206493 0.01203799 0.01216793 0.01202607 0.0142746
0.01201224 0.01200318 0.0119803 0.01208901]
mean value: 0.012268567085266113
key: test_mcc
value: [0.75261781 0.6761234 0.64168895 0.6761234 0.53033009 0.50709255
0.5 0.58536941 0.6761234 0.66666667]
mean value: 0.621213568067322
key: train_mcc
value: [0.74393663 0.77898084 0.77120096 0.82495863 0.74188651 0.94444444
0.77992042 0.77120096 0.75158034 0.74188651]
mean value: 0.7849996266114585
key: test_accuracy
value: [0.875 0.83333333 0.79166667 0.83333333 0.75 0.75
0.75 0.79166667 0.83333333 0.83333333]
mean value: 0.8041666666666667
key: train_accuracy
value: [0.87037037 0.88888889 0.88425926 0.91203704 0.87037037 0.97222222
0.88888889 0.88425926 0.875 0.87037037]
mean value: 0.8916666666666666
key: test_fscore
value: [0.88 0.81818182 0.82758621 0.84615385 0.78571429 0.72727273
0.75 0.8 0.84615385 0.83333333]
mean value: 0.8114396063706408
key: train_fscore
value: [0.87610619 0.89189189 0.88888889 0.91402715 0.87387387 0.97222222
0.89285714 0.88888889 0.87892377 0.87387387]
mean value: 0.8951553893324458
key: test_precision
value: [0.84615385 0.9 0.70588235 0.78571429 0.6875 0.8
0.75 0.76923077 0.78571429 0.83333333]
mean value: 0.7863528873087697
key: train_precision
value: [0.83898305 0.86842105 0.85470085 0.89380531 0.85087719 0.97222222
0.86206897 0.85470085 0.85217391 0.85087719]
mean value: 0.8698830609363113
key: test_recall
value: [0.91666667 0.75 1. 0.91666667 0.91666667 0.66666667
0.75 0.83333333 0.91666667 0.83333333]
mean value: 0.85
key: train_recall
value: [0.91666667 0.91666667 0.92592593 0.93518519 0.89814815 0.97222222
0.92592593 0.92592593 0.90740741 0.89814815]
mean value: 0.9222222222222223
key: test_roc_auc
value: [0.875 0.83333333 0.79166667 0.83333333 0.75 0.75
0.75 0.79166667 0.83333333 0.83333333]
mean value: 0.8041666666666667
key: train_roc_auc
value: [0.87037037 0.88888889 0.88425926 0.91203704 0.87037037 0.97222222
0.88888889 0.88425926 0.875 0.87037037]
mean value: 0.8916666666666666
key: test_jcc
value: [0.78571429 0.69230769 0.70588235 0.73333333 0.64705882 0.57142857
0.6 0.66666667 0.73333333 0.71428571]
mean value: 0.6850010773540185
key: train_jcc
value: [0.77952756 0.80487805 0.8 0.84166667 0.776 0.94594595
0.80645161 0.8 0.784 0.776 ]
mean value: 0.8114469833351444
MCC on Blind test: 0.37
Accuracy on Blind test: 0.69
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01747465 0.01148367 0.00916314 0.00904512 0.00902748 0.00901794
0.00899577 0.00941372 0.00913596 0.0089817 ]
mean value: 0.010173916816711426
key: score_time
value: [0.01298499 0.00921249 0.00894976 0.00869322 0.00875592 0.00881815
0.00888896 0.00878215 0.00871611 0.00874591]
mean value: 0.009254765510559083
key: test_mcc
value: [ 0.38490018 0.43033148 -0.2236068 0.35355339 0.64168895 0.50709255
0.27500955 0.70710678 0.60246408 0.6761234 ]
mean value: 0.4354663566126372
key: train_mcc
value: [0.50639215 0.4859637 0.54289671 0.5002895 0.49104638 0.51437268
0.52235132 0.48653363 0.49041703 0.51478398]
mean value: 0.5055047079924628
key: test_accuracy
value: [0.66666667 0.70833333 0.41666667 0.66666667 0.79166667 0.75
0.625 0.83333333 0.79166667 0.83333333]
mean value: 0.7083333333333334
key: train_accuracy
value: [0.73611111 0.71296296 0.75925926 0.74537037 0.73148148 0.74074074
0.74537037 0.72685185 0.72685185 0.73611111]
mean value: 0.7361111111111112
key: test_fscore
value: [0.73333333 0.74074074 0.5625 0.71428571 0.82758621 0.76923077
0.68965517 0.85714286 0.81481481 0.84615385]
mean value: 0.755544345501242
key: train_fscore
value: [0.77647059 0.76865672 0.79032258 0.76793249 0.76984127 0.77952756
0.7826087 0.76862745 0.77042802 0.77992278]
mean value: 0.7754338145765779
key: test_precision
value: [0.61111111 0.66666667 0.45 0.625 0.70588235 0.71428571
0.58823529 0.75 0.73333333 0.78571429]
mean value: 0.6630228758169935
key: train_precision
value: [0.67346939 0.64375 0.7 0.70542636 0.67361111 0.67808219
0.68275862 0.66666667 0.66442953 0.66887417]
mean value: 0.6757068036979277
key: test_recall
value: [0.91666667 0.83333333 0.75 0.83333333 1. 0.83333333
0.83333333 1. 0.91666667 0.91666667]
mean value: 0.8833333333333333
key: train_recall
value: [0.91666667 0.9537037 0.90740741 0.84259259 0.89814815 0.91666667
0.91666667 0.90740741 0.91666667 0.93518519]
mean value: 0.9111111111111111
key: test_roc_auc
value: [0.66666667 0.70833333 0.41666667 0.66666667 0.79166667 0.75
0.625 0.83333333 0.79166667 0.83333333]
mean value: 0.7083333333333334
key: train_roc_auc
value: [0.73611111 0.71296296 0.75925926 0.74537037 0.73148148 0.74074074
0.74537037 0.72685185 0.72685185 0.73611111]
mean value: 0.7361111111111112
key: test_jcc
value: [0.57894737 0.58823529 0.39130435 0.55555556 0.70588235 0.625
0.52631579 0.75 0.6875 0.73333333]
mean value: 0.6142074041668536
key: train_jcc
value: [0.63461538 0.62424242 0.65333333 0.62328767 0.62580645 0.63870968
0.64285714 0.62420382 0.62658228 0.63924051]
mean value: 0.6332878691779598
MCC on Blind test: 0.21
Accuracy on Blind test: 0.64
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.00945187 0.00936913 0.00931382 0.00924897 0.00935626 0.0092473
0.00924015 0.00930476 0.00929523 0.00913787]
mean value: 0.009296536445617676
key: score_time
value: [0.00871253 0.00882363 0.00878596 0.00875378 0.00881147 0.00875854
0.00878882 0.00885057 0.00869274 0.00871849]
mean value: 0.008769655227661132
key: test_mcc
value: [0.58536941 0.35355339 0. 0.70710678 0.33333333 0.58536941
0.25819889 0.58536941 0.84515425 0.83333333]
mean value: 0.5086788203937056
key: train_mcc
value: [0.61205637 0.62060985 0.65743559 0.64993368 0.66222239 0.62039697
0.66712438 0.64695398 0.62253572 0.58760578]
mean value: 0.6346874717823057
key: test_accuracy
value: [0.79166667 0.66666667 0.5 0.83333333 0.66666667 0.79166667
0.625 0.79166667 0.91666667 0.91666667]
mean value: 0.75
key: train_accuracy
value: [0.80555556 0.81018519 0.8287037 0.82407407 0.8287037 0.81018519
0.83333333 0.81944444 0.81018519 0.79166667]
mean value: 0.8162037037037038
key: test_fscore
value: [0.7826087 0.6 0.57142857 0.85714286 0.66666667 0.8
0.66666667 0.7826087 0.92307692 0.91666667]
mean value: 0.7566865742952699
key: train_fscore
value: [0.81081081 0.81278539 0.82949309 0.83035714 0.83842795 0.81105991
0.83636364 0.83261803 0.81777778 0.80349345]
mean value: 0.8223187174459913
key: test_precision
value: [0.81818182 0.75 0.5 0.75 0.66666667 0.76923077
0.6 0.81818182 0.85714286 0.91666667]
mean value: 0.7446070596070596
key: train_precision
value: [0.78947368 0.8018018 0.82568807 0.80172414 0.79338843 0.80733945
0.82142857 0.776 0.78632479 0.76033058]
mean value: 0.7963499512896963
key: test_recall
value: [0.75 0.5 0.66666667 1. 0.66666667 0.83333333
0.75 0.75 1. 0.91666667]
mean value: 0.7833333333333333
key: train_recall
value: [0.83333333 0.82407407 0.83333333 0.86111111 0.88888889 0.81481481
0.85185185 0.89814815 0.85185185 0.85185185]
mean value: 0.850925925925926
key: test_roc_auc
value: [0.79166667 0.66666667 0.5 0.83333333 0.66666667 0.79166667
0.625 0.79166667 0.91666667 0.91666667]
mean value: 0.75
key: train_roc_auc
value: [0.80555556 0.81018519 0.8287037 0.82407407 0.8287037 0.81018519
0.83333333 0.81944444 0.81018519 0.79166667]
mean value: 0.8162037037037037
key: test_jcc
value: [0.64285714 0.42857143 0.4 0.75 0.5 0.66666667
0.5 0.64285714 0.85714286 0.84615385]
mean value: 0.6234249084249084
key: train_jcc
value: [0.68181818 0.68461538 0.70866142 0.70992366 0.72180451 0.68217054
0.71875 0.71323529 0.69172932 0.67153285]
mean value: 0.6984241165933639
MCC on Blind test: 0.22
Accuracy on Blind test: 0.62
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00914621 0.00876808 0.00858617 0.00959158 0.00878024 0.00912333
0.0090847 0.00912189 0.00884724 0.00896358]
mean value: 0.00900130271911621
key: score_time
value: [0.01012707 0.01494837 0.00995064 0.01463628 0.01452112 0.01011086
0.01094484 0.01011419 0.0106566 0.01070809]
mean value: 0.011671805381774902
key: test_mcc
value: [0.58536941 0.58536941 0. 0.33333333 0.3380617 0.83333333
0.0860663 0.25819889 0.6761234 0.50709255]
mean value: 0.42029483255174716
key: train_mcc
value: [0.62491409 0.59763515 0.62491409 0.65970203 0.59628479 0.57034259
0.636655 0.59763515 0.63355259 0.60625994]
mean value: 0.6147895420546894
key: test_accuracy
value: [0.79166667 0.79166667 0.5 0.66666667 0.66666667 0.91666667
0.54166667 0.625 0.83333333 0.75 ]
mean value: 0.7083333333333333
key: train_accuracy
value: [0.81018519 0.7962963 0.81018519 0.8287037 0.7962963 0.78240741
0.81481481 0.7962963 0.81481481 0.80092593]
mean value: 0.8050925925925926
key: test_fscore
value: [0.7826087 0.8 0.57142857 0.66666667 0.69230769 0.91666667
0.59259259 0.66666667 0.84615385 0.72727273]
mean value: 0.7262364125407603
key: train_fscore
value: [0.8209607 0.80869565 0.8209607 0.83555556 0.80701754 0.7965368
0.82758621 0.80869565 0.8245614 0.81222707]
mean value: 0.8162797282320872
key: test_precision
value: [0.81818182 0.76923077 0.5 0.66666667 0.64285714 0.91666667
0.53333333 0.6 0.78571429 0.8 ]
mean value: 0.7032650682650683
key: train_precision
value: [0.7768595 0.76229508 0.7768595 0.8034188 0.76666667 0.74796748
0.77419355 0.76229508 0.78333333 0.76859504]
mean value: 0.7722484045001899
key: test_recall
value: [0.75 0.83333333 0.66666667 0.66666667 0.75 0.91666667
0.66666667 0.75 0.91666667 0.66666667]
mean value: 0.7583333333333333
key: train_recall
value: [0.87037037 0.86111111 0.87037037 0.87037037 0.85185185 0.85185185
0.88888889 0.86111111 0.87037037 0.86111111]
mean value: 0.8657407407407407
key: test_roc_auc
value: [0.79166667 0.79166667 0.5 0.66666667 0.66666667 0.91666667
0.54166667 0.625 0.83333333 0.75 ]
mean value: 0.7083333333333334
key: train_roc_auc
value: [0.81018519 0.7962963 0.81018519 0.8287037 0.7962963 0.78240741
0.81481481 0.7962963 0.81481481 0.80092593]
mean value: 0.8050925925925926
key: test_jcc
value: [0.64285714 0.66666667 0.4 0.5 0.52941176 0.84615385
0.42105263 0.5 0.73333333 0.57142857]
mean value: 0.581090395672439
key: train_jcc
value: [0.6962963 0.67883212 0.6962963 0.71755725 0.67647059 0.6618705
0.70588235 0.67883212 0.70149254 0.68382353]
mean value: 0.6897353589576423
MCC on Blind test: 0.06
Accuracy on Blind test: 0.55
Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01296568 0.01241398 0.01232123 0.01224113 0.01185298 0.01195812
0.01187444 0.01185417 0.0118258 0.01191759]
mean value: 0.012122511863708496
key: score_time
value: [0.00993228 0.0095787 0.01052666 0.00957179 0.00952911 0.00951576
0.00952315 0.00964808 0.00979733 0.00964403]
mean value: 0.009726691246032714
key: test_mcc
value: [0.75261781 0.66666667 0.35355339 0.50709255 0.43033148 0.58536941
0.50709255 0.6761234 0.58536941 0.75261781]
mean value: 0.5816834481651599
key: train_mcc
value: [0.78439613 0.77822 0.79115136 0.77992042 0.83562902 0.78439613
0.80307223 0.74704394 0.78262379 0.7741473 ]
mean value: 0.7860600330933626
key: test_accuracy
value: [0.875 0.83333333 0.66666667 0.75 0.70833333 0.79166667
0.75 0.83333333 0.79166667 0.875 ]
mean value: 0.7875
key: train_accuracy
value: [0.88888889 0.88425926 0.89351852 0.88888889 0.91666667 0.88888889
0.89814815 0.87037037 0.88888889 0.88425926]
mean value: 0.8902777777777777
key: test_fscore
value: [0.88 0.83333333 0.71428571 0.76923077 0.74074074 0.7826087
0.76923077 0.84615385 0.8 0.88 ]
mean value: 0.8015583868627346
key: train_fscore
value: [0.89565217 0.89270386 0.89867841 0.89285714 0.91964286 0.89565217
0.90434783 0.87826087 0.89473684 0.89082969]
mean value: 0.8963361856664529
key: test_precision
value: [0.84615385 0.83333333 0.625 0.71428571 0.66666667 0.81818182
0.71428571 0.78571429 0.76923077 0.84615385]
mean value: 0.7619005994005994
key: train_precision
value: [0.8442623 0.832 0.85714286 0.86206897 0.88793103 0.8442623
0.85245902 0.82786885 0.85 0.84297521]
mean value: 0.8500970522770821
key: test_recall
value: [0.91666667 0.83333333 0.83333333 0.83333333 0.83333333 0.75
0.83333333 0.91666667 0.83333333 0.91666667]
mean value: 0.85
key: train_recall
value: [0.9537037 0.96296296 0.94444444 0.92592593 0.9537037 0.9537037
0.96296296 0.93518519 0.94444444 0.94444444]
mean value: 0.9481481481481482
key: test_roc_auc
value: [0.875 0.83333333 0.66666667 0.75 0.70833333 0.79166667
0.75 0.83333333 0.79166667 0.875 ]
mean value: 0.7875000000000001
key: train_roc_auc
value: [0.88888889 0.88425926 0.89351852 0.88888889 0.91666667 0.88888889
0.89814815 0.87037037 0.88888889 0.88425926]
mean value: 0.8902777777777777
key: test_jcc
value: [0.78571429 0.71428571 0.55555556 0.625 0.58823529 0.64285714
0.625 0.73333333 0.66666667 0.78571429]
mean value: 0.6722362278244631
key: train_jcc
value: [0.81102362 0.80620155 0.816 0.80645161 0.85123967 0.81102362
0.82539683 0.78294574 0.80952381 0.80314961]
mean value: 0.8122956054460755
MCC on Blind test: 0.3
Accuracy on Blind test: 0.67
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [0.55233645 0.88785172 0.81239462 1.05234766 1.17555523 0.62090945
1.19066715 0.98777461 1.14500237 0.67176127]
mean value: 0.9096600532531738
key: score_time
value: [0.01228666 0.01233935 0.01228571 0.01497912 0.01464343 0.01230478
0.01451349 0.01451325 0.01454258 0.01224256]
mean value: 0.013465094566345214
key: test_mcc
value: [0.58536941 0.45834925 0.45834925 0.75261781 0.58536941 0.60246408
0.5 0.5 0.91986621 0.50709255]
mean value: 0.586947795996614
key: train_mcc
value: [0.87966734 0.94460643 0.91670596 0.93618901 0.98164982 0.88148312
0.95374459 0.93618901 0.96296296 0.89818665]
mean value: 0.9291384891935454
key: test_accuracy
value: [0.79166667 0.70833333 0.70833333 0.875 0.79166667 0.79166667
0.75 0.75 0.95833333 0.75 ]
mean value: 0.7875
key: train_accuracy
value: [0.93981481 0.97222222 0.95833333 0.96759259 0.99074074 0.93981481
0.97685185 0.96759259 0.98148148 0.94907407]
mean value: 0.9643518518518518
key: test_fscore
value: [0.7826087 0.63157895 0.75862069 0.86956522 0.7826087 0.76190476
0.75 0.75 0.95652174 0.72727273]
mean value: 0.7770681474027169
key: train_fscore
value: [0.93953488 0.97247706 0.95852535 0.96682464 0.99065421 0.93779904
0.97674419 0.96682464 0.98148148 0.94883721]
mean value: 0.9639702708162756
key: test_precision
value: [0.81818182 0.85714286 0.64705882 0.90909091 0.81818182 0.88888889
0.75 0.75 1. 0.8 ]
mean value: 0.8238545115015703
key: train_precision
value: [0.94392523 0.96363636 0.95412844 0.99029126 1. 0.97029703
0.98130841 0.99029126 0.98148148 0.95327103]
mean value: 0.9728630512356828
key: test_recall
value: [0.75 0.5 0.91666667 0.83333333 0.75 0.66666667
0.75 0.75 0.91666667 0.66666667]
mean value: 0.75
key: train_recall
value: [0.93518519 0.98148148 0.96296296 0.94444444 0.98148148 0.90740741
0.97222222 0.94444444 0.98148148 0.94444444]
mean value: 0.9555555555555556
key: test_roc_auc
value: [0.79166667 0.70833333 0.70833333 0.875 0.79166667 0.79166667
0.75 0.75 0.95833333 0.75 ]
mean value: 0.7875
key: train_roc_auc
value: [0.93981481 0.97222222 0.95833333 0.96759259 0.99074074 0.93981481
0.97685185 0.96759259 0.98148148 0.94907407]
mean value: 0.9643518518518518
key: test_jcc
value: [0.64285714 0.46153846 0.61111111 0.76923077 0.64285714 0.61538462
0.6 0.6 0.91666667 0.57142857]
mean value: 0.6431074481074481
key: train_jcc
value: [0.88596491 0.94642857 0.92035398 0.93577982 0.98148148 0.88288288
0.95454545 0.93577982 0.96363636 0.90265487]
mean value: 0.9309508148840501
MCC on Blind test: 0.34
Accuracy on Blind test: 0.68
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.02154303 0.01629329 0.01646686 0.01569819 0.01544809 0.0164423
0.01606488 0.01601863 0.01652622 0.01666164]
mean value: 0.016716313362121583
key: score_time
value: [0.01181388 0.00914931 0.00873971 0.00877476 0.00904584 0.00866413
0.00874805 0.0089674 0.00875044 0.00878358]
mean value: 0.009143710136413574
key: test_mcc
value: [0.84515425 0.53033009 0.1767767 0.50709255 0.1767767 0.41812101
0.5 0.2508726 0.50709255 0.58536941]
mean value: 0.4497585851896557
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.91666667 0.75 0.58333333 0.75 0.58333333 0.70833333
0.75 0.625 0.75 0.79166667]
mean value: 0.7208333333333333
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.92307692 0.7 0.64285714 0.76923077 0.64285714 0.69565217
0.75 0.64 0.76923077 0.7826087 ]
mean value: 0.7315513616817965
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.85714286 0.875 0.5625 0.71428571 0.5625 0.72727273
0.75 0.61538462 0.71428571 0.81818182]
mean value: 0.7196553446553446
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.58333333 0.75 0.83333333 0.75 0.66666667
0.75 0.66666667 0.83333333 0.75 ]
mean value: 0.7583333333333333
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.91666667 0.75 0.58333333 0.75 0.58333333 0.70833333
0.75 0.625 0.75 0.79166667]
mean value: 0.7208333333333334
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.85714286 0.53846154 0.47368421 0.625 0.47368421 0.53333333
0.6 0.47058824 0.625 0.64285714]
mean value: 0.5839751528141621
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.28
Accuracy on Blind test: 0.67
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.1001339 0.10060334 0.09702349 0.09846067 0.09874797 0.09993148
0.09776998 0.09760618 0.09800601 0.09977889]
mean value: 0.09880619049072266
key: score_time
value: [0.0177145 0.0176568 0.01760507 0.01874804 0.01777411 0.01774359
0.01760697 0.01760936 0.01760936 0.01758695]
mean value: 0.01776547431945801
key: test_mcc
value: [0.66666667 0.43033148 0.50709255 0.6761234 0.41812101 0.75261781
0.41812101 0.43033148 0.84515425 0.3380617 ]
mean value: 0.5482621364743856
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.83333333 0.70833333 0.75 0.83333333 0.70833333 0.875
0.70833333 0.70833333 0.91666667 0.66666667]
mean value: 0.7708333333333334
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.83333333 0.66666667 0.76923077 0.84615385 0.72 0.86956522
0.72 0.74074074 0.92307692 0.69230769]
mean value: 0.7781075188901275
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.83333333 0.77777778 0.71428571 0.78571429 0.69230769 0.90909091
0.69230769 0.66666667 0.85714286 0.64285714]
mean value: 0.7571484071484071
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.83333333 0.58333333 0.83333333 0.91666667 0.75 0.83333333
0.75 0.83333333 1. 0.75 ]
mean value: 0.8083333333333333
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.83333333 0.70833333 0.75 0.83333333 0.70833333 0.875
0.70833333 0.70833333 0.91666667 0.66666667]
mean value: 0.7708333333333334
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.71428571 0.5 0.625 0.73333333 0.5625 0.76923077
0.5625 0.58823529 0.85714286 0.52941176]
mean value: 0.6441639732816203
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.23
Accuracy on Blind test: 0.62
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01113224 0.01055098 0.00930738 0.00918555 0.00923061 0.00927019
0.00917935 0.0091207 0.00938368 0.00915384]
mean value: 0.009551453590393066
key: score_time
value: [0.00952339 0.00961161 0.00871277 0.00885367 0.00872374 0.00861049
0.00865412 0.00864911 0.00867414 0.00875235]
mean value: 0.008876538276672364
key: test_mcc
value: [0.58536941 0.45834925 0.25819889 0.16903085 0. 0.41812101
0.0836242 0.1767767 0.70710678 0.43033148]
mean value: 0.3286908561611308
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.79166667 0.70833333 0.625 0.58333333 0.5 0.70833333
0.54166667 0.58333333 0.83333333 0.70833333]
mean value: 0.6583333333333333
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.7826087 0.63157895 0.57142857 0.54545455 0.5 0.72
0.56 0.64285714 0.8 0.66666667]
mean value: 0.6420594569427521
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.81818182 0.85714286 0.66666667 0.6 0.5 0.69230769
0.53846154 0.5625 1. 0.77777778]
mean value: 0.7013038350538351
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 0.5 0.5 0.5 0.5 0.75
0.58333333 0.75 0.66666667 0.58333333]
mean value: 0.6083333333333334
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.79166667 0.70833333 0.625 0.58333333 0.5 0.70833333
0.54166667 0.58333333 0.83333333 0.70833333]
mean value: 0.6583333333333333
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.64285714 0.46153846 0.4 0.375 0.33333333 0.5625
0.38888889 0.47368421 0.66666667 0.5 ]
mean value: 0.4804468703810809
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.27
Accuracy on Blind test: 0.64
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.32348752 1.30932379 1.30970073 1.32442284 1.31778431 1.31542468
1.30667329 1.300565 1.34182906 1.36433816]
mean value: 1.3213549375534057
key: score_time
value: [0.09575081 0.09006882 0.09617686 0.09046817 0.09082484 0.0902133
0.09001493 0.09058738 0.09592247 0.09773922]
mean value: 0.09277667999267578
key: test_mcc
value: [0.75261781 0.64168895 0.60246408 0.75261781 0.75261781 0.64168895
0.33333333 0.43033148 0.75261781 0.41812101]
mean value: 0.6078099029190546
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.875 0.79166667 0.79166667 0.875 0.875 0.79166667
0.66666667 0.70833333 0.875 0.70833333]
mean value: 0.7958333333333333
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.86956522 0.73684211 0.81481481 0.88 0.88 0.73684211
0.66666667 0.74074074 0.88 0.69565217]
mean value: 0.7901123824052886
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.90909091 1. 0.73333333 0.84615385 0.84615385 1.
0.66666667 0.66666667 0.84615385 0.72727273]
mean value: 0.8241491841491841
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.83333333 0.58333333 0.91666667 0.91666667 0.91666667 0.58333333
0.66666667 0.83333333 0.91666667 0.66666667]
mean value: 0.7833333333333333
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.875 0.79166667 0.79166667 0.875 0.875 0.79166667
0.66666667 0.70833333 0.875 0.70833333]
mean value: 0.7958333333333334
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.76923077 0.58333333 0.6875 0.78571429 0.78571429 0.58333333
0.5 0.58823529 0.78571429 0.53333333]
mean value: 0.6602108920491273
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.32
Accuracy on Blind test: 0.66
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.89206052 0.91959095 0.96954322 0.89371514 0.94958425 0.91306782
0.93570518 0.96002221 0.9497087 0.88737702]
mean value: 0.9270375013351441
key: score_time
value: [0.20388103 0.21231103 0.21701646 0.13943863 0.23853755 0.251302
0.21599674 0.22050405 0.18904161 0.17836046]
mean value: 0.20663895606994628
key: test_mcc
value: [0.75261781 0.53033009 0.70710678 0.75261781 0.6761234 0.64168895
0.33333333 0.50709255 0.75261781 0.41812101]
mean value: 0.6071649536972153
key: train_mcc
value: [0.90756304 0.89849486 0.89818665 0.90803041 0.93554619 0.88904134
0.90756304 0.92608473 0.90803041 0.92608473]
mean value: 0.910462541219685
key: test_accuracy
value: [0.875 0.75 0.83333333 0.875 0.83333333 0.79166667
0.66666667 0.75 0.875 0.70833333]
mean value: 0.7958333333333334
key: train_accuracy
value: [0.9537037 0.94907407 0.94907407 0.9537037 0.96759259 0.94444444
0.9537037 0.96296296 0.9537037 0.96296296]
mean value: 0.9550925925925926
key: test_fscore
value: [0.86956522 0.7 0.85714286 0.88 0.84615385 0.73684211
0.66666667 0.76923077 0.86956522 0.69565217]
mean value: 0.7890818853152949
key: train_fscore
value: [0.95412844 0.94977169 0.94930876 0.95454545 0.96803653 0.94495413
0.95412844 0.96330275 0.95454545 0.96330275]
mean value: 0.9556024397790828
key: test_precision
value: [0.90909091 0.875 0.75 0.84615385 0.78571429 1.
0.66666667 0.71428571 0.90909091 0.72727273]
mean value: 0.8183275058275058
key: train_precision /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
value: [0.94545455 0.93693694 0.94495413 0.9375 0.95495495 0.93636364
0.94545455 0.95454545 0.9375 0.95454545]
mean value: 0.9448209656695895
key: test_recall
value: [0.83333333 0.58333333 1. 0.91666667 0.91666667 0.58333333
0.66666667 0.83333333 0.83333333 0.66666667]
mean value: 0.7833333333333333
key: train_recall
value: [0.96296296 0.96296296 0.9537037 0.97222222 0.98148148 0.9537037
0.96296296 0.97222222 0.97222222 0.97222222]
mean value: 0.9666666666666667
key: test_roc_auc
value: [0.875 0.75 0.83333333 0.875 0.83333333 0.79166667
0.66666667 0.75 0.875 0.70833333]
mean value: 0.7958333333333334
key: train_roc_auc
value: [0.9537037 0.94907407 0.94907407 0.9537037 0.96759259 0.94444444
0.9537037 0.96296296 0.9537037 0.96296296]
mean value: 0.9550925925925926
key: test_jcc
value: [0.76923077 0.53846154 0.75 0.78571429 0.73333333 0.58333333
0.5 0.625 0.76923077 0.53333333]
mean value: 0.6587637362637363
key: train_jcc
value: [0.9122807 0.90434783 0.90350877 0.91304348 0.9380531 0.89565217
0.9122807 0.92920354 0.91304348 0.92920354]
mean value: 0.9150617308951486
MCC on Blind test: 0.33
Accuracy on Blind test: 0.67
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02310514 0.0093317 0.00927734 0.00936031 0.01038909 0.01009369
0.01033616 0.01032877 0.01018953 0.00961852]
mean value: 0.01120302677154541
key: score_time
value: [0.01308703 0.00966859 0.0088737 0.0088706 0.00949836 0.00903583
0.00962281 0.00909829 0.00949168 0.00966215]
mean value: 0.00969090461730957
key: test_mcc
value: [0.58536941 0.35355339 0. 0.70710678 0.33333333 0.58536941
0.25819889 0.58536941 0.84515425 0.83333333]
mean value: 0.5086788203937056
key: train_mcc
value: [0.61205637 0.62060985 0.65743559 0.64993368 0.66222239 0.62039697
0.66712438 0.64695398 0.62253572 0.58760578]
mean value: 0.6346874717823057
key: test_accuracy
value: [0.79166667 0.66666667 0.5 0.83333333 0.66666667 0.79166667
0.625 0.79166667 0.91666667 0.91666667]
mean value: 0.75
key: train_accuracy
value: [0.80555556 0.81018519 0.8287037 0.82407407 0.8287037 0.81018519
0.83333333 0.81944444 0.81018519 0.79166667]
mean value: 0.8162037037037038
key: test_fscore
value: [0.7826087 0.6 0.57142857 0.85714286 0.66666667 0.8
0.66666667 0.7826087 0.92307692 0.91666667]
mean value: 0.7566865742952699
key: train_fscore
value: [0.81081081 0.81278539 0.82949309 0.83035714 0.83842795 0.81105991
0.83636364 0.83261803 0.81777778 0.80349345]
mean value: 0.8223187174459913
key: test_precision
value: [0.81818182 0.75 0.5 0.75 0.66666667 0.76923077
0.6 0.81818182 0.85714286 0.91666667]
mean value: 0.7446070596070596
key: train_precision
value: [0.78947368 0.8018018 0.82568807 0.80172414 0.79338843 0.80733945
0.82142857 0.776 0.78632479 0.76033058]
mean value: 0.7963499512896963
key: test_recall
value: [0.75 0.5 0.66666667 1. 0.66666667 0.83333333
0.75 0.75 1. 0.91666667]
mean value: 0.7833333333333333
key: train_recall
value: [0.83333333 0.82407407 0.83333333 0.86111111 0.88888889 0.81481481
0.85185185 0.89814815 0.85185185 0.85185185]
mean value: 0.850925925925926
key: test_roc_auc
value: [0.79166667 0.66666667 0.5 0.83333333 0.66666667 0.79166667
0.625 0.79166667 0.91666667 0.91666667]
mean value: 0.75
key: train_roc_auc
value: [0.80555556 0.81018519 0.8287037 0.82407407 0.8287037 0.81018519
0.83333333 0.81944444 0.81018519 0.79166667]
mean value: 0.8162037037037037
key: test_jcc
value: [0.64285714 0.42857143 0.4 0.75 0.5 0.66666667
0.5 0.64285714 0.85714286 0.84615385]
mean value: 0.6234249084249084
key: train_jcc
value: [0.68181818 0.68461538 0.70866142 0.70992366 0.72180451 0.68217054
0.71875 0.71323529 0.69172932 0.67153285]
mean value: 0.6984241165933639
MCC on Blind test: 0.22
Accuracy on Blind test: 0.62
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.11020756 0.05813694 0.06567621 0.06768131 0.06942606 0.06736112
0.06523728 0.06494617 0.06630445 0.07134366]
mean value: 0.07063207626342774
key: score_time
value: [0.01092362 0.01058197 0.01053071 0.01055431 0.01047707 0.01060176
0.01060224 0.01061916 0.01055455 0.01064134]
mean value: 0.010608673095703125
key: test_mcc
value: [0.83333333 0.38490018 0.57735027 0.6761234 0.58536941 0.53033009
0.58536941 0.2508726 0.75261781 0.33333333]
mean value: 0.5509599831007203
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.91666667 0.66666667 0.75 0.83333333 0.79166667 0.75
0.79166667 0.625 0.875 0.66666667]
mean value: 0.7666666666666666
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.91666667 0.55555556 0.8 0.84615385 0.7826087 0.7
0.7826087 0.64 0.88 0.66666667]
mean value: 0.7570260126347083
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.91666667 0.83333333 0.66666667 0.78571429 0.81818182 0.875
0.81818182 0.61538462 0.84615385 0.66666667]
mean value: 0.7841949716949717
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.91666667 0.41666667 1. 0.91666667 0.75 0.58333333
0.75 0.66666667 0.91666667 0.66666667]
mean value: 0.7583333333333333
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.91666667 0.66666667 0.75 0.83333333 0.79166667 0.75
0.79166667 0.625 0.875 0.66666667]
mean value: 0.7666666666666667
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.84615385 0.38461538 0.66666667 0.73333333 0.64285714 0.53846154
0.64285714 0.47058824 0.78571429 0.5 ]
mean value: 0.6211247575953458
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.43
Accuracy on Blind test: 0.73
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.03139472 0.07590818 0.06847668 0.03701425 0.06345892 0.07317209
0.05685735 0.05946326 0.0596664 0.0610683 ]
mean value: 0.05864801406860352
key: score_time
value: [0.02168989 0.02412105 0.01233745 0.01214147 0.02543545 0.01350594
0.01780462 0.02308083 0.02142167 0.02297115]
mean value: 0.019450950622558593
key: test_mcc
value: [0.66666667 0.30779351 0.16903085 0.41812101 0.16666667 0.83333333
0.50709255 0.58536941 0.75261781 0.2508726 ]
mean value: 0.4657564400092044
key: train_mcc
value: [0.92608473 0.96296296 0.94444444 0.96296296 0.94460643 0.94444444
0.95374459 0.96312812 0.95374459 0.97259753]
mean value: 0.9528720801015835
key: test_accuracy
value: [0.83333333 0.625 0.58333333 0.70833333 0.58333333 0.91666667
0.75 0.79166667 0.875 0.625 ]
mean value: 0.7291666666666666
key: train_accuracy
value: [0.96296296 0.98148148 0.97222222 0.98148148 0.97222222 0.97222222
0.97685185 0.98148148 0.97685185 0.98611111]
mean value: 0.9763888888888889
key: test_fscore
value: [0.83333333 0.70967742 0.61538462 0.72 0.58333333 0.91666667
0.76923077 0.7826087 0.86956522 0.64 ]
mean value: 0.7439800050347034
key: train_fscore
value: [0.96330275 0.98148148 0.97222222 0.98148148 0.97247706 0.97222222
0.97674419 0.98165138 0.97695853 0.98630137]
mean value: 0.9764842681323106
key: test_precision
value: [0.83333333 0.57894737 0.57142857 0.69230769 0.58333333 0.91666667
0.71428571 0.81818182 0.90909091 0.61538462]
mean value: 0.7232960022433707
key: train_precision
value: [0.95454545 0.98148148 0.97222222 0.98148148 0.96363636 0.97222222
0.98130841 0.97272727 0.97247706 0.97297297]
mean value: 0.9725074946724608
key: test_recall
value: [0.83333333 0.91666667 0.66666667 0.75 0.58333333 0.91666667
0.83333333 0.75 0.83333333 0.66666667]
mean value: 0.775
key: train_recall
value: [0.97222222 0.98148148 0.97222222 0.98148148 0.98148148 0.97222222
0.97222222 0.99074074 0.98148148 1. ]
mean value: 0.9805555555555555
key: test_roc_auc
value: [0.83333333 0.625 0.58333333 0.70833333 0.58333333 0.91666667
0.75 0.79166667 0.875 0.625 ]
mean value: 0.7291666666666666
key: train_roc_auc
value: [0.96296296 0.98148148 0.97222222 0.98148148 0.97222222 0.97222222
0.97685185 0.98148148 0.97685185 0.98611111]
mean value: 0.9763888888888889
key: test_jcc
value: [0.71428571 0.55 0.44444444 0.5625 0.41176471 0.84615385
0.625 0.64285714 0.76923077 0.47058824]
mean value: 0.6036824858148387
key: train_jcc
value: [0.92920354 0.96363636 0.94594595 0.96363636 0.94642857 0.94594595
0.95454545 0.96396396 0.95495495 0.97297297]
mean value: 0.9541234076853546
MCC on Blind test: 0.42
Accuracy on Blind test: 0.7
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01248264 0.01069927 0.01052046 0.01032162 0.01016784 0.01024556
0.01046252 0.01028323 0.01026583 0.01017547]
mean value: 0.010562443733215332
key: score_time
value: [0.00969648 0.01000333 0.00978112 0.00962114 0.00963092 0.00957775
0.0096333 0.0095582 0.00949764 0.00952911]
mean value: 0.00965290069580078
key: test_mcc
value: [ 0.43033148 0.41812101 -0.0860663 0.45834925 0.60246408 0.41812101
0.50709255 0.58536941 0.41812101 0.66666667]
mean value: 0.44185701524397397
key: train_mcc
value: [0.48685383 0.48685383 0.49693566 0.49554356 0.47684381 0.47568087
0.47568087 0.46812868 0.49554356 0.49693566]
mean value: 0.48550003388620916
key: test_accuracy
value: [0.70833333 0.70833333 0.45833333 0.70833333 0.79166667 0.70833333
0.75 0.79166667 0.70833333 0.83333333]
mean value: 0.7166666666666667
key: train_accuracy
value: [0.74074074 0.74074074 0.74537037 0.74537037 0.73611111 0.73611111
0.73611111 0.73148148 0.74537037 0.74537037]
mean value: 0.7402777777777778
key: test_fscore
value: [0.74074074 0.72 0.51851852 0.75862069 0.81481481 0.72
0.76923077 0.8 0.69565217 0.83333333]
mean value: 0.7370911040206393
key: train_fscore
value: [0.75862069 0.75862069 0.7639485 0.76190476 0.75324675 0.7510917
0.7510917 0.75 0.76190476 0.7639485 ]
mean value: 0.7574378058188314
key: test_precision
value: [0.66666667 0.69230769 0.46666667 0.64705882 0.73333333 0.69230769
0.71428571 0.76923077 0.72727273 0.83333333]
mean value: 0.6942463418934007
key: train_precision
value: [0.70967742 0.70967742 0.712 0.71544715 0.70731707 0.7107438
0.7107438 0.7016129 0.71544715 0.712 ]
mean value: 0.710466672735509
key: test_recall
value: [0.83333333 0.75 0.58333333 0.91666667 0.91666667 0.75
0.83333333 0.83333333 0.66666667 0.83333333]
mean value: 0.7916666666666666
key: train_recall
value: [0.81481481 0.81481481 0.82407407 0.81481481 0.80555556 0.7962963
0.7962963 0.80555556 0.81481481 0.82407407]
mean value: 0.8111111111111111
key: test_roc_auc
value: [0.70833333 0.70833333 0.45833333 0.70833333 0.79166667 0.70833333
0.75 0.79166667 0.70833333 0.83333333]
mean value: 0.7166666666666667
key: train_roc_auc
value: [0.74074074 0.74074074 0.74537037 0.74537037 0.73611111 0.73611111
0.73611111 0.73148148 0.74537037 0.74537037]
mean value: 0.7402777777777779
key: test_jcc
value: [0.58823529 0.5625 0.35 0.61111111 0.6875 0.5625
0.625 0.66666667 0.53333333 0.71428571]
mean value: 0.5901132119514473
key: train_jcc
value: [0.61111111 0.61111111 0.61805556 0.61538462 0.60416667 0.6013986
0.6013986 0.6 0.61538462 0.61805556]
mean value: 0.6096066433566434
MCC on Blind test: 0.28
Accuracy on Blind test: 0.66
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01346827 0.01688552 0.01641726 0.01505566 0.01531744 0.0147779
0.0166719 0.01709533 0.01546025 0.0159111 ]
mean value: 0.01570606231689453
key: score_time
value: [0.00915766 0.0112474 0.01120234 0.01165509 0.01171255 0.01177716
0.01172066 0.01178312 0.01171827 0.01173067]
mean value: 0.011370491981506348
key: test_mcc
value: [0.6761234 0.53033009 0.33333333 0.57735027 0.53033009 0.45834925
0.53033009 0.58536941 0.37796447 0.58536941]
mean value: 0.5184849799508764
key: train_mcc
value: [0.74307085 0.87996919 0.78656204 0.64888568 0.79848995 0.77831178
0.74779086 0.80836728 0.60587838 0.79473968]
mean value: 0.7592065697400981
key: test_accuracy
value: [0.83333333 0.75 0.66666667 0.75 0.75 0.70833333
0.75 0.79166667 0.625 0.79166667]
mean value: 0.7416666666666667
key: train_accuracy
value: [0.86574074 0.93981481 0.88425926 0.7962963 0.89814815 0.88888889
0.86111111 0.90277778 0.76851852 0.89351852]
mean value: 0.8699074074074074
key: test_fscore
value: [0.84615385 0.7 0.66666667 0.8 0.78571429 0.63157895
0.7 0.8 0.4 0.8 ]
mean value: 0.7130113745903219
key: train_fscore
value: [0.87659574 0.94063927 0.87046632 0.83076923 0.90178571 0.88679245
0.84042553 0.89855072 0.69879518 0.9004329 ]
mean value: 0.8645253070924268
key: test_precision
value: [0.78571429 0.875 0.66666667 0.66666667 0.6875 0.85714286
0.875 0.76923077 1. 0.76923077]
mean value: 0.7952152014652014
key: train_precision
value: [0.81102362 0.92792793 0.98823529 0.71052632 0.87068966 0.90384615
0.9875 0.93939394 1. 0.84552846]
mean value: 0.8984671363579353
key: test_recall
value: [0.91666667 0.58333333 0.66666667 1. 0.91666667 0.5
0.58333333 0.83333333 0.25 0.83333333]
mean value: 0.7083333333333334
key: train_recall
value: [0.9537037 0.9537037 0.77777778 1. 0.93518519 0.87037037
0.73148148 0.86111111 0.53703704 0.96296296]
mean value: 0.8583333333333334
key: test_roc_auc
value: [0.83333333 0.75 0.66666667 0.75 0.75 0.70833333
0.75 0.79166667 0.625 0.79166667]
mean value: 0.7416666666666667
key: train_roc_auc
value: [0.86574074 0.93981481 0.88425926 0.7962963 0.89814815 0.88888889
0.86111111 0.90277778 0.76851852 0.89351852]
mean value: 0.8699074074074074
key: test_jcc
value: [0.73333333 0.53846154 0.5 0.66666667 0.64705882 0.46153846
0.53846154 0.66666667 0.25 0.66666667]
mean value: 0.5668853695324283
key: train_jcc
value: [0.78030303 0.88793103 0.7706422 0.71052632 0.82113821 0.79661017
0.72477064 0.81578947 0.53703704 0.81889764]
mean value: 0.7663645754002122
MCC on Blind test: 0.42
Accuracy on Blind test: 0.73
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.0160954 0.01449347 0.01418686 0.01509547 0.01532292 0.01373005
0.01380444 0.0151577 0.01477861 0.01720095]
mean value: 0.014986586570739747
key: score_time
value: [0.01187539 0.01171732 0.01166487 0.01170754 0.01174474 0.01170206
0.01171851 0.01169991 0.01186538 0.01181006]
mean value: 0.011750578880310059
key: test_mcc
value: [0.66666667 0.58536941 0.58536941 0.30151134 0.3380617 0.4472136
0.4472136 0.58536941 0.37796447 0.3380617 ]
mean value: 0.4672801300051277
key: train_mcc
value: [0.84553359 0.62733435 0.80125769 0.46940279 0.85243671 0.5487044
0.51225071 0.83333333 0.53452248 0.80235109]
mean value: 0.6827127149571861
key: test_accuracy
value: [0.83333333 0.79166667 0.79166667 0.58333333 0.66666667 0.66666667
0.66666667 0.79166667 0.625 0.66666667]
mean value: 0.7083333333333333
key: train_accuracy
value: [0.9212963 0.78240741 0.89814815 0.68055556 0.92592593 0.73611111
0.71296296 0.91666667 0.72222222 0.89351852]
mean value: 0.8189814814814815
key: test_fscore
value: [0.83333333 0.8 0.8 0.28571429 0.69230769 0.5
0.5 0.8 0.4 0.69230769]
mean value: 0.6303663003663004
key: train_fscore
value: [0.92444444 0.82129278 0.89215686 0.53061224 0.92727273 0.64596273
0.6025641 0.91666667 0.61538462 0.90295359]
mean value: 0.7779310759058158
key: test_precision
value: [0.83333333 0.76923077 0.76923077 1. 0.64285714 1.
1. 0.76923077 1. 0.64285714]
mean value: 0.8426739926739927
key: train_precision
value: [0.88888889 0.69677419 0.94791667 1. 0.91071429 0.98113208
0.97916667 0.91666667 1. 0.82945736]
mean value: 0.9150716807964345
key: test_recall
value: [0.83333333 0.83333333 0.83333333 0.16666667 0.75 0.33333333
0.33333333 0.83333333 0.25 0.75 ]
mean value: 0.5916666666666667
key: train_recall
value: [0.96296296 1. 0.84259259 0.36111111 0.94444444 0.48148148
0.43518519 0.91666667 0.44444444 0.99074074]
mean value: 0.7379629629629629
key: test_roc_auc
value: [0.83333333 0.79166667 0.79166667 0.58333333 0.66666667 0.66666667
0.66666667 0.79166667 0.625 0.66666667]
mean value: 0.7083333333333333
key: train_roc_auc
value: [0.9212963 0.78240741 0.89814815 0.68055556 0.92592593 0.73611111
0.71296296 0.91666667 0.72222222 0.89351852]
mean value: 0.8189814814814814
key: test_jcc
value: [0.71428571 0.66666667 0.66666667 0.16666667 0.52941176 0.33333333
0.33333333 0.66666667 0.25 0.52941176]
mean value: 0.48564425770308123
key: train_jcc
value: [0.85950413 0.69677419 0.80530973 0.36111111 0.86440678 0.47706422
0.43119266 0.84615385 0.44444444 0.82307692]
mean value: 0.6609038045474354
MCC on Blind test: 0.38
Accuracy on Blind test: 0.65
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.1361053 0.12321711 0.11989808 0.12128091 0.12607265 0.12655473
0.12733555 0.12603474 0.12223935 0.12803626]
mean value: 0.1256774663925171
key: score_time
value: [0.01632547 0.01506424 0.01524639 0.01524496 0.01628733 0.01632857
0.01587772 0.01606822 0.01655674 0.01658249]
mean value: 0.015958213806152345
key: test_mcc
value: [0.66666667 0.57735027 0.70710678 0.75261781 0.2508726 0.50709255
0.41812101 0.41812101 1. 0.41812101]
mean value: 0.5716069696899095
key: train_mcc
value: [1. 1. 1. 0.99078321 1. 1.
1. 1. 1. 1. ]
mean value: 0.999078321349667
key: test_accuracy
value: [0.83333333 0.75 0.83333333 0.875 0.625 0.75
0.70833333 0.70833333 1. 0.70833333]
mean value: 0.7791666666666667
key: train_accuracy
value: [1. 1. 1. 0.99537037 1. 1.
1. 1. 1. 1. ]
mean value: 0.999537037037037
key: test_fscore
value: [0.83333333 0.66666667 0.85714286 0.88 0.60869565 0.72727273
0.72 0.69565217 1. 0.72 ]
mean value: 0.770876341050254
key: train_fscore
value: [1. 1. 1. 0.99539171 1. 1.
1. 1. 1. 1. ]
mean value: 0.9995391705069124
key: test_precision
value: [0.83333333 1. 0.75 0.84615385 0.63636364 0.8
0.69230769 0.72727273 1. 0.69230769]
mean value: 0.7977738927738928
key: train_precision
value: [1. 1. 1. 0.99082569 1. 1.
1. 1. 1. 1. ]
mean value: 0.9990825688073395
key: test_recall
value: [0.83333333 0.5 1. 0.91666667 0.58333333 0.66666667
0.75 0.66666667 1. 0.75 ]
mean value: 0.7666666666666666
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.83333333 0.75 0.83333333 0.875 0.625 0.75
0.70833333 0.70833333 1. 0.70833333]
mean value: 0.7791666666666667
key: train_roc_auc
value: [1. 1. 1. 0.99537037 1. 1.
1. 1. 1. 1. ]
mean value: 0.999537037037037
key: test_jcc
value: [0.71428571 0.5 0.75 0.78571429 0.4375 0.57142857
0.5625 0.53333333 1. 0.5625 ]
mean value: 0.6417261904761905
key: train_jcc
value: [1. 1. 1. 0.99082569 1. 1.
1. 1. 1. 1. ]
mean value: 0.9990825688073395
MCC on Blind test: 0.4
Accuracy on Blind test: 0.71
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.05251479 0.05213046 0.04939222 0.04378223 0.05401325 0.05549884
0.04612184 0.05807853 0.06026721 0.06066704]
mean value: 0.05324664115905762
key: score_time
value: [0.02272248 0.03480697 0.02512097 0.02819991 0.03404307 0.02167606
0.03094029 0.03069401 0.04057026 0.09532309]
mean value: 0.03640971183776855
key: test_mcc
value: [0.83333333 0.38490018 0.60246408 0.50709255 0.43033148 0.53033009
0.50709255 0.60246408 0.91986621 0.75261781]
mean value: 0.607049235943675
key: train_mcc
value: [0.98164982 0.9459053 0.9459053 0.95407186 0.99078321 0.97259753
0.96362411 0.98164982 0.95407186 0.98148148]
mean value: 0.9671740288105546
key: test_accuracy
value: [0.91666667 0.66666667 0.79166667 0.75 0.70833333 0.75
0.75 0.79166667 0.95833333 0.875 ]
mean value: 0.7958333333333333
key: train_accuracy
value: [0.99074074 0.97222222 0.97222222 0.97685185 0.99537037 0.98611111
0.98148148 0.99074074 0.97685185 0.99074074]
mean value: 0.9833333333333333
key: test_fscore
value: [0.91666667 0.55555556 0.81481481 0.72727273 0.66666667 0.7
0.72727273 0.76190476 0.95652174 0.86956522]
mean value: 0.769624087667566
key: train_fscore
value: [0.99065421 0.97142857 0.97142857 0.97652582 0.99534884 0.98591549
0.98113208 0.99065421 0.97652582 0.99074074]
mean value: 0.9830354343644072
key: test_precision
value: [0.91666667 0.83333333 0.73333333 0.8 0.77777778 0.875
0.8 0.88888889 1. 0.90909091]
mean value: 0.8534090909090909
key: train_precision
value: [1. 1. 1. 0.99047619 1. 1.
1. 1. 0.99047619 0.99074074]
mean value: 0.9971693121693121
key: test_recall
value: [0.91666667 0.41666667 0.91666667 0.66666667 0.58333333 0.58333333
0.66666667 0.66666667 0.91666667 0.83333333]
mean value: 0.7166666666666667
key: train_recall
value: [0.98148148 0.94444444 0.94444444 0.96296296 0.99074074 0.97222222
0.96296296 0.98148148 0.96296296 0.99074074]
mean value: 0.9694444444444444
key: test_roc_auc
value: [0.91666667 0.66666667 0.79166667 0.75 0.70833333 0.75
0.75 0.79166667 0.95833333 0.875 ]
mean value: 0.7958333333333334
key: train_roc_auc
value: [0.99074074 0.97222222 0.97222222 0.97685185 0.99537037 0.98611111
0.98148148 0.99074074 0.97685185 0.99074074]
mean value: 0.9833333333333333
key: test_jcc
value: [0.84615385 0.38461538 0.6875 0.57142857 0.5 0.53846154
0.57142857 0.61538462 0.91666667 0.76923077]
mean value: 0.6400869963369963
key: train_jcc
value: [0.98148148 0.94444444 0.94444444 0.95412844 0.99074074 0.97222222
0.96296296 0.98148148 0.95412844 0.98165138]
mean value: 0.9667686034658511
MCC on Blind test: 0.48
Accuracy on Blind test: 0.74
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.0322156 0.02812076 0.05939436 0.04106569 0.02669883 0.02819943
0.06717181 0.06473112 0.06225467 0.054461 ]
mean value: 0.046431326866149904
key: score_time
value: [0.01301074 0.01293302 0.02412224 0.01273417 0.01269698 0.01300836
0.0239048 0.01927137 0.02405953 0.01356697]
mean value: 0.016930818557739258
key: test_mcc
value: [0.75261781 0.3380617 0. 0.58536941 0.3380617 0.58536941
0.25819889 0.43033148 0.6761234 0.5 ]
mean value: 0.44641338032410316
key: train_mcc
value: [0.98164982 0.98164982 0.98164982 0.99078321 0.99078321 0.98164982
0.98164982 0.98164982 0.98164982 0.98164982]
mean value: 0.9834764964705683
key: test_accuracy
value: [0.875 0.66666667 0.5 0.79166667 0.66666667 0.79166667
0.625 0.70833333 0.83333333 0.75 ]
mean value: 0.7208333333333333
key: train_accuracy
value: [0.99074074 0.99074074 0.99074074 0.99537037 0.99537037 0.99074074
0.99074074 0.99074074 0.99074074 0.99074074]
mean value: 0.9916666666666666
key: test_fscore
value: [0.88 0.63636364 0.57142857 0.8 0.69230769 0.8
0.66666667 0.74074074 0.84615385 0.75 ]
mean value: 0.7383661153661154
key: train_fscore
value: [0.99082569 0.99082569 0.99082569 0.99539171 0.99539171 0.99082569
0.99082569 0.99082569 0.99082569 0.99082569]
mean value: 0.9917388914725405
key: test_precision
value: [0.84615385 0.7 0.5 0.76923077 0.64285714 0.76923077
0.6 0.66666667 0.78571429 0.75 ]
mean value: 0.702985347985348
key: train_precision
value: [0.98181818 0.98181818 0.98181818 0.99082569 0.99082569 0.98181818
0.98181818 0.98181818 0.98181818 0.98181818]
mean value: 0.9836196830692243
key: test_recall
value: [0.91666667 0.58333333 0.66666667 0.83333333 0.75 0.83333333
0.75 0.83333333 0.91666667 0.75 ]
mean value: 0.7833333333333333
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.875 0.66666667 0.5 0.79166667 0.66666667 0.79166667
0.625 0.70833333 0.83333333 0.75 ]
mean value: 0.7208333333333333
key: train_roc_auc
value: [0.99074074 0.99074074 0.99074074 0.99537037 0.99537037 0.99074074
0.99074074 0.99074074 0.99074074 0.99074074]
mean value: 0.9916666666666667
key: test_jcc
value: [0.78571429 0.46666667 0.4 0.66666667 0.52941176 0.66666667
0.5 0.58823529 0.73333333 0.6 ]
mean value: 0.5936694677871148
key: train_jcc
value: [0.98181818 0.98181818 0.98181818 0.99082569 0.99082569 0.98181818
0.98181818 0.98181818 0.98181818 0.98181818]
mean value: 0.9836196830692243
MCC on Blind test: 0.15
Accuracy on Blind test: 0.58
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.40719414 0.38963819 0.38802338 0.38530183 0.39083529 0.38973022
0.3902452 0.38925004 0.38730001 0.3895514 ]
mean value: 0.39070696830749513
key: score_time
value: [0.00941658 0.00929379 0.00939155 0.00930834 0.00956917 0.00928354
0.00918388 0.00929189 0.00934386 0.00926805]
mean value: 0.009335064888000488
key: test_mcc
value: [0.66666667 0.45834925 0.45834925 0.84515425 0.6761234 0.77459667
0.50709255 0.58536941 0.75261781 0.41812101]
mean value: 0.6142440265299691
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.83333333 0.70833333 0.70833333 0.91666667 0.83333333 0.875
0.75 0.79166667 0.875 0.70833333]
mean value: 0.8
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.83333333 0.63157895 0.75862069 0.92307692 0.84615385 0.85714286
0.72727273 0.8 0.88 0.69565217]
mean value: 0.7952831497916324
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.83333333 0.85714286 0.64705882 0.85714286 0.78571429 1.
0.8 0.76923077 0.84615385 0.72727273]
mean value: 0.8123049499520087
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.83333333 0.5 0.91666667 1. 0.91666667 0.75
0.66666667 0.83333333 0.91666667 0.66666667]
mean value: 0.8
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.83333333 0.70833333 0.70833333 0.91666667 0.83333333 0.875
0.75 0.79166667 0.875 0.70833333]
mean value: 0.8
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.71428571 0.46153846 0.61111111 0.85714286 0.73333333 0.75
0.57142857 0.66666667 0.78571429 0.53333333]
mean value: 0.6684554334554335
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.45
Accuracy on Blind test: 0.74
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.02204514 0.02083945 0.02080059 0.02074742 0.03952312 0.02238369
0.06109738 0.03359938 0.02302122 0.02167463]
mean value: 0.028573203086853027
key: score_time
value: [0.0126133 0.01318598 0.01951098 0.01508093 0.0224371 0.01442599
0.01425624 0.02035761 0.01714063 0.01932621]
mean value: 0.01683349609375
key: test_mcc
value: [0.75261781 0.50709255 0. 0.43033148 0.2508726 0.50709255
0.25819889 0.33333333 0.3380617 0.16903085]
mean value: 0.35466317765122685
key: train_mcc
value: [1. 0.98164982 0.96362411 1. 0.88607221 1.
0.96362411 1. 0.90284331 0.97259753]
mean value: 0.9670411093291738
key: test_accuracy
value: [0.875 0.75 0.5 0.70833333 0.625 0.75
0.625 0.66666667 0.66666667 0.58333333]
mean value: 0.675
key: train_accuracy
value: [1. 0.99074074 0.98148148 1. 0.93981481 1.
0.98148148 1. 0.94907407 0.98611111]
mean value: 0.9828703703703704
key: test_fscore
value: [0.88 0.72727273 0.53846154 0.74074074 0.60869565 0.76923077
0.66666667 0.66666667 0.69230769 0.61538462]
mean value: 0.690542706890533
key: train_fscore
value: [1. 0.99082569 0.98181818 1. 0.94323144 1.
0.98181818 1. 0.95154185 0.98630137]
mean value: 0.9835536712841071
key: test_precision
value: [0.84615385 0.8 0.5 0.66666667 0.63636364 0.71428571
0.6 0.66666667 0.64285714 0.57142857]
mean value: 0.6644422244422244
key: train_precision
value: [1. 0.98181818 0.96428571 1. 0.89256198 1.
0.96428571 1. 0.90756303 0.97297297]
mean value: 0.9683487592043742
key: test_recall
value: [0.91666667 0.66666667 0.58333333 0.83333333 0.58333333 0.83333333
0.75 0.66666667 0.75 0.66666667]
mean value: 0.725
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.875 0.75 0.5 0.70833333 0.625 0.75
0.625 0.66666667 0.66666667 0.58333333]
mean value: 0.675
key: train_roc_auc
value: [1. 0.99074074 0.98148148 1. 0.93981481 1.
0.98148148 1. 0.94907407 0.98611111]
mean value: 0.9828703703703704
key: test_jcc
value: [0.78571429 0.57142857 0.36842105 0.58823529 0.4375 0.625
0.5 0.5 0.52941176 0.44444444]
mean value: 0.535015541304241
key: train_jcc
value: [1. 0.98181818 0.96428571 1. 0.89256198 1.
0.96428571 1. 0.90756303 0.97297297]
mean value: 0.9683487592043742
MCC on Blind test: 0.1
Accuracy on Blind test: 0.56
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.01443124 0.01399136 0.01988292 0.03507066 0.03442311 0.03594494
0.01563334 0.0141995 0.01398373 0.01407266]
mean value: 0.021163344383239746
key: score_time
value: [0.01197124 0.01205635 0.02042127 0.02123404 0.01202655 0.01211452
0.01195836 0.01187086 0.0119369 0.01197267]
mean value: 0.013756275177001953
key: test_mcc
value: [0.75261781 0.60246408 0.45834925 0.60246408 0.41812101 0.58536941
0.33333333 0.5 0.91986621 0.58536941]
mean value: 0.5757954573028512
key: train_mcc
value: [0.85243671 0.88904134 0.90756304 0.87171665 0.88904134 0.88904134
0.89849486 0.87171665 0.86203543 0.90803041]
mean value: 0.8839117784516624
key: test_accuracy
value: [0.875 0.79166667 0.70833333 0.79166667 0.70833333 0.79166667
0.66666667 0.75 0.95833333 0.79166667]
mean value: 0.7833333333333333
key: train_accuracy
value: [0.92592593 0.94444444 0.9537037 0.93518519 0.94444444 0.94444444
0.94907407 0.93518519 0.93055556 0.9537037 ]
mean value: 0.9416666666666667
key: test_fscore
value: [0.86956522 0.76190476 0.75862069 0.81481481 0.72 0.7826087
0.66666667 0.75 0.96 0.7826087 ]
mean value: 0.7866789541737068
key: train_fscore
value: [0.92727273 0.94495413 0.95412844 0.93693694 0.94495413 0.94495413
0.94977169 0.93693694 0.9321267 0.95454545]
mean value: 0.9426581267710425
key: test_precision
value: /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_rt.py:135: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_rt.py:138: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
[0.90909091 0.88888889 0.64705882 0.73333333 0.69230769 0.81818182
0.66666667 0.75 0.92307692 0.81818182]
mean value: 0.7846786873257462
key: train_precision
value: [0.91071429 0.93636364 0.94545455 0.9122807 0.93636364 0.93636364
0.93693694 0.9122807 0.91150442 0.9375 ]
mean value: 0.927576250548421
key: test_recall
value: [0.83333333 0.66666667 0.91666667 0.91666667 0.75 0.75
0.66666667 0.75 1. 0.75 ]
mean value: 0.8
key: train_recall
value: [0.94444444 0.9537037 0.96296296 0.96296296 0.9537037 0.9537037
0.96296296 0.96296296 0.9537037 0.97222222]
mean value: 0.9583333333333333
key: test_roc_auc
value: [0.875 0.79166667 0.70833333 0.79166667 0.70833333 0.79166667
0.66666667 0.75 0.95833333 0.79166667]
mean value: 0.7833333333333333
key: train_roc_auc
value: [0.92592593 0.94444444 0.9537037 0.93518519 0.94444444 0.94444444
0.94907407 0.93518519 0.93055556 0.9537037 ]
mean value: 0.9416666666666667
key: test_jcc
value: [0.76923077 0.61538462 0.61111111 0.6875 0.5625 0.64285714
0.5 0.6 0.92307692 0.64285714]
mean value: 0.6554517704517705
key: train_jcc
value: [0.86440678 0.89565217 0.9122807 0.88135593 0.89565217 0.89565217
0.90434783 0.88135593 0.87288136 0.91304348]
mean value: 0.8916628527841343
MCC on Blind test: 0.41
Accuracy on Blind test: 0.71
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.28584599 0.23216796 0.22558236 0.22630048 0.30875087 0.23271036
0.22683549 0.2345891 0.24051881 0.27153778]
mean value: 0.24848392009735107
key: score_time
value: [0.01207423 0.02121472 0.02310538 0.0239768 0.0239315 0.0196178
0.02267861 0.02345586 0.02220201 0.02112508]
mean value: 0.02133820056915283
key: test_mcc
value: [0.75261781 0.58536941 0.64168895 0.58536941 0.60246408 0.60246408
0.5 0.58536941 0.75261781 0.66666667]
mean value: 0.6274627605767488
key: train_mcc
value: [0.74704394 0.78978412 0.77253603 0.75261781 0.77120096 0.77992042
0.78869542 0.77120096 0.77013788 0.75158034]
mean value: 0.7694717888512735
key: test_accuracy
value: [0.875 0.79166667 0.79166667 0.79166667 0.79166667 0.79166667
0.75 0.79166667 0.875 0.83333333]
mean value: 0.8083333333333333
key: train_accuracy
value: [0.87037037 0.89351852 0.88425926 0.875 0.88425926 0.88888889
0.89351852 0.88425926 0.88425926 0.875 ]
mean value: 0.8833333333333333
key: test_fscore
value: [0.88 0.7826087 0.82758621 0.8 0.81481481 0.76190476
0.75 0.8 0.88 0.83333333]
mean value: 0.8130247812601635
key: train_fscore
value: [0.87826087 0.89777778 0.88986784 0.88 0.88888889 0.89285714
0.89686099 0.88888889 0.88789238 0.87892377]
mean value: 0.8880218539432451
key: test_precision
value: [0.84615385 0.81818182 0.70588235 0.76923077 0.73333333 0.88888889
0.75 0.76923077 0.84615385 0.83333333]
mean value: 0.7960388957447782
key: train_precision
value: [0.82786885 0.86324786 0.8487395 0.84615385 0.85470085 0.86206897
0.86956522 0.85470085 0.86086957 0.85217391]
mean value: 0.854008942823017
key: test_recall
value: [0.91666667 0.75 1. 0.83333333 0.91666667 0.66666667
0.75 0.83333333 0.91666667 0.83333333]
mean value: 0.8416666666666667
key: train_recall
value: [0.93518519 0.93518519 0.93518519 0.91666667 0.92592593 0.92592593
0.92592593 0.92592593 0.91666667 0.90740741]
mean value: 0.925
key: test_roc_auc
value: [0.875 0.79166667 0.79166667 0.79166667 0.79166667 0.79166667
0.75 0.79166667 0.875 0.83333333]
mean value: 0.8083333333333333
key: train_roc_auc
value: [0.87037037 0.89351852 0.88425926 0.875 0.88425926 0.88888889
0.89351852 0.88425926 0.88425926 0.875 ]
mean value: 0.8833333333333333
key: test_jcc
value: [0.78571429 0.64285714 0.70588235 0.66666667 0.6875 0.61538462
0.6 0.66666667 0.78571429 0.71428571]
mean value: 0.6870671730230554
key: train_jcc
value: [0.78294574 0.81451613 0.8015873 0.78571429 0.8 0.80645161
0.81300813 0.8 0.7983871 0.784 ]
mean value: 0.7986610292526675
MCC on Blind test: 0.37
Accuracy on Blind test: 0.7
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.0308404 0.02457547 0.03144908 0.03146958 0.04254293 0.05157399
0.04247165 0.04434466 0.05195832 0.03415108]
mean value: 0.03853771686553955
key: score_time
value: [0.01679897 0.01194048 0.012079 0.01197195 0.01233697 0.01218081
0.0123651 0.01207662 0.01558709 0.01403546]
mean value: 0.013137245178222656
key: test_mcc
value: [0.58536941 0.6761234 0.64168895 0.6761234 0.60246408 0.60246408
0.5 0.58536941 0.83333333 0.58536941]
mean value: 0.6288305461987017
key: train_mcc
value: [0.75261781 0.83390548 0.797528 0.77898084 0.82495863 0.77898084
0.81537425 0.79973188 0.77898084 0.797528 ]
mean value: 0.7958586569080919
key: test_accuracy
value: [0.79166667 0.83333333 0.79166667 0.83333333 0.79166667 0.79166667
0.75 0.79166667 0.91666667 0.79166667]
mean value: 0.8083333333333333
key: train_accuracy
value: [0.875 0.91666667 0.89814815 0.88888889 0.91203704 0.88888889
0.90740741 0.89814815 0.88888889 0.89814815]
mean value: 0.8972222222222223
key: test_fscore
value: [0.7826087 0.81818182 0.82758621 0.84615385 0.81481481 0.76190476
0.75 0.8 0.91666667 0.7826087 ]
mean value: 0.8100525505922808
key: train_fscore
value: [0.88 0.91818182 0.9009009 0.89189189 0.91402715 0.89189189
0.90909091 0.90265487 0.89189189 0.9009009 ]
mean value: 0.9001432221328108
key: test_precision
value: [0.81818182 0.9 0.70588235 0.78571429 0.73333333 0.88888889
0.75 0.76923077 0.91666667 0.81818182]
mean value: 0.8086079933138757
key: train_precision
value: [0.84615385 0.90178571 0.87719298 0.86842105 0.89380531 0.86842105
0.89285714 0.86440678 0.86842105 0.87719298]
mean value: 0.875865791549925
key: test_recall
value: [0.75 0.75 1. 0.91666667 0.91666667 0.66666667
0.75 0.83333333 0.91666667 0.75 ]
mean value: 0.825
key: train_recall
value: [0.91666667 0.93518519 0.92592593 0.91666667 0.93518519 0.91666667
0.92592593 0.94444444 0.91666667 0.92592593]
mean value: 0.9259259259259259
key: test_roc_auc
value: [0.79166667 0.83333333 0.79166667 0.83333333 0.79166667 0.79166667
0.75 0.79166667 0.91666667 0.79166667]
mean value: 0.8083333333333333
key: train_roc_auc
value: [0.875 0.91666667 0.89814815 0.88888889 0.91203704 0.88888889
0.90740741 0.89814815 0.88888889 0.89814815]
mean value: 0.8972222222222223
key: test_jcc
value: [0.64285714 0.69230769 0.70588235 0.73333333 0.6875 0.61538462
0.6 0.66666667 0.84615385 0.64285714]
mean value: 0.6832942792501616
key: train_jcc
value: [0.78571429 0.8487395 0.81967213 0.80487805 0.84166667 0.80487805
0.83333333 0.82258065 0.80487805 0.81967213]
mean value: 0.8186012835310441
MCC on Blind test: 0.36
Accuracy on Blind test: 0.69
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.71998596 0.75151277 0.88045883 0.72910571 0.72733355 0.90851116
0.74258113 0.76826286 0.8899796 0.75839233]
mean value: 0.7876123905181884
key: score_time
value: [0.01214218 0.01221347 0.01212549 0.01217508 0.01212287 0.01476812
0.01231885 0.01283383 0.01213813 0.01209283]
mean value: 0.012493085861206055
key: test_mcc
value: [0.75261781 0.60246408 0.64168895 0.70710678 0.64168895 0.50709255
0.5 0.58536941 0.6761234 0.66666667]
mean value: 0.6280818592400688
key: train_mcc
value: [0.74393663 0.88057382 0.77992042 0.70479219 0.75158034 0.94444444
0.78869542 0.77120096 0.75158034 0.74188651]
mean value: 0.78586110875256
key: test_accuracy
value: [0.875 0.79166667 0.79166667 0.83333333 0.79166667 0.75
0.75 0.79166667 0.83333333 0.83333333]
mean value: 0.8041666666666667
key: train_accuracy
value: [0.87037037 0.93981481 0.88888889 0.85185185 0.875 0.97222222
0.89351852 0.88425926 0.875 0.87037037]
mean value: 0.8921296296296296
key: test_fscore
value: [0.88 0.76190476 0.82758621 0.85714286 0.82758621 0.72727273
0.75 0.8 0.84615385 0.83333333]
mean value: 0.8110979939600629
key: train_fscore
value: [0.87610619 0.94117647 0.89285714 0.85585586 0.87892377 0.97222222
0.89686099 0.88888889 0.87892377 0.87387387]
mean value: 0.8955689169155857
key: test_precision
value: [0.84615385 0.88888889 0.70588235 0.75 0.70588235 0.8
0.75 0.76923077 0.78571429 0.83333333]
mean value: 0.7835085829203476
key: train_precision
value: [0.83898305 0.92035398 0.86206897 0.83333333 0.85217391 0.97222222
0.86956522 0.85470085 0.85217391 0.85087719]
mean value: 0.8706452645382711
key: test_recall
value: [0.91666667 0.66666667 1. 1. 1. 0.66666667
0.75 0.83333333 0.91666667 0.83333333]
mean value: 0.8583333333333333
key: train_recall
value: [0.91666667 0.96296296 0.92592593 0.87962963 0.90740741 0.97222222
0.92592593 0.92592593 0.90740741 0.89814815]
mean value: 0.9222222222222223
key: test_roc_auc
value: [0.875 0.79166667 0.79166667 0.83333333 0.79166667 0.75
0.75 0.79166667 0.83333333 0.83333333]
mean value: 0.8041666666666667
key: train_roc_auc
value: [0.87037037 0.93981481 0.88888889 0.85185185 0.875 0.97222222
0.89351852 0.88425926 0.875 0.87037037]
mean value: 0.8921296296296296
key: test_jcc
value: [0.78571429 0.61538462 0.70588235 0.75 0.70588235 0.57142857
0.6 0.66666667 0.73333333 0.71428571]
mean value: 0.684857789269554
key: train_jcc
value: [0.77952756 0.88888889 0.80645161 0.7480315 0.784 0.94594595
0.81300813 0.8 0.784 0.776 ]
mean value: 0.8125853632937472
MCC on Blind test: 0.37
Accuracy on Blind test: 0.69
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01518965 0.01046348 0.01070547 0.01042414 0.0102427 0.01024985
0.00982404 0.01023245 0.01031613 0.01034975]
mean value: 0.010799765586853027
key: score_time
value: [0.01212573 0.01018858 0.01014185 0.00970149 0.00969028 0.00975275
0.00983071 0.00985217 0.0098865 0.00986671]
mean value: 0.010103678703308106
key: test_mcc
value: [ 0.38490018 0.43033148 -0.2236068 0.35355339 0.64168895 0.50709255
0.27500955 0.70710678 0.60246408 0.60246408]
mean value: 0.42810042384202684
key: train_mcc
value: [0.49840764 0.50566876 0.54289671 0.51282259 0.48653363 0.52261966
0.50639215 0.47847133 0.49041703 0.4990914 ]
mean value: 0.5043320918458405
key: test_accuracy
value: [0.66666667 0.70833333 0.41666667 0.66666667 0.79166667 0.75
0.625 0.83333333 0.79166667 0.79166667]
mean value: 0.7041666666666666
key: train_accuracy
value: [0.73148148 0.71759259 0.75925926 0.75 0.72685185 0.74074074
0.73611111 0.72222222 0.72685185 0.72685185]
mean value: 0.7337962962962963
key: test_fscore
value: [0.73333333 0.74074074 0.5625 0.71428571 0.82758621 0.76923077
0.68965517 0.85714286 0.81481481 0.81481481]
mean value: 0.7524104423673389
key: train_fscore
value: [0.7734375 0.77490775 0.79032258 0.775 0.76862745 0.78294574
0.77647059 0.765625 0.77042802 0.77394636]
mean value: 0.7751710981089905
key: test_precision
value: [0.61111111 0.66666667 0.45 0.625 0.70588235 0.71428571
0.58823529 0.75 0.73333333 0.73333333]
mean value: 0.6577847805788982
key: train_precision
value: [0.66891892 0.64417178 0.7 0.70454545 0.66666667 0.67333333
0.67346939 0.66216216 0.66442953 0.66013072]
mean value: 0.6717827951678332
key: test_recall
value: [0.91666667 0.83333333 0.75 0.83333333 1. 0.83333333
0.83333333 1. 0.91666667 0.91666667]
mean value: 0.8833333333333333
key: train_recall
value: [0.91666667 0.97222222 0.90740741 0.86111111 0.90740741 0.93518519
0.91666667 0.90740741 0.91666667 0.93518519]
mean value: 0.9175925925925926
key: test_roc_auc
value: [0.66666667 0.70833333 0.41666667 0.66666667 0.79166667 0.75
0.625 0.83333333 0.79166667 0.79166667]
mean value: 0.7041666666666667
key: train_roc_auc
value: [0.73148148 0.71759259 0.75925926 0.75 0.72685185 0.74074074
0.73611111 0.72222222 0.72685185 0.72685185]
mean value: 0.7337962962962963
key: test_jcc
value: [0.57894737 0.58823529 0.39130435 0.55555556 0.70588235 0.625
0.52631579 0.75 0.6875 0.6875 ]
mean value: 0.6096240708335203
key: train_jcc
value: [0.63057325 0.63253012 0.65333333 0.63265306 0.62420382 0.6433121
0.63461538 0.62025316 0.62658228 0.63125 ]
mean value: 0.6329306514667632
MCC on Blind test: 0.19
Accuracy on Blind test: 0.64
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.0104332 0.01062703 0.00978208 0.01074839 0.01041389 0.01024389
0.01062751 0.01055861 0.01067042 0.01015282]
mean value: 0.010425782203674317
key: score_time
value: [0.00950599 0.0098362 0.01021171 0.00980735 0.00950599 0.00997663
0.00994396 0.00983953 0.01008701 0.00977087]
mean value: 0.009848523139953613
key: test_mcc
value: [0.50709255 0.35355339 0.0860663 0.60246408 0.2508726 0.58536941
0.25819889 0.58536941 0.84515425 0.83333333]
mean value: 0.49074742109105457
key: train_mcc
value: [0.63060354 0.62103628 0.65743559 0.64993368 0.66222239 0.62060985
0.63006192 0.63856099 0.62253572 0.60625994]
mean value: 0.6339259890467467
key: test_accuracy
value: [0.75 0.66666667 0.54166667 0.79166667 0.625 0.79166667
0.625 0.79166667 0.91666667 0.91666667]
mean value: 0.7416666666666667
key: train_accuracy
value: [0.81481481 0.81018519 0.8287037 0.82407407 0.8287037 0.81018519
0.81481481 0.81481481 0.81018519 0.80092593]
mean value: 0.8157407407407408
key: test_fscore
value: [0.72727273 0.6 0.59259259 0.81481481 0.60869565 0.8
0.66666667 0.7826087 0.92307692 0.91666667]
mean value: 0.7432394738916478
key: train_fscore
value: [0.81981982 0.81447964 0.82949309 0.83035714 0.83842795 0.81278539
0.81818182 0.82905983 0.81777778 0.81222707]
mean value: 0.8222609523224956
key: test_precision
value: [0.8 0.75 0.53333333 0.73333333 0.63636364 0.76923077
0.6 0.81818182 0.85714286 0.91666667]
mean value: 0.7414252414252415
key: train_precision
value: [0.79824561 0.79646018 0.82568807 0.80172414 0.79338843 0.8018018
0.80357143 0.76984127 0.78632479 0.76859504]
mean value: 0.7945640759965436
key: test_recall
value: [0.66666667 0.5 0.66666667 0.91666667 0.58333333 0.83333333
0.75 0.75 1. 0.91666667]
mean value: 0.7583333333333333
key: train_recall
value: [0.84259259 0.83333333 0.83333333 0.86111111 0.88888889 0.82407407
0.83333333 0.89814815 0.85185185 0.86111111]
mean value: 0.8527777777777777
key: test_roc_auc
value: [0.75 0.66666667 0.54166667 0.79166667 0.625 0.79166667
0.625 0.79166667 0.91666667 0.91666667]
mean value: 0.7416666666666667
key: train_roc_auc
value: [0.81481481 0.81018519 0.8287037 0.82407407 0.8287037 0.81018519
0.81481481 0.81481481 0.81018519 0.80092593]
mean value: 0.8157407407407408
key: test_jcc
value: [0.57142857 0.42857143 0.42105263 0.6875 0.4375 0.66666667
0.5 0.64285714 0.85714286 0.84615385]
mean value: 0.605887314439946
key: train_jcc
value: [0.69465649 0.6870229 0.70866142 0.70992366 0.72180451 0.68461538
0.69230769 0.7080292 0.69172932 0.68382353]
mean value: 0.6982574108759549
MCC on Blind test: 0.23
Accuracy on Blind test: 0.63
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00942445 0.01014233 0.00951648 0.01010513 0.00946379 0.01008248
0.01012278 0.01024103 0.0107336 0.01002169]
mean value: 0.00998537540435791
key: score_time
value: [0.01131845 0.011024 0.01125288 0.01072717 0.01059318 0.01117182
0.01103806 0.01099634 0.01130033 0.011235 ]
mean value: 0.01106572151184082
key: test_mcc
value: [0.58536941 0.58536941 0. 0.33333333 0.43033148 0.83333333
0.0860663 0.3380617 0.6761234 0.50709255]
mean value: 0.4375080918682246
key: train_mcc
value: [0.62361342 0.60774211 0.64356824 0.64993368 0.58760578 0.58760578
0.64514162 0.59763515 0.63355259 0.61491869]
mean value: 0.6191317076001605
key: test_accuracy
value: [0.79166667 0.79166667 0.5 0.66666667 0.70833333 0.91666667
0.54166667 0.66666667 0.83333333 0.75 ]
mean value: 0.7166666666666667
key: train_accuracy
value: [0.81018519 0.80092593 0.81944444 0.82407407 0.79166667 0.79166667
0.81944444 0.7962963 0.81481481 0.80555556]
mean value: 0.8074074074074074
key: test_fscore
value: [0.7826087 0.8 0.57142857 0.66666667 0.74074074 0.91666667
0.59259259 0.69230769 0.84615385 0.72727273]
mean value: 0.7336438199481677
key: train_fscore
value: [0.81938326 0.81385281 0.82969432 0.83035714 0.80349345 0.80349345
0.83116883 0.80869565 0.8245614 0.81578947]
mean value: 0.8180489799865002
key: test_precision
value: [0.81818182 0.76923077 0.5 0.66666667 0.66666667 0.91666667
0.53333333 0.64285714 0.78571429 0.8 ]
mean value: 0.7099317349317349
key: train_precision
value: [0.78151261 0.76422764 0.78512397 0.80172414 0.76033058 0.76033058
0.7804878 0.76229508 0.78333333 0.775 ]
mean value: 0.7754365729395012
key: test_recall
value: [0.75 0.83333333 0.66666667 0.66666667 0.83333333 0.91666667
0.66666667 0.75 0.91666667 0.66666667]
mean value: 0.7666666666666666
key: train_recall
value: [0.86111111 0.87037037 0.87962963 0.86111111 0.85185185 0.85185185
0.88888889 0.86111111 0.87037037 0.86111111]
mean value: 0.8657407407407407
key: test_roc_auc
value: [0.79166667 0.79166667 0.5 0.66666667 0.70833333 0.91666667
0.54166667 0.66666667 0.83333333 0.75 ]
mean value: 0.7166666666666667
key: train_roc_auc
value: [0.81018519 0.80092593 0.81944444 0.82407407 0.79166667 0.79166667
0.81944444 0.7962963 0.81481481 0.80555556]
mean value: 0.8074074074074074
key: test_jcc
value: [0.64285714 0.66666667 0.4 0.5 0.58823529 0.84615385
0.42105263 0.52941176 0.73333333 0.57142857]
mean value: 0.5899139250842037
key: train_jcc
value: [0.69402985 0.68613139 0.70895522 0.70992366 0.67153285 0.67153285
0.71111111 0.67883212 0.70149254 0.68888889]
mean value: 0.6922430473142728
MCC on Blind test: 0.08
Accuracy on Blind test: 0.56
Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01471233 0.01398039 0.0138483 0.01294971 0.01366282 0.01385617
0.01302624 0.01307869 0.01325321 0.01405549]
mean value: 0.013642334938049316
key: score_time
value: [0.01138425 0.01074481 0.01006532 0.01083374 0.01002431 0.01079631
0.01036167 0.01062918 0.01029849 0.01043081]
mean value: 0.010556888580322266
key: test_mcc
value: [0.75261781 0.66666667 0.35355339 0.50709255 0.43033148 0.58536941
0.50709255 0.6761234 0.58536941 0.75261781]
mean value: 0.5816834481651599
key: train_mcc
value: [0.77603992 0.77822 0.7741473 0.77120096 0.80976668 0.78439613
0.79280145 0.75549315 0.78262379 0.76572003]
mean value: 0.7790409438143523
key: test_accuracy
value: [0.875 0.83333333 0.66666667 0.75 0.70833333 0.79166667
0.75 0.83333333 0.79166667 0.875 ]
mean value: 0.7875
key: train_accuracy
value: [0.88425926 0.88425926 0.88425926 0.88425926 0.90277778 0.88888889
0.89351852 0.875 0.88888889 0.87962963]
mean value: 0.8865740740740741
key: test_fscore
value: [0.88 0.83333333 0.71428571 0.76923077 0.74074074 0.7826087
0.76923077 0.84615385 0.8 0.88 ]
mean value: 0.8015583868627346
key: train_fscore
value: [0.89177489 0.89270386 0.89082969 0.88888889 0.90748899 0.89565217
0.89956332 0.88209607 0.89473684 0.88695652]
mean value: 0.8930691250835735
key: test_precision
value: [0.84615385 0.83333333 0.625 0.71428571 0.66666667 0.81818182
0.71428571 0.78571429 0.76923077 0.84615385]
mean value: 0.7619005994005994
key: train_precision
value: [0.83739837 0.832 0.84297521 0.85470085 0.86554622 0.8442623
0.85123967 0.83471074 0.85 0.83606557]
mean value: 0.8448898935859159
key: test_recall
value: [0.91666667 0.83333333 0.83333333 0.83333333 0.83333333 0.75
0.83333333 0.91666667 0.83333333 0.91666667]
mean value: 0.85
key: train_recall
value: [0.9537037 0.96296296 0.94444444 0.92592593 0.9537037 0.9537037
0.9537037 0.93518519 0.94444444 0.94444444]
mean value: 0.9472222222222222
key: test_roc_auc
value: [0.875 0.83333333 0.66666667 0.75 0.70833333 0.79166667
0.75 0.83333333 0.79166667 0.875 ]
mean value: 0.7875000000000001
key: train_roc_auc
value: [0.88425926 0.88425926 0.88425926 0.88425926 0.90277778 0.88888889
0.89351852 0.875 0.88888889 0.87962963]
mean value: 0.8865740740740741
key: test_jcc
value: [0.78571429 0.71428571 0.55555556 0.625 0.58823529 0.64285714
0.625 0.73333333 0.66666667 0.78571429]
mean value: 0.6722362278244631
key: train_jcc
value: [0.8046875 0.80620155 0.80314961 0.8 0.83064516 0.81102362
0.81746032 0.7890625 0.80952381 0.796875 ]
mean value: 0.8068629067008504
MCC on Blind test: 0.31
Accuracy on Blind test: 0.68
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [0.62048364 0.84822226 0.81147861 1.21917677 1.00702286 0.98516321
1.01889133 1.1750536 1.02101469 0.70334888]
mean value: 0.9409855842590332
key: score_time
value: [0.0172441 0.01233554 0.01252532 0.0141468 0.01485038 0.01244426
0.0170753 0.01252484 0.0150919 0.0125711 ]
mean value: 0.014080953598022462
key: test_mcc
value: [0.58536941 0.45834925 0.45834925 0.75261781 0.58536941 0.60246408
0.5 0.58536941 0.91986621 0.50709255]
mean value: 0.5954847366971103
key: train_mcc
value: [0.87051965 0.95374459 0.92608473 0.9459053 0.98164982 0.92592593
0.94460643 0.9722639 0.96296296 0.88888889]
mean value: 0.9372552198040212
key: test_accuracy
value: [0.79166667 0.70833333 0.70833333 0.875 0.79166667 0.79166667
0.75 0.79166667 0.95833333 0.75 ]
mean value: 0.7916666666666666
key: train_accuracy
value: [0.93518519 0.97685185 0.96296296 0.97222222 0.99074074 0.96296296
0.97222222 0.98611111 0.98148148 0.94444444]
mean value: 0.9685185185185186
key: test_fscore
value: [0.7826087 0.63157895 0.75862069 0.86956522 0.7826087 0.76190476
0.75 0.8 0.95652174 0.72727273]
mean value: 0.7820681474027169
key: train_fscore
value: [0.93457944 0.97695853 0.96330275 0.97142857 0.99065421 0.96296296
0.97196262 0.98604651 0.98148148 0.94444444]
mean value: 0.968382151126681
key: test_precision
value: [0.81818182 0.85714286 0.64705882 0.90909091 0.81818182 0.88888889
0.75 0.76923077 1. 0.8 ]
mean value: 0.8257775884246472
key: train_precision
value: [0.94339623 0.97247706 0.95454545 1. 1. 0.96296296
0.98113208 0.99065421 0.98148148 0.94444444]
mean value: 0.9731093915148796
key: test_recall
value: [0.75 0.5 0.91666667 0.83333333 0.75 0.66666667
0.75 0.83333333 0.91666667 0.66666667]
mean value: 0.7583333333333333
key: train_recall
value: [0.92592593 0.98148148 0.97222222 0.94444444 0.98148148 0.96296296
0.96296296 0.98148148 0.98148148 0.94444444]
mean value: 0.9638888888888889
key: test_roc_auc
value: [0.79166667 0.70833333 0.70833333 0.875 0.79166667 0.79166667
0.75 0.79166667 0.95833333 0.75 ]
mean value: 0.7916666666666666
key: train_roc_auc
value: [0.93518519 0.97685185 0.96296296 0.97222222 0.99074074 0.96296296
0.97222222 0.98611111 0.98148148 0.94444444]
mean value: 0.9685185185185186
key: test_jcc
value: [0.64285714 0.46153846 0.61111111 0.76923077 0.64285714 0.61538462
0.6 0.66666667 0.91666667 0.57142857]
mean value: 0.6497741147741147
key: train_jcc
value: [0.87719298 0.95495495 0.92920354 0.94444444 0.98148148 0.92857143
0.94545455 0.97247706 0.96363636 0.89473684]
mean value: 0.9392153647147814
MCC on Blind test: 0.37
Accuracy on Blind test: 0.69
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.02163959 0.01607132 0.0166142 0.01710081 0.01704025 0.01535535
0.01831055 0.01811218 0.01655841 0.01498246]
mean value: 0.01717851161956787
key: score_time
value: [0.01209736 0.0092988 0.00942469 0.01364398 0.00921535 0.00895262
0.00913954 0.00918198 0.00914049 0.00899124]
mean value: 0.009908604621887206
key: test_mcc
value: [0.66666667 0.6761234 0.1767767 0.16903085 0.1767767 0.41812101
0.41812101 0.43033148 0.50709255 0.6761234 ]
mean value: 0.43151637615274063
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.83333333 0.83333333 0.58333333 0.58333333 0.58333333 0.70833333
0.70833333 0.70833333 0.75 0.83333333]
mean value: 0.7125
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.83333333 0.81818182 0.64285714 0.54545455 0.64285714 0.69565217
0.72 0.74074074 0.76923077 0.81818182]
mean value: 0.7226489484750355
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.83333333 0.9 0.5625 0.6 0.5625 0.72727273
0.69230769 0.66666667 0.71428571 0.9 ]
mean value: 0.7158866133866134
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.83333333 0.75 0.75 0.5 0.75 0.66666667
0.75 0.83333333 0.83333333 0.75 ]
mean value: 0.7416666666666667
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.83333333 0.83333333 0.58333333 0.58333333 0.58333333 0.70833333
0.70833333 0.70833333 0.75 0.83333333]
mean value: 0.7125
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.71428571 0.69230769 0.47368421 0.375 0.47368421 0.53333333
0.5625 0.58823529 0.625 0.69230769]
mean value: 0.5730338147404711
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.27
Accuracy on Blind test: 0.66
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.09890485 0.10148096 0.10069036 0.10080361 0.1014545 0.10283852
0.10187411 0.10599065 0.10443878 0.1030066 ]
mean value: 0.10214829444885254
key: score_time
value: [0.01787829 0.01847506 0.01830387 0.01904178 0.01925802 0.01807094
0.01921487 0.01858139 0.01915097 0.01860642]
mean value: 0.018658161163330078
key: test_mcc
value: [0.5 0.58536941 0.43033148 0.58536941 0.41812101 0.6761234
0.41812101 0.50709255 0.84515425 0.41812101]
mean value: 0.5383803523280938
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.75 0.79166667 0.70833333 0.79166667 0.70833333 0.83333333
0.70833333 0.75 0.91666667 0.70833333]
mean value: 0.7666666666666667
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.75 0.7826087 0.74074074 0.8 0.72 0.81818182
0.72 0.76923077 0.92307692 0.72 ]
mean value: 0.7743838946882424
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.75 0.81818182 0.66666667 0.76923077 0.69230769 0.9
0.69230769 0.71428571 0.85714286 0.69230769]
mean value: 0.7552430902430902
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 0.75 0.83333333 0.83333333 0.75 0.75
0.75 0.83333333 1. 0.75 ]
mean value: 0.8
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.75 0.79166667 0.70833333 0.79166667 0.70833333 0.83333333
0.70833333 0.75 0.91666667 0.70833333]
mean value: 0.7666666666666667
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.6 0.64285714 0.58823529 0.66666667 0.5625 0.69230769
0.5625 0.625 0.85714286 0.5625 ]
mean value: 0.6359709653092006
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.24
Accuracy on Blind test: 0.63
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01117754 0.00970054 0.00896549 0.00912547 0.00895023 0.00913262
0.00898814 0.00906181 0.0092175 0.00906587]
mean value: 0.00933852195739746
key: score_time
value: [0.00911736 0.00853801 0.0085423 0.00858045 0.00854707 0.00860143
0.00853682 0.00857329 0.00868487 0.00865126]
mean value: 0.008637285232543946
key: test_mcc
value: [ 0.58536941 -0.0836242 0.3380617 0.43033148 0.33333333 0.16666667
-0.0836242 0.2508726 0.70710678 0.25819889]
mean value: 0.2902692463742723
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.79166667 0.45833333 0.66666667 0.70833333 0.66666667 0.58333333
0.45833333 0.625 0.83333333 0.625 ]
mean value: 0.6416666666666666
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.7826087 0.43478261 0.63636364 0.66666667 0.66666667 0.58333333
0.43478261 0.60869565 0.8 0.57142857]
mean value: 0.6185328439676265
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.81818182 0.45454545 0.7 0.77777778 0.66666667 0.58333333
0.45454545 0.63636364 1. 0.66666667]
mean value: 0.6758080808080809
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 0.41666667 0.58333333 0.58333333 0.66666667 0.58333333
0.41666667 0.58333333 0.66666667 0.5 ]
mean value: 0.575
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.79166667 0.45833333 0.66666667 0.70833333 0.66666667 0.58333333
0.45833333 0.625 0.83333333 0.625 ]
mean value: 0.6416666666666667
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.64285714 0.27777778 0.46666667 0.5 0.5 0.41176471
0.27777778 0.4375 0.66666667 0.4 ]
mean value: 0.4581010737628385
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.09
Accuracy on Blind test: 0.56
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.30116844 1.30090857 1.57670236 1.31556249 1.32972836 1.33339977
1.34411907 1.32579279 1.33161736 1.35981679]
mean value: 1.3518815994262696
key: score_time
value: [0.08967829 0.08942246 0.09675574 0.09040046 0.09746838 0.09508467
0.10231495 0.09039664 0.09650874 0.09847116]
mean value: 0.09465014934539795
key: test_mcc
value: [0.75261781 0.57735027 0.60246408 0.66666667 0.6761234 0.64168895
0.33333333 0.43033148 0.75261781 0.41812101]
mean value: 0.5851314802897141
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.875 0.75 0.79166667 0.83333333 0.83333333 0.79166667
0.66666667 0.70833333 0.875 0.70833333]
mean value: 0.7833333333333333
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.86956522 0.66666667 0.81481481 0.83333333 0.84615385 0.73684211
0.66666667 0.74074074 0.88 0.69565217]
mean value: 0.7750435564943574
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.90909091 1. 0.73333333 0.83333333 0.78571429 1.
0.66666667 0.66666667 0.84615385 0.72727273]
mean value: 0.8168231768231768
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.83333333 0.5 0.91666667 0.83333333 0.91666667 0.58333333
0.66666667 0.83333333 0.91666667 0.66666667]
mean value: 0.7666666666666666
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.875 0.75 0.79166667 0.83333333 0.83333333 0.79166667
0.66666667 0.70833333 0.875 0.70833333]
mean value: 0.7833333333333333
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.76923077 0.5 0.6875 0.71428571 0.73333333 0.58333333
0.5 0.58823529 0.78571429 0.53333333]
mean value: 0.6394966063348416
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.33
Accuracy on Blind test: 0.67
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.8898294 0.87875557 0.97417212 0.86813331 0.89023066 0.9033277
0.90997934 0.94354987 0.91526437 0.98898816]
mean value: 0.9162230491638184
key: score_time
value: [0.19945073 0.18877649 0.24253082 0.20921898 0.23907328 0.16415858
0.2325561 0.2260251 0.23510647 0.21560669]
mean value: 0.2152503252029419
key: test_mcc
value: [0.6761234 0.53033009 0.70710678 0.66666667 0.6761234 0.64168895
0.50709255 0.50709255 0.75261781 0.5 ]
mean value: 0.6164842203909101
key: train_mcc
value: [0.90756304 0.88949918 0.88904134 0.90756304 0.93554619 0.88904134
0.90756304 0.92608473 0.90803041 0.92656165]
mean value: 0.9086493966270899
key: test_accuracy
value: [0.83333333 0.75 0.83333333 0.83333333 0.83333333 0.79166667
0.75 0.75 0.875 0.75 ]
mean value: 0.8
key: train_accuracy
value: [0.9537037 0.94444444 0.94444444 0.9537037 0.96759259 0.94444444
0.9537037 0.96296296 0.9537037 0.96296296]
mean value: 0.9541666666666666
key: test_fscore
value: [0.81818182 0.7 0.85714286 0.83333333 0.84615385 0.73684211
0.72727273 0.76923077 0.86956522 0.75 ]
mean value: 0.7907722673969814
key: train_fscore
value: [0.95412844 0.94545455 0.94495413 0.95412844 0.96803653 0.94495413
0.95412844 0.96330275 0.95454545 0.96363636]
mean value: 0.9547269223591959
key: test_precision
value: [0.9 0.875 0.75 0.83333333 0.78571429 1.
0.8 0.71428571 0.90909091 0.75 ]
mean value: 0.8317424242424243
key: train_precision
value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
[0.94545455 0.92857143 0.93636364 0.94545455 0.95495495 0.93636364
0.94545455 0.95454545 0.9375 0.94642857]
mean value: 0.9431091318591318
key: test_recall
value: [0.75 0.58333333 1. 0.83333333 0.91666667 0.58333333
0.66666667 0.83333333 0.83333333 0.75 ]
mean value: 0.775
key: train_recall
value: [0.96296296 0.96296296 0.9537037 0.96296296 0.98148148 0.9537037
0.96296296 0.97222222 0.97222222 0.98148148]
mean value: 0.9666666666666667
key: test_roc_auc
value: [0.83333333 0.75 0.83333333 0.83333333 0.83333333 0.79166667
0.75 0.75 0.875 0.75 ]
mean value: 0.8
key: train_roc_auc
value: [0.9537037 0.94444444 0.94444444 0.9537037 0.96759259 0.94444444
0.9537037 0.96296296 0.9537037 0.96296296]
mean value: 0.9541666666666666
key: test_jcc
value: [0.69230769 0.53846154 0.75 0.71428571 0.73333333 0.58333333
0.57142857 0.625 0.76923077 0.6 ]
mean value: 0.6577380952380952
key: train_jcc
value: [0.9122807 0.89655172 0.89565217 0.9122807 0.9380531 0.89565217
0.9122807 0.92920354 0.91304348 0.92982456]
mean value: 0.9134822854059695
MCC on Blind test: 0.32
Accuracy on Blind test: 0.66
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02481365 0.00924039 0.00934982 0.009166 0.010216 0.01012945
0.01021767 0.01023412 0.00972962 0.00938153]
mean value: 0.011247825622558594
key: score_time
value: [0.01276445 0.0087719 0.00875235 0.00882077 0.00955319 0.00952816
0.00946164 0.00956464 0.00878978 0.00889206]
mean value: 0.009489893913269043
key: test_mcc
value: [0.50709255 0.35355339 0.0860663 0.60246408 0.2508726 0.58536941
0.25819889 0.58536941 0.84515425 0.83333333]
mean value: 0.49074742109105457
key: train_mcc
value: [0.63060354 0.62103628 0.65743559 0.64993368 0.66222239 0.62060985
0.63006192 0.63856099 0.62253572 0.60625994]
mean value: 0.6339259890467467
key: test_accuracy
value: [0.75 0.66666667 0.54166667 0.79166667 0.625 0.79166667
0.625 0.79166667 0.91666667 0.91666667]
mean value: 0.7416666666666667
key: train_accuracy
value: [0.81481481 0.81018519 0.8287037 0.82407407 0.8287037 0.81018519
0.81481481 0.81481481 0.81018519 0.80092593]
mean value: 0.8157407407407408
key: test_fscore
value: [0.72727273 0.6 0.59259259 0.81481481 0.60869565 0.8
0.66666667 0.7826087 0.92307692 0.91666667]
mean value: 0.7432394738916478
key: train_fscore
value: [0.81981982 0.81447964 0.82949309 0.83035714 0.83842795 0.81278539
0.81818182 0.82905983 0.81777778 0.81222707]
mean value: 0.8222609523224956
key: test_precision
value: [0.8 0.75 0.53333333 0.73333333 0.63636364 0.76923077
0.6 0.81818182 0.85714286 0.91666667]
mean value: 0.7414252414252415
key: train_precision
value: [0.79824561 0.79646018 0.82568807 0.80172414 0.79338843 0.8018018
0.80357143 0.76984127 0.78632479 0.76859504]
mean value: 0.7945640759965436
key: test_recall
value: [0.66666667 0.5 0.66666667 0.91666667 0.58333333 0.83333333
0.75 0.75 1. 0.91666667]
mean value: 0.7583333333333333
key: train_recall
value: [0.84259259 0.83333333 0.83333333 0.86111111 0.88888889 0.82407407
0.83333333 0.89814815 0.85185185 0.86111111]
mean value: 0.8527777777777777
key: test_roc_auc
value: [0.75 0.66666667 0.54166667 0.79166667 0.625 0.79166667
0.625 0.79166667 0.91666667 0.91666667]
mean value: 0.7416666666666667
key: train_roc_auc
value: [0.81481481 0.81018519 0.8287037 0.82407407 0.8287037 0.81018519
0.81481481 0.81481481 0.81018519 0.80092593]
mean value: 0.8157407407407408
key: test_jcc
value: [0.57142857 0.42857143 0.42105263 0.6875 0.4375 0.66666667
0.5 0.64285714 0.85714286 0.84615385]
mean value: 0.605887314439946
key: train_jcc
value: [0.69465649 0.6870229 0.70866142 0.70992366 0.72180451 0.68461538
0.69230769 0.7080292 0.69172932 0.68382353]
mean value: 0.6982574108759549
MCC on Blind test: 0.23
Accuracy on Blind test: 0.63
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.13619304 0.05510259 0.05758572 0.06099463 0.0575664 0.0591476
0.06102586 0.07658863 0.06151056 0.06288791]
mean value: 0.06886029243469238
key: score_time
value: [0.01118422 0.01046395 0.01036596 0.0104425 0.01049066 0.01043773
0.01057792 0.01159239 0.01076841 0.01087976]
mean value: 0.010720348358154297
key: test_mcc
value: [0.83333333 0.6761234 0.70710678 0.75261781 0.75261781 0.53033009
0.66666667 0.5 0.75261781 0.50709255]
mean value: 0.6678506250715527
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.91666667 0.83333333 0.83333333 0.875 0.875 0.75
0.83333333 0.75 0.875 0.75 ]
mean value: 0.8291666666666667
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.91666667 0.81818182 0.85714286 0.88 0.86956522 0.7
0.83333333 0.75 0.88 0.72727273]
mean value: 0.8232162619988707
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.91666667 0.9 0.75 0.84615385 0.90909091 0.875
0.83333333 0.75 0.84615385 0.8 ]
mean value: 0.8426398601398601
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.91666667 0.75 1. 0.91666667 0.83333333 0.58333333
0.83333333 0.75 0.91666667 0.66666667]
mean value: 0.8166666666666667
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.91666667 0.83333333 0.83333333 0.875 0.875 0.75
0.83333333 0.75 0.875 0.75 ]
mean value: 0.8291666666666666
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.84615385 0.69230769 0.75 0.78571429 0.76923077 0.53846154
0.71428571 0.6 0.78571429 0.57142857]
mean value: 0.7053296703296703
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.42
Accuracy on Blind test: 0.72
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.03201962 0.06125259 0.06060815 0.06064439 0.05987048 0.06112409
0.06051135 0.07771921 0.05222392 0.07174587]
mean value: 0.0597719669342041
key: score_time
value: [0.02101016 0.0216496 0.02375889 0.02242851 0.02114177 0.02297425
0.02142954 0.02500796 0.02404189 0.02397108]
mean value: 0.022741365432739257
key: test_mcc
value: [0.66666667 0.38490018 0.16903085 0.5 0.16666667 0.58536941
0.60246408 0.50709255 0.75261781 0.2508726 ]
mean value: 0.4585680811666079
key: train_mcc
value: [0.92608473 0.96312812 0.94444444 0.95374459 0.94460643 0.94444444
0.95374459 0.96312812 0.95374459 0.97259753]
mean value: 0.9519667586332593
key: test_accuracy
value: [0.83333333 0.66666667 0.58333333 0.75 0.58333333 0.79166667
0.79166667 0.75 0.875 0.625 ]
mean value: 0.725
key: train_accuracy
value: [0.96296296 0.98148148 0.97222222 0.97685185 0.97222222 0.97222222
0.97685185 0.98148148 0.97685185 0.98611111]
mean value: 0.975925925925926
key: test_fscore
value: [0.83333333 0.73333333 0.61538462 0.75 0.58333333 0.8
0.81481481 0.72727273 0.86956522 0.64 ]
mean value: 0.7367037374863462
key: train_fscore
value: [0.96330275 0.98165138 0.97222222 0.97695853 0.97247706 0.97222222
0.97674419 0.98165138 0.97695853 0.98630137]
mean value: 0.9760489619852554
key: test_precision
value: [0.83333333 0.61111111 0.57142857 0.75 0.58333333 0.76923077
0.73333333 0.8 0.90909091 0.61538462]
mean value: 0.7176245976245976
key: train_precision
value: [0.95454545 0.97272727 0.97222222 0.97247706 0.96363636 0.97222222
0.98130841 0.97272727 0.97247706 0.97297297]
mean value: 0.9707316320709102
key: test_recall
value: [0.83333333 0.91666667 0.66666667 0.75 0.58333333 0.83333333
0.91666667 0.66666667 0.83333333 0.66666667]
mean value: 0.7666666666666666
key: train_recall
value: [0.97222222 0.99074074 0.97222222 0.98148148 0.98148148 0.97222222
0.97222222 0.99074074 0.98148148 1. ]
mean value: 0.9814814814814815
key: test_roc_auc
value: [0.83333333 0.66666667 0.58333333 0.75 0.58333333 0.79166667
0.79166667 0.75 0.875 0.625 ]
mean value: 0.725
key: train_roc_auc
value: [0.96296296 0.98148148 0.97222222 0.97685185 0.97222222 0.97222222
0.97685185 0.98148148 0.97685185 0.98611111]
mean value: 0.9759259259259259
key: test_jcc
value: [0.71428571 0.57894737 0.44444444 0.6 0.41176471 0.66666667
0.6875 0.57142857 0.76923077 0.47058824]
mean value: 0.5914856475653689
key: train_jcc
value: [0.92920354 0.96396396 0.94594595 0.95495495 0.94642857 0.94594595
0.95454545 0.96396396 0.95495495 0.97297297]
mean value: 0.9532880268499737
MCC on Blind test: 0.37
Accuracy on Blind test: 0.68
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.02389264 0.00940037 0.00887752 0.0089426 0.00882745 0.00888252
0.00894308 0.00896382 0.00901222 0.00887132]
mean value: 0.01046135425567627
key: score_time
value: [0.01013374 0.00891423 0.00849485 0.0085175 0.00852847 0.00855064
0.00857592 0.00854993 0.00858283 0.00857258]
mean value: 0.008742070198059082
key: test_mcc
value: [0.43033148 0.41812101 0. 0.45834925 0.60246408 0.41812101
0.50709255 0.58536941 0.3380617 0.66666667]
mean value: 0.44245771459099875
key: train_mcc
value: [0.48685383 0.48557856 0.50557897 0.50425466 0.48685383 0.47568087
0.48557856 0.46812868 0.49554356 0.49693566]
mean value: 0.489098717880917
key: test_accuracy
value: [0.70833333 0.70833333 0.5 0.70833333 0.79166667 0.70833333
0.75 0.79166667 0.66666667 0.83333333]
mean value: 0.7166666666666667
key: train_accuracy
value: [0.74074074 0.74074074 0.75 0.75 0.74074074 0.73611111
0.74074074 0.73148148 0.74537037 0.74537037]
mean value: 0.7421296296296296
key: test_fscore
value: [0.74074074 0.72 0.57142857 0.75862069 0.81481481 0.72
0.76923077 0.8 0.63636364 0.83333333]
mean value: 0.7364532555567038
key: train_fscore
value: [0.75862069 0.75652174 0.76724138 0.76521739 0.75862069 0.7510917
0.75652174 0.75 0.76190476 0.7639485 ]
mean value: 0.7589688591001514
key: test_precision
value: [0.66666667 0.69230769 0.5 0.64705882 0.73333333 0.69230769
0.71428571 0.76923077 0.7 0.83333333]
mean value: 0.6948524024994613
key: train_precision
value: [0.70967742 0.71311475 0.71774194 0.72131148 0.70967742 0.7107438
0.71311475 0.7016129 0.71544715 0.712 ]
mean value: 0.712444161715035
key: test_recall
value: [0.83333333 0.75 0.66666667 0.91666667 0.91666667 0.75
0.83333333 0.83333333 0.58333333 0.83333333]
mean value: 0.7916666666666666
key: train_recall
value: [0.81481481 0.80555556 0.82407407 0.81481481 0.81481481 0.7962963
0.80555556 0.80555556 0.81481481 0.82407407]
mean value: 0.812037037037037
key: test_roc_auc
value: [0.70833333 0.70833333 0.5 0.70833333 0.79166667 0.70833333
0.75 0.79166667 0.66666667 0.83333333]
mean value: 0.7166666666666667
key: train_roc_auc
value: [0.74074074 0.74074074 0.75 0.75 0.74074074 0.73611111
0.74074074 0.73148148 0.74537037 0.74537037]
mean value: 0.7421296296296297
key: test_jcc
value: [0.58823529 0.5625 0.4 0.61111111 0.6875 0.5625
0.625 0.66666667 0.46666667 0.71428571]
mean value: 0.5884465452847806
key: train_jcc
value: [0.61111111 0.60839161 0.62237762 0.61971831 0.61111111 0.6013986
0.60839161 0.6 0.61538462 0.61805556]
mean value: 0.6115940143580989
MCC on Blind test: 0.29
Accuracy on Blind test: 0.67
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01207209 0.01655149 0.0160923 0.01483059 0.01510477 0.0145371
0.01643419 0.01679707 0.01524067 0.01558161]
mean value: 0.01532418727874756
key: score_time
value: [0.00867105 0.01097822 0.01092243 0.01148534 0.01145959 0.01148415
0.01148462 0.01149249 0.01149011 0.0115099 ]
mean value: 0.01109778881072998
key: test_mcc
value: [0.6761234 0.60246408 0.33333333 0.57735027 0.53033009 0.45834925
0.53033009 0.58536941 0.37796447 0.58536941]
mean value: 0.5256983789695563
key: train_mcc
value: [0.75124823 0.87996919 0.77093924 0.68511879 0.79848995 0.77013788
0.74779086 0.83462233 0.60587838 0.79473968]
mean value: 0.7638934525407249
key: test_accuracy
value: [0.83333333 0.79166667 0.66666667 0.75 0.75 0.70833333
0.75 0.79166667 0.625 0.79166667]
mean value: 0.7458333333333333
key: train_accuracy
value: [0.87037037 0.93981481 0.875 0.81944444 0.89814815 0.88425926
0.86111111 0.91666667 0.76851852 0.89351852]
mean value: 0.8726851851851851
key: test_fscore
value: [0.84615385 0.76190476 0.66666667 0.8 0.78571429 0.63157895
0.7 0.8 0.4 0.8 ]
mean value: 0.7192018507807981
key: train_fscore
value: [0.88034188 0.94063927 0.85863874 0.84705882 0.90178571 0.88038278
0.84042553 0.91428571 0.69879518 0.9004329 ]
mean value: 0.8662786533494914
key: test_precision
value: [0.78571429 0.88888889 0.66666667 0.66666667 0.6875 0.85714286
0.875 0.76923077 1. 0.76923077]
mean value: 0.7966040903540903
key: train_precision
value: [0.81746032 0.92792793 0.98795181 0.73469388 0.87068966 0.91089109
0.9875 0.94117647 1. 0.84552846]
mean value: 0.9023819600322295
key: test_recall
value: [0.91666667 0.66666667 0.66666667 1. 0.91666667 0.5
0.58333333 0.83333333 0.25 0.83333333]
mean value: 0.7166666666666667
key: train_recall
value: [0.9537037 0.9537037 0.75925926 1. 0.93518519 0.85185185
0.73148148 0.88888889 0.53703704 0.96296296]
mean value: 0.8574074074074074
key: test_roc_auc
value: [0.83333333 0.79166667 0.66666667 0.75 0.75 0.70833333
0.75 0.79166667 0.625 0.79166667]
mean value: 0.7458333333333333
key: train_roc_auc
value: [0.87037037 0.93981481 0.875 0.81944444 0.89814815 0.88425926
0.86111111 0.91666667 0.76851852 0.89351852]
mean value: 0.8726851851851851
key: test_jcc
value: [0.73333333 0.61538462 0.5 0.66666667 0.64705882 0.46153846
0.53846154 0.66666667 0.25 0.66666667]
mean value: 0.5745776772247361
key: train_jcc
value: [0.78625954 0.88793103 0.75229358 0.73469388 0.82113821 0.78632479
0.72477064 0.84210526 0.53703704 0.81889764]
mean value: 0.7691451609899106
MCC on Blind test: 0.42
Accuracy on Blind test: 0.73
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.0141654 0.01597714 0.01418781 0.0147953 0.01495671 0.01431012
0.01488042 0.01638436 0.01453948 0.01479721]
mean value: 0.014899396896362304
key: score_time
value: [0.0115509 0.01146197 0.0115335 0.01147461 0.01151395 0.01156092
0.01154041 0.01152873 0.01148868 0.01163054]
mean value: 0.011528420448303222
key: test_mcc
value: [0.4472136 0.60246408 0.3380617 0.64168895 0.5 0.37796447
0.43033148 0.58536941 0.37796447 0.43033148]
mean value: 0.473138964023511
key: train_mcc
value: [0.4472136 0.86741806 0.68286996 0.65055321 0.86741806 0.45464702
0.83010976 0.83713046 0.53452248 0.6753356 ]
mean value: 0.684721819583476
key: test_accuracy
value: [0.66666667 0.79166667 0.66666667 0.79166667 0.75 0.625
0.70833333 0.79166667 0.625 0.70833333]
mean value: 0.7125
key: train_accuracy
value: [0.66666667 0.93055556 0.82407407 0.80092593 0.93055556 0.6712963
0.91203704 0.91203704 0.72222222 0.81944444]
mean value: 0.8189814814814815
key: test_fscore
value: [0.75 0.76190476 0.63636364 0.73684211 0.75 0.4
0.66666667 0.8 0.4 0.66666667]
mean value: 0.656844383686489
key: train_fscore
value: [0.75 0.93449782 0.79120879 0.75428571 0.92610837 0.51034483
0.90640394 0.91914894 0.61538462 0.78453039]
mean value: 0.7891913403240695
key: test_precision
value: [0.6 0.88888889 0.7 1. 0.75 1.
0.77777778 0.76923077 1. 0.77777778]
mean value: 0.8263675213675213
key: train_precision
value: [0.6 0.88429752 0.97297297 0.98507463 0.98947368 1.
0.96842105 0.8503937 1. 0.97260274]
mean value: 0.9223236297855336
key: test_recall
value: [1. 0.66666667 0.58333333 0.58333333 0.75 0.25
0.58333333 0.83333333 0.25 0.58333333]
mean value: 0.6083333333333334
key: train_recall
value: [1. 0.99074074 0.66666667 0.61111111 0.87037037 0.34259259
0.85185185 1. 0.44444444 0.65740741]
mean value: 0.7435185185185185
key: test_roc_auc
value: [0.66666667 0.79166667 0.66666667 0.79166667 0.75 0.625
0.70833333 0.79166667 0.625 0.70833333]
mean value: 0.7125
key: train_roc_auc
value: [0.66666667 0.93055556 0.82407407 0.80092593 0.93055556 0.6712963
0.91203704 0.91203704 0.72222222 0.81944444]
mean value: 0.8189814814814814
key: test_jcc
value: [0.6 0.61538462 0.46666667 0.58333333 0.6 0.25
0.5 0.66666667 0.25 0.5 ]
mean value: 0.5032051282051282
key: train_jcc
value: [0.6 0.87704918 0.65454545 0.60550459 0.86238532 0.34259259
0.82882883 0.8503937 0.44444444 0.64545455]
mean value: 0.6711198655238018
MCC on Blind test: 0.44
Accuracy on Blind test: 0.74
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.13095427 0.11557102 0.11594152 0.11598682 0.1163609 0.11597848
0.11679888 0.11590648 0.11655402 0.11645198]
mean value: 0.1176504373550415
key: score_time
value: [0.01498008 0.01493311 0.01499248 0.01484203 0.01487803 0.01490355
0.01487446 0.0148387 0.01498008 0.0148499 ]
mean value: 0.014907240867614746
key: test_mcc
value: [0.66666667 0.57735027 0.38490018 0.58536941 0.5 0.60246408
0.6761234 0.6761234 1. 0.2508726 ]
mean value: 0.591987000896547
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.83333333 0.75 0.66666667 0.79166667 0.75 0.79166667
0.83333333 0.83333333 1. 0.625 ]
mean value: 0.7875
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.83333333 0.66666667 0.73333333 0.7826087 0.75 0.76190476
0.84615385 0.81818182 1. 0.64 ]
mean value: 0.7832182455225933
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.83333333 1. 0.61111111 0.81818182 0.75 0.88888889
0.78571429 0.9 1. 0.61538462]
mean value: 0.8202614052614052
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.83333333 0.5 0.91666667 0.75 0.75 0.66666667
0.91666667 0.75 1. 0.66666667]
mean value: 0.775
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.83333333 0.75 0.66666667 0.79166667 0.75 0.79166667
0.83333333 0.83333333 1. 0.625 ]
mean value: 0.7875
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.71428571 0.5 0.57894737 0.64285714 0.6 0.61538462
0.73333333 0.69230769 1. 0.47058824]
mean value: 0.6547704101883669
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.37
Accuracy on Blind test: 0.7
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.04484487 0.05021477 0.06698537 0.0464921 0.05736089 0.04467273
0.04351902 0.06231117 0.06437397 0.0654459 ]
mean value: 0.05462207794189453
key: score_time
value: [0.03038597 0.02814126 0.02920938 0.03339005 0.02574015 0.02409744
0.02607036 0.02961445 0.0295229 0.02212024]
mean value: 0.0278292179107666
key: test_mcc
value: [0.6761234 0.51298918 0.60246408 0.53033009 0.60246408 0.45834925
0.50709255 0.41812101 0.91986621 0.66666667]
mean value: 0.5894466501897947
key: train_mcc
value: [0.95407186 0.95472741 0.9459053 0.96312812 0.99078321 0.97259753
0.96362411 0.98164982 0.95407186 0.98148148]
mean value: 0.9662040699138664
key: test_accuracy
value: [0.83333333 0.70833333 0.79166667 0.75 0.79166667 0.70833333
0.75 0.70833333 0.95833333 0.83333333]
mean value: 0.7833333333333333
key: train_accuracy
value: [0.97685185 0.97685185 0.97222222 0.98148148 0.99537037 0.98611111
0.98148148 0.99074074 0.97685185 0.99074074]
mean value: 0.9828703703703704
key: test_fscore
value: [0.81818182 0.58823529 0.81481481 0.7 0.76190476 0.63157895
0.72727273 0.69565217 0.95652174 0.83333333]
mean value: 0.7527495610037002
key: train_fscore
value: [0.97652582 0.97630332 0.97142857 0.98130841 0.99534884 0.98591549
0.98113208 0.99065421 0.97652582 0.99074074]
mean value: 0.9825883295358523
key: test_precision
value: [0.9 1. 0.73333333 0.875 0.88888889 0.85714286
0.8 0.72727273 1. 0.83333333]
mean value: 0.861497113997114
key: train_precision
value: [0.99047619 1. 1. 0.99056604 1. 1.
1. 1. 0.99047619 0.99074074]
mean value: 0.9962259159428971
key: test_recall
value: [0.75 0.41666667 0.91666667 0.58333333 0.66666667 0.5
0.66666667 0.66666667 0.91666667 0.83333333]
mean value: 0.6916666666666667
key: train_recall
value: [0.96296296 0.9537037 0.94444444 0.97222222 0.99074074 0.97222222
0.96296296 0.98148148 0.96296296 0.99074074]
mean value: 0.9694444444444444
key: test_roc_auc
value: [0.83333333 0.70833333 0.79166667 0.75 0.79166667 0.70833333
0.75 0.70833333 0.95833333 0.83333333]
mean value: 0.7833333333333333
key: train_roc_auc
value: [0.97685185 0.97685185 0.97222222 0.98148148 0.99537037 0.98611111
0.98148148 0.99074074 0.97685185 0.99074074]
mean value: 0.9828703703703703
key: test_jcc
value: [0.69230769 0.41666667 0.6875 0.53846154 0.61538462 0.46153846
0.57142857 0.53333333 0.91666667 0.71428571]
mean value: 0.614757326007326
key: train_jcc
value: [0.95412844 0.9537037 0.94444444 0.96330275 0.99074074 0.97222222
0.96296296 0.98148148 0.95412844 0.98165138]
mean value: 0.9658766564729867
MCC on Blind test: 0.44
Accuracy on Blind test: 0.72
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.11546874 0.1362741 0.08421946 0.07674527 0.0547297 0.06163383
0.07058072 0.04774594 0.06891847 0.0557971 ]
mean value: 0.07721133232116699
key: score_time
value: [0.02076721 0.03748512 0.02270293 0.01903296 0.02165675 0.02445173
0.01773858 0.02347541 0.02349663 0.02122188]
mean value: 0.023202919960021974
key: test_mcc
value: [0.66666667 0.41812101 0. 0.5 0.3380617 0.58536941
0.25819889 0.43033148 0.6761234 0.5 ]
mean value: 0.4372872557008492
key: train_mcc
value: [0.99078321 0.99078321 0.99078321 0.99078321 0.99078321 0.99078321
0.99078321 0.99078321 0.98164982 0.99078321]
mean value: 0.9898698738684077
key: test_accuracy
value: [0.83333333 0.70833333 0.5 0.75 0.66666667 0.79166667
0.625 0.70833333 0.83333333 0.75 ]
mean value: 0.7166666666666667
key: train_accuracy
value: [0.99537037 0.99537037 0.99537037 0.99537037 0.99537037 0.99537037
0.99537037 0.99537037 0.99074074 0.99537037]
mean value: 0.9949074074074074
key: test_fscore
value: [0.83333333 0.69565217 0.57142857 0.75 0.69230769 0.8
0.66666667 0.74074074 0.84615385 0.75 ]
mean value: 0.7346283024543894
key: train_fscore
value: [0.99539171 0.99539171 0.99539171 0.99539171 0.99539171 0.99539171
0.99539171 0.99539171 0.99082569 0.99539171]
mean value: 0.9949351033695515
key: test_precision
value: [0.83333333 0.72727273 0.5 0.75 0.64285714 0.76923077
0.6 0.66666667 0.78571429 0.75 ]
mean value: 0.7025074925074926
key: train_precision
value: [0.99082569 0.99082569 0.99082569 0.99082569 0.99082569 0.99082569
0.99082569 0.99082569 0.98181818 0.99082569]
mean value: 0.9899249374478732
key: test_recall
value: [0.83333333 0.66666667 0.66666667 0.75 0.75 0.83333333
0.75 0.83333333 0.91666667 0.75 ]
mean value: 0.775
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.83333333 0.70833333 0.5 0.75 0.66666667 0.79166667
0.625 0.70833333 0.83333333 0.75 ]
mean value: 0.7166666666666667
key: train_roc_auc
value: [0.99537037 0.99537037 0.99537037 0.99537037 0.99537037 0.99537037
0.99537037 0.99537037 0.99074074 0.99537037]
mean value: 0.9949074074074074
key: test_jcc
value: [0.71428571 0.53333333 0.4 0.6 0.52941176 0.66666667
0.5 0.58823529 0.73333333 0.6 ]
mean value: 0.5865266106442577
key: train_jcc
value: [0.99082569 0.99082569 0.99082569 0.99082569 0.99082569 0.99082569
0.99082569 0.99082569 0.98181818 0.99082569]
mean value: 0.9899249374478732
MCC on Blind test: 0.16
Accuracy on Blind test: 0.59
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.39657617 0.38660288 0.38711214 0.38470411 0.39132905 0.38640523
0.3876338 0.38904047 0.39087391 0.39372563]
mean value: 0.3894003391265869
key: score_time
value: [0.00990486 0.00941515 0.00925326 0.00939202 0.0096159 0.00942993
0.00955319 0.0091846 0.00959134 0.00947499]
mean value: 0.009481525421142578
key: test_mcc
value: [0.75261781 0.45834925 0.45834925 0.91986621 0.6761234 0.6761234
0.58536941 0.66666667 0.75261781 0.33333333]
mean value: 0.6279416540619365
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.875 0.70833333 0.70833333 0.95833333 0.83333333 0.83333333
0.79166667 0.83333333 0.875 0.66666667]
mean value: 0.8083333333333333
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.88 0.63157895 0.75862069 0.96 0.84615385 0.81818182
0.7826087 0.83333333 0.88 0.66666667]
mean value: 0.8057143997011431
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.84615385 0.85714286 0.64705882 0.92307692 0.78571429 0.9
0.81818182 0.83333333 0.84615385 0.66666667]
mean value: 0.8123482399952988
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.91666667 0.5 0.91666667 1. 0.91666667 0.75
0.75 0.83333333 0.91666667 0.66666667]
mean value: 0.8166666666666667
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.875 0.70833333 0.70833333 0.95833333 0.83333333 0.83333333
0.79166667 0.83333333 0.875 0.66666667]
mean value: 0.8083333333333333
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.78571429 0.46153846 0.61111111 0.92307692 0.73333333 0.69230769
0.64285714 0.71428571 0.78571429 0.5 ]
mean value: 0.684993894993895
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.37
Accuracy on Blind test: 0.7
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.02040315 0.02058983 0.0212307 0.02100587 0.02085805 0.02074647
0.04295802 0.02070928 0.0207932 0.02259588]
mean value: 0.02318904399871826
key: score_time
value: [0.01255536 0.01628447 0.01540375 0.01492286 0.01830244 0.0138433
0.01333618 0.01456308 0.01523161 0.01233673]
mean value: 0.014677977561950684
key: test_mcc
value: [ 0.75261781 0.58536941 -0.0860663 0.43033148 0.2508726 0.50709255
0.25819889 0.33333333 0.43033148 0.16903085]
mean value: 0.36311121151182635
key: train_mcc
value: [1. 0.96362411 0.95472741 1. 0.86135677 0.99078321
0.92847669 0.99078321 0.90284331 0.9459053 ]
mean value: 0.9538500024738616
key: test_accuracy
value: [0.875 0.79166667 0.45833333 0.70833333 0.625 0.75
0.625 0.66666667 0.70833333 0.58333333]
mean value: 0.6791666666666667
key: train_accuracy
value: [1. 0.98148148 0.97685185 1. 0.92592593 0.99537037
0.96296296 0.99537037 0.94907407 0.97222222]
mean value: 0.9759259259259259
key: test_fscore
value: [0.88 0.7826087 0.51851852 0.74074074 0.60869565 0.76923077
0.66666667 0.66666667 0.74074074 0.61538462]
mean value: 0.6989253065774804
key: train_fscore
value: [1. 0.98181818 0.97737557 1. 0.93103448 0.99539171
0.96428571 0.99539171 0.95154185 0.97297297]
mean value: 0.9769812177804863
key: test_precision
value: [0.84615385 0.81818182 0.46666667 0.66666667 0.63636364 0.71428571
0.6 0.66666667 0.66666667 0.57142857]
mean value: 0.6653080253080252
key: train_precision
value: [1. 0.96428571 0.95575221 1. 0.87096774 0.99082569
0.93103448 0.99082569 0.90756303 0.94736842]
mean value: 0.9558622973778704
key: test_recall
value: [0.91666667 0.75 0.58333333 0.83333333 0.58333333 0.83333333
0.75 0.66666667 0.83333333 0.66666667]
mean value: 0.7416666666666667
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.875 0.79166667 0.45833333 0.70833333 0.625 0.75
0.625 0.66666667 0.70833333 0.58333333]
mean value: 0.6791666666666667
key: train_roc_auc
value: [1. 0.98148148 0.97685185 1. 0.92592593 0.99537037
0.96296296 0.99537037 0.94907407 0.97222222]
mean value: 0.975925925925926
key: test_jcc
value: [0.78571429 0.64285714 0.35 0.58823529 0.4375 0.625
0.5 0.5 0.58823529 0.44444444]
mean value: 0.5461986461251167
key: train_jcc
value: [1. 0.96428571 0.95575221 1. 0.87096774 0.99082569
0.93103448 0.99082569 0.90756303 0.94736842]
mean value: 0.9558622973778704
MCC on Blind test: 0.08
Accuracy on Blind test: 0.55
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.01451707 0.01430178 0.01430941 0.01420307 0.0142262 0.02070832
0.04261613 0.0417614 0.04505873 0.03498101]
mean value: 0.02566831111907959
key: score_time
value: [0.01182199 0.01183081 0.01353741 0.01192141 0.01188087 0.01182985
0.0209434 0.02043557 0.02949667 0.02229738]
mean value: 0.016599535942077637
key: test_mcc
value: [0.75261781 0.60246408 0.45834925 0.41812101 0.41812101 0.58536941
0.33333333 0.5 0.91986621 0.58536941]
mean value: 0.5573611501955348
key: train_mcc
value: [0.84291786 0.88949918 0.90756304 0.87171665 0.86203543 0.87996919
0.89849486 0.89026381 0.86203543 0.90803041]
mean value: 0.8812525853248737
key: test_accuracy
value: [0.875 0.79166667 0.70833333 0.70833333 0.70833333 0.79166667
0.66666667 0.75 0.95833333 0.79166667]
mean value: 0.775
key: train_accuracy
value: [0.9212963 0.94444444 0.9537037 0.93518519 0.93055556 0.93981481
0.94907407 0.94444444 0.93055556 0.9537037 ]
mean value: 0.9402777777777778
key: test_fscore
value: [0.86956522 0.76190476 0.75862069 0.72 0.72 0.7826087
0.66666667 0.75 0.96 0.7826087 ]
mean value: 0.7771974726922253
key: train_fscore
value: [0.92237443 0.94545455 0.95412844 0.93693694 0.9321267 0.94063927
0.94977169 0.94594595 0.9321267 0.95454545]
mean value: 0.9414050105042868
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_rt.py:155: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_rt.py:158: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: test_precision
value: [0.90909091 0.88888889 0.64705882 0.69230769 0.69230769 0.81818182
0.66666667 0.75 0.92307692 0.81818182]
mean value: 0.780576123223182
key: train_precision
value: [0.90990991 0.92857143 0.94545455 0.9122807 0.91150442 0.92792793
0.93693694 0.92105263 0.91150442 0.9375 ]
mean value: 0.9242642931691604
key: test_recall
value: [0.83333333 0.66666667 0.91666667 0.75 0.75 0.75
0.66666667 0.75 1. 0.75 ]
mean value: 0.7833333333333333
key: train_recall
value: [0.93518519 0.96296296 0.96296296 0.96296296 0.9537037 0.9537037
0.96296296 0.97222222 0.9537037 0.97222222]
mean value: 0.9592592592592593
key: test_roc_auc
value: [0.875 0.79166667 0.70833333 0.70833333 0.70833333 0.79166667
0.66666667 0.75 0.95833333 0.79166667]
mean value: 0.775
key: train_roc_auc
value: [0.9212963 0.94444444 0.9537037 0.93518519 0.93055556 0.93981481
0.94907407 0.94444444 0.93055556 0.9537037 ]
mean value: 0.9402777777777778
key: test_jcc
value: [0.76923077 0.61538462 0.61111111 0.5625 0.5625 0.64285714
0.5 0.6 0.92307692 0.64285714]
mean value: 0.6429517704517704
key: train_jcc
value: [0.8559322 0.89655172 0.9122807 0.88135593 0.87288136 0.88793103
0.90434783 0.8974359 0.87288136 0.91304348]
mean value: 0.8894641509616427
MCC on Blind test: 0.42
Accuracy on Blind test: 0.72
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.12958741 0.2245791 0.22601509 0.22674584 0.22733378 0.2300446
0.22894859 0.34019399 0.22946429 0.22652197]
mean value: 0.22894346714019775
key: score_time
value: [0.03218746 0.02195168 0.0238595 0.02112699 0.02143717 0.02339196
0.02224898 0.02193046 0.02027369 0.02168155]
mean value: 0.02300894260406494
key: test_mcc
value: [0.75261781 0.6761234 0.64168895 0.58536941 0.60246408 0.60246408
0.5 0.58536941 0.75261781 0.66666667]
mean value: 0.6365381602545337
key: train_mcc
value: [0.74704394 0.80836728 0.78113347 0.75158034 0.75158034 0.76253505
0.77898084 0.77120096 0.77013788 0.75261781]
mean value: 0.7675177902273471
key: test_accuracy
value: [0.875 0.83333333 0.79166667 0.79166667 0.79166667 0.79166667
0.75 0.79166667 0.875 0.83333333]
mean value: 0.8125
key: train_accuracy
value: [0.87037037 0.90277778 0.88888889 0.875 0.875 0.87962963
0.88888889 0.88425926 0.88425926 0.875 ]
mean value: 0.8824074074074074
key: test_fscore
value: [0.88 0.81818182 0.82758621 0.8 0.81481481 0.76190476
0.75 0.8 0.88 0.83333333]
mean value: 0.816582093513128
key: train_fscore
value: [0.87826087 0.90666667 0.89380531 0.87892377 0.87892377 0.88495575
0.89189189 0.88888889 0.88789238 0.88 ]
mean value: 0.8870209289273469
key: test_precision
value: [0.84615385 0.9 0.70588235 0.76923077 0.73333333 0.88888889
0.75 0.76923077 0.84615385 0.83333333]
mean value: 0.8042207139265963
key: train_precision
value: [0.82786885 0.87179487 0.8559322 0.85217391 0.85217391 0.84745763
0.86842105 0.85470085 0.86086957 0.84615385]
mean value: 0.853754669955299
key: test_recall
value: [0.91666667 0.75 1. 0.83333333 0.91666667 0.66666667
0.75 0.83333333 0.91666667 0.83333333]
mean value: 0.8416666666666667
key: train_recall
value: [0.93518519 0.94444444 0.93518519 0.90740741 0.90740741 0.92592593
0.91666667 0.92592593 0.91666667 0.91666667]
mean value: 0.9231481481481482
key: test_roc_auc
value: [0.875 0.83333333 0.79166667 0.79166667 0.79166667 0.79166667
0.75 0.79166667 0.875 0.83333333]
mean value: 0.8125
key: train_roc_auc
value: [0.87037037 0.90277778 0.88888889 0.875 0.875 0.87962963
0.88888889 0.88425926 0.88425926 0.875 ]
mean value: 0.8824074074074074
key: test_jcc
value: [0.78571429 0.69230769 0.70588235 0.66666667 0.6875 0.61538462
0.6 0.66666667 0.78571429 0.71428571]
mean value: 0.6920122279681103
key: train_jcc
value: [0.78294574 0.82926829 0.808 0.784 0.784 0.79365079
0.80487805 0.8 0.7983871 0.78571429]
mean value: 0.7970844254036796
MCC on Blind test: 0.37
Accuracy on Blind test: 0.7
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.03095245 0.02517986 0.03043199 0.0321815 0.03195596 0.03187084
0.03063774 0.03242993 0.02801824 0.02898526]
mean value: 0.03026437759399414
key: score_time
value: [0.01542759 0.01177216 0.01163268 0.01346421 0.01373935 0.01198483
0.01169944 0.01361442 0.01173401 0.01166964]
mean value: 0.01267383098602295
key: test_mcc
value: [0.33333333 0.6761234 0.58536941 0.75261781 0.60246408 0.43033148
0.43033148 0.75261781 0.66414149 0.56490196]
mean value: 0.5792232247346277
key: train_mcc
value: [0.82419551 0.8054638 0.80659666 0.79494819 0.82275335 0.82332931
0.80659666 0.77911093 0.80493517 0.77897523]
mean value: 0.8046904809155915
key: test_accuracy
value: [0.66666667 0.83333333 0.79166667 0.875 0.79166667 0.70833333
0.70833333 0.875 0.82608696 0.7826087 ]
mean value: 0.7858695652173913
key: train_accuracy
value: [0.91121495 0.90186916 0.90186916 0.89719626 0.91121495 0.91121495
0.90186916 0.88785047 0.90232558 0.88837209]
mean value: 0.9014996739839165
key: test_fscore
value: [0.66666667 0.81818182 0.8 0.86956522 0.81481481 0.66666667
0.74074074 0.88 0.83333333 0.8 ]
mean value: 0.7889969257795345
key: train_fscore
value: [0.91402715 0.90497738 0.9058296 0.89908257 0.9124424 0.91324201
0.9058296 0.89285714 0.90410959 0.89189189]
mean value: 0.9044289315755244
key: test_precision
value: [0.66666667 0.9 0.76923077 0.90909091 0.73333333 0.77777778
0.66666667 0.84615385 0.76923077 0.76923077]
mean value: 0.7807381507381508
key: train_precision
value: [0.88596491 0.87719298 0.87068966 0.88288288 0.9 0.89285714
0.87068966 0.85470085 0.89189189 0.86086957]
mean value: 0.8787739542631834
key: test_recall
value: [0.66666667 0.75 0.83333333 0.83333333 0.91666667 0.58333333
0.83333333 0.91666667 0.90909091 0.83333333]
mean value: 0.8075757575757576
key: train_recall
value: [0.94392523 0.93457944 0.94392523 0.91588785 0.92523364 0.93457944
0.94392523 0.93457944 0.91666667 0.92523364]
mean value: 0.9318535825545171
key: test_roc_auc
value: [0.66666667 0.83333333 0.79166667 0.875 0.79166667 0.70833333
0.70833333 0.875 0.82954545 0.78030303]
mean value: 0.7859848484848485
key: train_roc_auc
value: [0.91121495 0.90186916 0.90186916 0.89719626 0.91121495 0.91121495
0.90186916 0.88785047 0.90225857 0.88854275]
mean value: 0.9015100380754586
key: test_jcc
value: [0.5 0.69230769 0.66666667 0.76923077 0.6875 0.5
0.58823529 0.78571429 0.71428571 0.66666667]
mean value: 0.6570607088989442
key: train_jcc
value: [0.84166667 0.82644628 0.82786885 0.81666667 0.83898305 0.84033613
0.82786885 0.80645161 0.825 0.80487805]
mean value: 0.8256166166228054
MCC on Blind test: 0.36
Accuracy on Blind test: 0.69
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.8838706 0.72082353 0.72018147 0.89305925 0.71959066 0.72031975
0.79101253 0.71640468 0.70504045 0.91068506]
mean value: 0.7780987977981567
key: score_time
value: [0.01200795 0.01196694 0.01197743 0.01194954 0.01201367 0.01194072
0.01195097 0.01195168 0.01200342 0.01209617]
mean value: 0.011985850334167481
key: test_mcc
value: [0.41812101 0.66666667 0.58536941 0.84515425 0.60246408 0.43033148
0.53033009 0.75261781 0.48856385 0.56490196]
mean value: 0.5884520595841621
key: train_mcc
value: [0.76181538 0.75032247 0.77043718 0.68331814 0.68415789 0.76800037
0.76033717 0.75164603 0.70417011 0.77022946]
mean value: 0.7404434189570132
key: test_accuracy
value: [0.70833333 0.83333333 0.79166667 0.91666667 0.79166667 0.70833333
0.75 0.875 0.73913043 0.7826087 ]
mean value: 0.7896739130434782
key: train_accuracy
value: [0.87850467 0.87383178 0.88317757 0.8411215 0.8411215 0.88317757
0.87850467 0.87383178 0.85116279 0.88372093]
mean value: 0.8688154748967616
key: test_fscore
value: [0.72 0.83333333 0.8 0.92307692 0.81481481 0.66666667
0.78571429 0.88 0.75 0.8 ]
mean value: 0.7973606023606024
key: train_fscore
value: [0.88495575 0.87892377 0.88888889 0.84545455 0.84684685 0.88687783
0.88392857 0.88 0.85714286 0.88789238]
mean value: 0.8740911433526155
key: test_precision
value: [0.69230769 0.83333333 0.76923077 0.85714286 0.73333333 0.77777778
0.6875 0.84615385 0.69230769 0.76923077]
mean value: 0.7658318070818071
key: train_precision
value: [0.84033613 0.84482759 0.84745763 0.82300885 0.8173913 0.85964912
0.84615385 0.83898305 0.82758621 0.85344828]
mean value: 0.8398842004251612
key: test_recall
value: [0.75 0.83333333 0.83333333 1. 0.91666667 0.58333333
0.91666667 0.91666667 0.81818182 0.83333333]
mean value: 0.8401515151515152
key: train_recall
value: [0.93457944 0.91588785 0.93457944 0.86915888 0.87850467 0.91588785
0.92523364 0.92523364 0.88888889 0.92523364]
mean value: 0.911318795430945
key: test_roc_auc
value: [0.70833333 0.83333333 0.79166667 0.91666667 0.79166667 0.70833333
0.75 0.875 0.74242424 0.78030303]
mean value: 0.7897727272727273
key: train_roc_auc
value: [0.87850467 0.87383178 0.88317757 0.8411215 0.8411215 0.88317757
0.87850467 0.87383178 0.8509865 0.88391312]
mean value: 0.8688170647282797
key: test_jcc
value: [0.5625 0.71428571 0.66666667 0.85714286 0.6875 0.5
0.64705882 0.78571429 0.6 0.66666667]
mean value: 0.6687535014005602
key: train_jcc
value: [0.79365079 0.784 0.8 0.73228346 0.734375 0.79674797
0.792 0.78571429 0.75 0.7983871 ]
mean value: 0.7767158608185877
MCC on Blind test: 0.36
Accuracy on Blind test: 0.69
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01436901 0.01197767 0.00945544 0.00957227 0.00921988 0.00907397
0.0101347 0.00925565 0.00901246 0.00902224]
mean value: 0.010109329223632812
key: score_time
value: [0.01210284 0.00923228 0.00915217 0.00906968 0.00897646 0.00888538
0.00887156 0.00883389 0.00879741 0.00943661]
mean value: 0.009335827827453614
key: test_mcc
value: [0.45834925 0.35355339 0.58536941 0.57735027 0.57735027 0.58536941
0.09166985 0.51298918 0.44411739 0.42228828]
mean value: 0.4608406691502399
key: train_mcc
value: [0.4856668 0.51639778 0.51822739 0.52240206 0.50180978 0.46360045
0.51088537 0.50987255 0.51938062 0.49694198]
mean value: 0.5045184769946628
key: test_accuracy
value: [0.70833333 0.66666667 0.79166667 0.75 0.75 0.79166667
0.54166667 0.70833333 0.69565217 0.69565217]
mean value: 0.709963768115942
key: train_accuracy
value: [0.72429907 0.72429907 0.73831776 0.74766355 0.73364486 0.72429907
0.74299065 0.73831776 0.74418605 0.73023256]
mean value: 0.7348250380352097
key: test_fscore
value: [0.75862069 0.71428571 0.8 0.8 0.8 0.7826087
0.62068966 0.77419355 0.74074074 0.75862069]
mean value: 0.7549759733548485
key: train_fscore
value: [0.76862745 0.77902622 0.78125 0.78225806 0.77470356 0.75518672
0.77732794 0.77777778 0.7826087 0.77165354]
mean value: 0.7750419963988651
key: test_precision
value: [0.64705882 0.625 0.76923077 0.66666667 0.66666667 0.81818182
0.52941176 0.63157895 0.625 0.64705882]
mean value: 0.6625854279879048
key: train_precision
value: [0.66216216 0.65 0.67114094 0.68794326 0.67123288 0.67910448
0.68571429 0.67586207 0.68275862 0.66666667]
mean value: 0.6732585360531219
key: test_recall
value: [0.91666667 0.83333333 0.83333333 1. 1. 0.75
0.75 1. 0.90909091 0.91666667]
mean value: 0.8909090909090909
key: train_recall
value: [0.91588785 0.97196262 0.93457944 0.90654206 0.91588785 0.85046729
0.89719626 0.91588785 0.91666667 0.91588785]
mean value: 0.9140965732087227
key: test_roc_auc
value: [0.70833333 0.66666667 0.79166667 0.75 0.75 0.79166667
0.54166667 0.70833333 0.70454545 0.68560606]
mean value: 0.7098484848484848
key: train_roc_auc
value: [0.72429907 0.72429907 0.73831776 0.74766355 0.73364486 0.72429907
0.74299065 0.73831776 0.74338006 0.73109207]
mean value: 0.7348303911388023
key: test_jcc
value: [0.61111111 0.55555556 0.66666667 0.66666667 0.66666667 0.64285714
0.45 0.63157895 0.58823529 0.61111111]
mean value: 0.6090449162120989
key: train_jcc
value: [0.62420382 0.63803681 0.64102564 0.64238411 0.63225806 0.60666667
0.63576159 0.63636364 0.64285714 0.62820513]
mean value: 0.6327762606470584
MCC on Blind test: 0.21
Accuracy on Blind test: 0.64
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.00941443 0.00931597 0.00942731 0.00927639 0.0095489 0.01009607
0.00941873 0.00929666 0.00949907 0.00945783]
mean value: 0.009475135803222656
key: score_time
value: [0.00885749 0.00908804 0.00896859 0.00891495 0.00882578 0.0092988
0.008847 0.00886583 0.00917006 0.00905085]
mean value: 0.008988738059997559
key: test_mcc
value: [0.41812101 0.16903085 0.58536941 0.58536941 0.2508726 0.58536941
0.5 0.50709255 0.76764947 0.50168817]
mean value: 0.4870562873788939
key: train_mcc
value: [0.59043763 0.63620901 0.62792574 0.61814664 0.65561007 0.62010797
0.63889912 0.63328843 0.60964859 0.64701307]
mean value: 0.6277286276584018
key: test_accuracy
value: [0.70833333 0.58333333 0.79166667 0.79166667 0.625 0.79166667
0.75 0.75 0.86956522 0.73913043]
mean value: 0.740036231884058
key: train_accuracy
value: [0.79439252 0.81775701 0.81308411 0.80841121 0.8271028 0.80841121
0.81775701 0.81308411 0.80465116 0.82325581]
mean value: 0.8127906976744186
key: test_fscore
value: [0.72 0.54545455 0.7826087 0.8 0.60869565 0.7826087
0.75 0.76923077 0.88 0.78571429]
mean value: 0.7424312643877861
key: train_fscore
value: [0.8018018 0.82191781 0.81981982 0.81447964 0.83257919 0.81777778
0.82666667 0.82608696 0.80909091 0.82568807]
mean value: 0.8195908636821799
key: test_precision
value: [0.69230769 0.6 0.81818182 0.76923077 0.63636364 0.81818182
0.75 0.71428571 0.78571429 0.6875 ]
mean value: 0.7271765734265734
key: train_precision
value: [0.77391304 0.80357143 0.79130435 0.78947368 0.80701754 0.77966102
0.78813559 0.77235772 0.79464286 0.81081081]
mean value: 0.7910888049646347
key: test_recall
value: [0.75 0.5 0.75 0.83333333 0.58333333 0.75
0.75 0.83333333 1. 0.91666667]
mean value: 0.7666666666666667
key: train_recall
value: [0.8317757 0.8411215 0.85046729 0.8411215 0.85981308 0.85981308
0.86915888 0.88785047 0.82407407 0.8411215 ]
mean value: 0.8506317064728279
key: test_roc_auc
value: [0.70833333 0.58333333 0.79166667 0.79166667 0.625 0.79166667
0.75 0.75 0.875 0.73106061]
mean value: 0.7397727272727274
key: train_roc_auc
value: [0.79439252 0.81775701 0.81308411 0.80841121 0.8271028 0.80841121
0.81775701 0.81308411 0.8045604 0.82333853]
mean value: 0.8127898926964348
key: test_jcc
value: [0.5625 0.375 0.64285714 0.66666667 0.4375 0.64285714
0.6 0.625 0.78571429 0.64705882]
mean value: 0.598515406162465
key: train_jcc
value: [0.66917293 0.69767442 0.69465649 0.6870229 0.71317829 0.69172932
0.70454545 0.7037037 0.67938931 0.703125 ]
mean value: 0.6944197829356628
MCC on Blind test: 0.22
Accuracy on Blind test: 0.62
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00901866 0.00923586 0.00900841 0.00987458 0.00992274 0.00976443
0.00986457 0.01014185 0.00996709 0.0086441 ]
mean value: 0.00954422950744629
key: score_time
value: [0.01099586 0.01051021 0.01014256 0.01094532 0.01093531 0.01087236
0.0109477 0.01107359 0.01543617 0.01557159]
mean value: 0.01174306869506836
key: test_mcc
value: [0.25819889 0.58536941 0.3380617 0.16903085 0.16903085 0.60246408
0.25819889 0.5 0.31298622 0.12878788]
mean value: 0.33221287636298225
key: train_mcc
value: [0.62297427 0.63014358 0.61814664 0.63889912 0.57503685 0.59389052
0.6377741 0.59252307 0.63165773 0.62203998]
mean value: 0.6163085857672724
key: test_accuracy
value: [0.625 0.79166667 0.66666667 0.58333333 0.58333333 0.79166667
0.625 0.75 0.65217391 0.56521739]
mean value: 0.6634057971014493
key: train_accuracy
value: [0.80841121 0.81308411 0.80841121 0.81775701 0.78504673 0.79439252
0.81775701 0.79439252 0.81395349 0.80930233]
mean value: 0.8062508150402087
key: test_fscore
value: [0.66666667 0.8 0.69230769 0.61538462 0.61538462 0.76190476
0.66666667 0.75 0.66666667 0.58333333]
mean value: 0.6818315018315018
key: train_fscore
value: [0.8209607 0.82300885 0.81447964 0.82666667 0.79824561 0.80701754
0.82511211 0.80530973 0.8245614 0.81777778]
mean value: 0.8163140034241074
key: test_precision
value: [0.6 0.76923077 0.64285714 0.57142857 0.57142857 0.88888889
0.6 0.75 0.61538462 0.58333333]
mean value: 0.6592551892551892
key: train_precision
value: [0.7704918 0.78151261 0.78947368 0.78813559 0.75206612 0.76033058
0.79310345 0.76470588 0.78333333 0.77966102]
mean value: 0.7762814060877736
key: test_recall
value: [0.75 0.83333333 0.75 0.66666667 0.66666667 0.66666667
0.75 0.75 0.72727273 0.58333333]
mean value: 0.7143939393939394
key: train_recall
value: [0.87850467 0.86915888 0.8411215 0.86915888 0.85046729 0.85981308
0.85981308 0.85046729 0.87037037 0.85981308]
mean value: 0.8608688127379716
key: test_roc_auc
value: [0.625 0.79166667 0.66666667 0.58333333 0.58333333 0.79166667
0.625 0.75 0.65530303 0.56439394]
mean value: 0.6636363636363636
key: train_roc_auc
value: [0.80841121 0.81308411 0.80841121 0.81775701 0.78504673 0.79439252
0.81775701 0.79439252 0.81368986 0.80953617]
mean value: 0.8062478366216683
key: test_jcc
value: [0.5 0.66666667 0.52941176 0.44444444 0.44444444 0.61538462
0.5 0.6 0.5 0.41176471]
mean value: 0.5212116641528406
key: train_jcc
value: [0.6962963 0.69924812 0.6870229 0.70454545 0.66423358 0.67647059
0.70229008 0.67407407 0.70149254 0.69172932]
mean value: 0.6897402947815147
MCC on Blind test: 0.07
Accuracy on Blind test: 0.55
Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01391244 0.01355982 0.01375175 0.01338625 0.01189446 0.01187778
0.01245689 0.0119462 0.01181173 0.0138557 ]
mean value: 0.012845301628112793
key: score_time
value: [0.01098967 0.01039457 0.0104115 0.01010585 0.01013899 0.00966024
0.00945163 0.00955868 0.00956511 0.0103538 ]
mean value: 0.01006300449371338
key: test_mcc
value: [0.50709255 0.58536941 0.6761234 0.58536941 0.60246408 0.58536941
0.43033148 0.6761234 0.65151515 0.50168817]
mean value: 0.5801446459328247
key: train_mcc
value: [0.78691547 0.78452148 0.79943589 0.77911093 0.82419551 0.79943589
0.80801948 0.77399833 0.81614982 0.77323619]
mean value: 0.7945018986838522
key: test_accuracy
value: [0.75 0.79166667 0.83333333 0.79166667 0.79166667 0.79166667
0.70833333 0.83333333 0.82608696 0.73913043]
mean value: 0.7856884057971014
key: train_accuracy
value: [0.88785047 0.88785047 0.89719626 0.88785047 0.91121495 0.89719626
0.90186916 0.88317757 0.90697674 0.88372093]
mean value: 0.894490328189524
key: test_fscore
value: [0.76923077 0.8 0.84615385 0.7826087 0.81481481 0.7826087
0.74074074 0.84615385 0.81818182 0.78571429]
mean value: 0.7986207512294469
key: train_fscore
value: [0.89655172 0.89565217 0.90265487 0.89285714 0.91402715 0.90265487
0.90666667 0.89082969 0.91071429 0.88986784]
mean value: 0.9002476412856446
key: test_precision
value: [0.71428571 0.76923077 0.78571429 0.81818182 0.73333333 0.81818182
0.66666667 0.78571429 0.81818182 0.6875 ]
mean value: 0.759699050949051
key: train_precision
value: [0.832 0.83739837 0.85714286 0.85470085 0.88596491 0.85714286
0.86440678 0.83606557 0.87931034 0.84166667]
mean value: 0.8545799220176772
key: test_recall
value: [0.83333333 0.83333333 0.91666667 0.75 0.91666667 0.75
0.83333333 0.91666667 0.81818182 0.91666667]
mean value: 0.8484848484848485
key: train_recall
value: [0.97196262 0.96261682 0.95327103 0.93457944 0.94392523 0.95327103
0.95327103 0.95327103 0.94444444 0.94392523]
mean value: 0.9514537902388369
key: test_roc_auc
value: [0.75 0.79166667 0.83333333 0.79166667 0.79166667 0.79166667
0.70833333 0.83333333 0.82575758 0.73106061]
mean value: 0.7848484848484849
key: train_roc_auc
value: [0.88785047 0.88785047 0.89719626 0.88785047 0.91121495 0.89719626
0.90186916 0.88317757 0.90680166 0.88399965]
mean value: 0.8945006922810661
key: test_jcc
value: [0.625 0.66666667 0.73333333 0.64285714 0.6875 0.64285714
0.58823529 0.73333333 0.69230769 0.64705882]
mean value: 0.665914942900237
key: train_jcc
value: [0.8125 0.81102362 0.82258065 0.80645161 0.84166667 0.82258065
0.82926829 0.80314961 0.83606557 0.8015873 ]
mean value: 0.818687396627965
MCC on Blind test: 0.3
Accuracy on Blind test: 0.67
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [0.98078847 1.14474893 1.03964257 1.1282227 1.03296232 0.8334837
0.89926815 1.07796717 0.77976942 0.66169286]
mean value: 0.9578546285629272
key: score_time
value: [0.01240253 0.01251841 0.01240301 0.01245475 0.01492858 0.01274514
0.01272035 0.01467299 0.01266956 0.01264405]
mean value: 0.013015937805175782
key: test_mcc
value: [0.2508726 0.53033009 0.53033009 0.66666667 0.50709255 0.50709255
0.45834925 0.83333333 0.66414149 0.56818182]
mean value: 0.5516390435152891
key: train_mcc
value: [0.95431352 0.98130841 0.91914503 0.93560149 0.95331266 0.8880056
0.9178541 0.96278502 0.87999381 0.89803517]
mean value: 0.9290354812196953
key: test_accuracy
value: [0.625 0.75 0.75 0.83333333 0.75 0.75
0.70833333 0.91666667 0.82608696 0.7826087 ]
mean value: 0.7692028985507247
key: train_accuracy
value: [0.97663551 0.99065421 0.95794393 0.96728972 0.97663551 0.94392523
0.95794393 0.98130841 0.93953488 0.94883721]
mean value: 0.9640708541621387
key: test_fscore
value: [0.64 0.7 0.78571429 0.83333333 0.76923077 0.72727273
0.75862069 0.91666667 0.83333333 0.7826087 ]
mean value: 0.7746780500858461
key: train_fscore
value: [0.97716895 0.99065421 0.95964126 0.96803653 0.97674419 0.94444444
0.95927602 0.98113208 0.94117647 0.94930876]
mean value: 0.9647582891075718
key: test_precision
value: [0.61538462 0.875 0.6875 0.83333333 0.71428571 0.8
0.64705882 0.91666667 0.76923077 0.81818182]
mean value: 0.7676641740612329
key: train_precision
value: [0.95535714 0.99065421 0.92241379 0.94642857 0.97222222 0.93577982
0.92982456 0.99047619 0.92035398 0.93636364]
mean value: 0.9499874122276843
key: test_recall
value: [0.66666667 0.58333333 0.91666667 0.83333333 0.83333333 0.66666667
0.91666667 0.91666667 0.90909091 0.75 ]
mean value: 0.7992424242424242
key: train_recall
value: [1. 0.99065421 1. 0.99065421 0.98130841 0.95327103
0.99065421 0.97196262 0.96296296 0.96261682]
mean value: 0.9804084458290065
key: test_roc_auc
value: [0.625 0.75 0.75 0.83333333 0.75 0.75
0.70833333 0.91666667 0.82954545 0.78409091]
mean value: 0.7696969696969697
key: train_roc_auc
value: [0.97663551 0.99065421 0.95794393 0.96728972 0.97663551 0.94392523
0.95794393 0.98130841 0.93942541 0.948901 ]
mean value: 0.9640662859120803
key: test_jcc
value: [0.47058824 0.53846154 0.64705882 0.71428571 0.625 0.57142857
0.61111111 0.84615385 0.71428571 0.64285714]
mean value: 0.6381230697407168
key: train_jcc
value: [0.95535714 0.98148148 0.92241379 0.9380531 0.95454545 0.89473684
0.92173913 0.96296296 0.88888889 0.90350877]
mean value: 0.9323687565654382
MCC on Blind test: 0.37
Accuracy on Blind test: 0.69
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.02077341 0.01784921 0.01781964 0.01727605 0.0172956 0.01682329
0.01593351 0.01639438 0.01650286 0.01686263]
mean value: 0.017353057861328125
key: score_time
value: [0.01231551 0.00951004 0.00973606 0.00914216 0.0093534 0.00895286
0.00935936 0.00957298 0.00883436 0.00960231]
mean value: 0.009637904167175294
key: test_mcc
value: [0.16666667 0.43033148 0.6761234 0.5 0.41812101 0.53033009
0.50709255 0.25819889 0.31298622 0.56490196]
mean value: 0.4364752260633302
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.58333333 0.70833333 0.83333333 0.75 0.70833333 0.75
0.75 0.625 0.65217391 0.7826087 ]
mean value: 0.7143115942028986
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.58333333 0.66666667 0.84615385 0.75 0.72 0.7
0.76923077 0.66666667 0.66666667 0.8 ]
mean value: 0.7168717948717949
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.58333333 0.77777778 0.78571429 0.75 0.69230769 0.875
0.71428571 0.6 0.61538462 0.76923077]
mean value: 0.7163034188034189
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.58333333 0.58333333 0.91666667 0.75 0.75 0.58333333
0.83333333 0.75 0.72727273 0.83333333]
mean value: 0.7310606060606061
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.58333333 0.70833333 0.83333333 0.75 0.70833333 0.75
0.75 0.625 0.65530303 0.78030303]
mean value: 0.7143939393939395
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.41176471 0.5 0.73333333 0.6 0.5625 0.53846154
0.625 0.5 0.5 0.66666667]
mean value: 0.5637726244343891
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.39
Accuracy on Blind test: 0.71
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.09941649 0.10254478 0.10431123 0.10478163 0.10160041 0.10014677
0.09986472 0.09947181 0.10004663 0.10172796]
mean value: 0.10139124393463135
key: score_time
value: [0.01773238 0.0191679 0.01867318 0.01925397 0.01925659 0.01850605
0.01875663 0.01869678 0.01848745 0.01848054]
mean value: 0.01870114803314209
key: test_mcc
value: [0.2508726 0.43033148 0.58536941 0.58536941 0.2508726 0.53033009
0.43033148 0.6761234 0.76764947 0.48075018]
mean value: 0.49880001244620575
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.625 0.70833333 0.79166667 0.79166667 0.625 0.75
0.70833333 0.83333333 0.86956522 0.73913043]
mean value: 0.7442028985507246
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.64 0.66666667 0.8 0.7826087 0.64 0.7
0.74074074 0.84615385 0.88 0.76923077]
mean value: 0.7465400718444197
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.61538462 0.77777778 0.76923077 0.81818182 0.61538462 0.875
0.66666667 0.78571429 0.78571429 0.71428571]
mean value: 0.7423340548340549
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 0.58333333 0.83333333 0.75 0.66666667 0.58333333
0.83333333 0.91666667 1. 0.83333333]
mean value: 0.7666666666666667
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.625 0.70833333 0.79166667 0.79166667 0.625 0.75
0.70833333 0.83333333 0.875 0.73484848]
mean value: 0.7443181818181819
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.47058824 0.5 0.66666667 0.64285714 0.47058824 0.53846154
0.58823529 0.73333333 0.78571429 0.625 ]
mean value: 0.602144473173885
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.2
Accuracy on Blind test: 0.61
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01040649 0.01031017 0.01037312 0.00961804 0.01037526 0.01037502
0.01032877 0.01025653 0.0104363 0.00932646]
mean value: 0.01018061637878418
key: score_time
value: [0.00964999 0.00946546 0.00953627 0.00941133 0.00960279 0.00949907
0.00953937 0.00953007 0.00954676 0.0087235 ]
mean value: 0.009450459480285644
key: test_mcc
value: [0.16666667 0.2508726 0.27500955 0. 0.27500955 0.50709255
0.2508726 0.43033148 0.50460839 0.12878788]
mean value: 0.2789251277774354
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.58333333 0.625 0.625 0.5 0.625 0.75
0.625 0.70833333 0.69565217 0.56521739]
mean value: 0.6302536231884058
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.58333333 0.60869565 0.68965517 0.45454545 0.68965517 0.72727273
0.64 0.74074074 0.75862069 0.58333333]
mean value: 0.6475852275882261
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.58333333 0.63636364 0.58823529 0.5 0.58823529 0.8
0.61538462 0.66666667 0.61111111 0.58333333]
mean value: 0.617266328442799
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.58333333 0.58333333 0.83333333 0.41666667 0.83333333 0.66666667
0.66666667 0.83333333 1. 0.58333333]
mean value: 0.7000000000000001
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.58333333 0.625 0.625 0.5 0.625 0.75
0.625 0.70833333 0.70833333 0.56439394]
mean value: 0.631439393939394
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.41176471 0.4375 0.52631579 0.29411765 0.52631579 0.57142857
0.47058824 0.58823529 0.61111111 0.41176471]
mean value: 0.4849141849722345
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.01
Accuracy on Blind test: 0.52
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.35469699 1.33344936 1.3990047 1.37505436 1.3341949 1.35325098
1.33077955 1.33756208 1.384763 1.34119081]
mean value: 1.3543946743011475
key: score_time
value: [0.09228492 0.0974164 0.09831905 0.09852076 0.0970428 0.09776616
0.09751773 0.09144068 0.09766102 0.09835434]
mean value: 0.09663238525390624
key: test_mcc
value: [0.3380617 0.35355339 0.6761234 0.66666667 0.50709255 0.64168895
0.50709255 0.83333333 0.56818182 0.38932432]
mean value: 0.5481118688595472
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.66666667 0.66666667 0.83333333 0.83333333 0.75 0.79166667
0.75 0.91666667 0.7826087 0.69565217]
mean value: 0.7686594202898551
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.69230769 0.6 0.84615385 0.83333333 0.76923077 0.73684211
0.76923077 0.91666667 0.7826087 0.72 ]
mean value: 0.7666373877838408
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.64285714 0.75 0.78571429 0.83333333 0.71428571 1.
0.71428571 0.91666667 0.75 0.69230769]
mean value: 0.779945054945055
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 0.5 0.91666667 0.83333333 0.83333333 0.58333333
0.83333333 0.91666667 0.81818182 0.75 ]
mean value: 0.7734848484848484
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.66666667 0.66666667 0.83333333 0.83333333 0.75 0.79166667
0.75 0.91666667 0.78409091 0.69318182]
mean value: 0.768560606060606
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.52941176 0.42857143 0.73333333 0.71428571 0.625 0.58333333
0.625 0.84615385 0.64285714 0.5625 ]
mean value: 0.6290446563240681
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.3
Accuracy on Blind test: 0.66
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.82288122 0.93520522 0.86855769 0.89258552 0.9448607 0.90758777
0.90327692 0.97866297 0.88730645 0.90872741]
mean value: 0.9049651861190796
key: score_time
value: [0.22578979 0.19590878 0.17380428 0.18248248 0.17389655 0.20093942
0.26056457 0.23904443 0.20164275 0.25702143]
mean value: 0.21110944747924804
key: test_mcc
value: [0.41812101 0.35355339 0.60246408 0.83333333 0.43033148 0.64168895
0.41812101 0.83333333 0.65151515 0.48075018]
mean value: 0.5663211901063119
key: train_mcc
value: [0.93494699 0.90670046 0.90670046 0.90717617 0.91592785 0.89754911
0.89754911 0.92539531 0.89800878 0.91632053]
mean value: 0.9106274768697769
key: test_accuracy
value: [0.70833333 0.66666667 0.79166667 0.91666667 0.70833333 0.79166667
0.70833333 0.91666667 0.82608696 0.73913043]
mean value: 0.7773550724637681
key: train_accuracy
value: [0.96728972 0.95327103 0.95327103 0.95327103 0.95794393 0.94859813
0.94859813 0.96261682 0.94883721 0.95813953]
mean value: 0.9551836557270159
key: test_fscore
value: [0.69565217 0.6 0.81481481 0.91666667 0.74074074 0.73684211
0.72 0.91666667 0.81818182 0.76923077]
mean value: 0.7728795755477678
key: train_fscore
value: [0.96774194 0.9537037 0.9537037 0.95412844 0.95813953 0.94930876
0.94930876 0.96296296 0.94977169 0.95813953]
mean value: 0.955690901700711
key: test_precision
value: [0.72727273 0.75 0.73333333 0.91666667 0.66666667 1.
0.69230769 0.91666667 0.81818182 0.71428571]
mean value: 0.7935381285381286
key: train_precision /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
value: [0.95454545 0.94495413 0.94495413 0.93693694 0.9537037 0.93636364
0.93636364 0.95412844 0.93693694 0.9537037 ]
mean value: 0.9452590705801716
key: test_recall
value: [0.66666667 0.5 0.91666667 0.91666667 0.83333333 0.58333333
0.75 0.91666667 0.81818182 0.83333333]
mean value: 0.7734848484848484
key: train_recall
value: [0.98130841 0.96261682 0.96261682 0.97196262 0.96261682 0.96261682
0.96261682 0.97196262 0.96296296 0.96261682]
mean value: 0.9663897542402216
key: test_roc_auc
value: [0.70833333 0.66666667 0.79166667 0.91666667 0.70833333 0.79166667
0.70833333 0.91666667 0.82575758 0.73484848]
mean value: 0.7768939393939394
key: train_roc_auc
value: [0.96728972 0.95327103 0.95327103 0.95327103 0.95794393 0.94859813
0.94859813 0.96261682 0.9487712 0.95816026]
mean value: 0.9551791277258567
key: test_jcc
value: [0.53333333 0.42857143 0.6875 0.84615385 0.58823529 0.58333333
0.5625 0.84615385 0.69230769 0.625 ]
mean value: 0.6393088773971127
key: train_jcc
value: [0.9375 0.91150442 0.91150442 0.9122807 0.91964286 0.90350877
0.90350877 0.92857143 0.90434783 0.91964286]
mean value: 0.9152012064115657
MCC on Blind test: 0.33
Accuracy on Blind test: 0.68
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02384639 0.00904751 0.00906181 0.00914288 0.0090673 0.00917602
0.0091517 0.01172256 0.01012325 0.00911999]
mean value: 0.010945940017700195
key: score_time
value: [0.01334524 0.00935245 0.00869036 0.00860286 0.00860476 0.00863767
0.00859475 0.00944138 0.00946784 0.00863409]
mean value: 0.009337139129638673
key: test_mcc
value: [0.41812101 0.16903085 0.58536941 0.58536941 0.2508726 0.58536941
0.5 0.50709255 0.76764947 0.50168817]
mean value: 0.4870562873788939
key: train_mcc
value: [0.59043763 0.63620901 0.62792574 0.61814664 0.65561007 0.62010797
0.63889912 0.63328843 0.60964859 0.64701307]
mean value: 0.6277286276584018
key: test_accuracy
value: [0.70833333 0.58333333 0.79166667 0.79166667 0.625 0.79166667
0.75 0.75 0.86956522 0.73913043]
mean value: 0.740036231884058
key: train_accuracy
value: [0.79439252 0.81775701 0.81308411 0.80841121 0.8271028 0.80841121
0.81775701 0.81308411 0.80465116 0.82325581]
mean value: 0.8127906976744186
key: test_fscore
value: [0.72 0.54545455 0.7826087 0.8 0.60869565 0.7826087
0.75 0.76923077 0.88 0.78571429]
mean value: 0.7424312643877861
key: train_fscore
value: [0.8018018 0.82191781 0.81981982 0.81447964 0.83257919 0.81777778
0.82666667 0.82608696 0.80909091 0.82568807]
mean value: 0.8195908636821799
key: test_precision
value: [0.69230769 0.6 0.81818182 0.76923077 0.63636364 0.81818182
0.75 0.71428571 0.78571429 0.6875 ]
mean value: 0.7271765734265734
key: train_precision
value: [0.77391304 0.80357143 0.79130435 0.78947368 0.80701754 0.77966102
0.78813559 0.77235772 0.79464286 0.81081081]
mean value: 0.7910888049646347
key: test_recall
value: [0.75 0.5 0.75 0.83333333 0.58333333 0.75
0.75 0.83333333 1. 0.91666667]
mean value: 0.7666666666666667
key: train_recall
value: [0.8317757 0.8411215 0.85046729 0.8411215 0.85981308 0.85981308
0.86915888 0.88785047 0.82407407 0.8411215 ]
mean value: 0.8506317064728279
key: test_roc_auc
value: [0.70833333 0.58333333 0.79166667 0.79166667 0.625 0.79166667
0.75 0.75 0.875 0.73106061]
mean value: 0.7397727272727274
key: train_roc_auc
value: [0.79439252 0.81775701 0.81308411 0.80841121 0.8271028 0.80841121
0.81775701 0.81308411 0.8045604 0.82333853]
mean value: 0.8127898926964348
key: test_jcc
value: [0.5625 0.375 0.64285714 0.66666667 0.4375 0.64285714
0.6 0.625 0.78571429 0.64705882]
mean value: 0.598515406162465
key: train_jcc
value: [0.66917293 0.69767442 0.69465649 0.6870229 0.71317829 0.69172932
0.70454545 0.7037037 0.67938931 0.703125 ]
mean value: 0.6944197829356628
MCC on Blind test: 0.22
Accuracy on Blind test: 0.62
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.08058214 0.06786609 0.06809545 0.22957754 0.06823206 0.06266332
0.06188226 0.06003165 0.06411195 0.06521845]
mean value: 0.08282608985900879
key: score_time
value: [0.01041341 0.01041722 0.01038694 0.01292777 0.01117945 0.01045275
0.01058483 0.01038098 0.01039505 0.01027608]
mean value: 0.010741448402404786
key: test_mcc
value: [0.50709255 0.45834925 0.75261781 0.83333333 0.66666667 0.70710678
0.50709255 0.75261781 0.83971912 0.38932432]
mean value: 0.6413920196699461
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.75 0.70833333 0.875 0.91666667 0.83333333 0.83333333
0.75 0.875 0.91304348 0.69565217]
mean value: 0.8150362318840579
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.76923077 0.63157895 0.88 0.91666667 0.83333333 0.8
0.76923077 0.86956522 0.91666667 0.72 ]
mean value: 0.810627236988793
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.71428571 0.85714286 0.84615385 0.91666667 0.83333333 1.
0.71428571 0.90909091 0.84615385 0.69230769]
mean value: 0.8329420579420579
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.83333333 0.5 0.91666667 0.91666667 0.83333333 0.66666667
0.83333333 0.83333333 1. 0.75 ]
mean value: 0.8083333333333333
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.75 0.70833333 0.875 0.91666667 0.83333333 0.83333333
0.75 0.875 0.91666667 0.69318182]
mean value: 0.8151515151515152
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.625 0.46153846 0.78571429 0.84615385 0.71428571 0.66666667
0.625 0.76923077 0.84615385 0.5625 ]
mean value: 0.690224358974359
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.42
Accuracy on Blind test: 0.72
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.02749777 0.02816367 0.02974796 0.05736232 0.0272913 0.04481721
0.02785373 0.05785823 0.05839539 0.06007934]
mean value: 0.04190669059753418
key: score_time
value: [0.01219177 0.01208019 0.01197457 0.01192737 0.01195431 0.01191282
0.01195431 0.02337527 0.02111554 0.02088499]
mean value: 0.014937114715576173
key: test_mcc
value: [0.35355339 0.66666667 0.2508726 0.41812101 0.16666667 0.50709255
0.50709255 0.5 0.66414149 0.47727273]
mean value: 0.4511479652880026
key: train_mcc
value: [0.94458549 0.95364593 0.95331266 0.95331266 0.97200507 0.94392523
0.94409017 0.97234487 0.95385294 0.96295976]
mean value: 0.9554034791393476
key: test_accuracy
value: [0.66666667 0.83333333 0.625 0.70833333 0.58333333 0.75
0.75 0.75 0.82608696 0.73913043]
mean value: 0.7231884057971014
key: train_accuracy
value: [0.97196262 0.97663551 0.97663551 0.97663551 0.98598131 0.97196262
0.97196262 0.98598131 0.97674419 0.98139535]
mean value: 0.9775896544229515
key: test_fscore
value: [0.71428571 0.83333333 0.64 0.69565217 0.58333333 0.76923077
0.76923077 0.75 0.83333333 0.75 ]
mean value: 0.7338399426660296
key: train_fscore
value: [0.97247706 0.97630332 0.97652582 0.97652582 0.98604651 0.97196262
0.97222222 0.98617512 0.97716895 0.98148148]
mean value: 0.977688892208132
key: test_precision
value: [0.625 0.83333333 0.61538462 0.72727273 0.58333333 0.71428571
0.71428571 0.75 0.76923077 0.75 ]
mean value: 0.7082126207126207
key: train_precision
value: [0.95495495 0.99038462 0.98113208 0.98113208 0.98148148 0.97196262
0.96330275 0.97272727 0.96396396 0.97247706]
mean value: 0.9733518872791876
key: test_recall
value: [0.83333333 0.83333333 0.66666667 0.66666667 0.58333333 0.83333333
0.83333333 0.75 0.90909091 0.75 ]
mean value: 0.7659090909090909
key: train_recall
value: [0.99065421 0.96261682 0.97196262 0.97196262 0.99065421 0.97196262
0.98130841 1. 0.99074074 0.99065421]
mean value: 0.982251644167532
key: test_roc_auc
value: [0.66666667 0.83333333 0.625 0.70833333 0.58333333 0.75
0.75 0.75 0.82954545 0.73863636]
mean value: 0.7234848484848485
key: train_roc_auc
value: [0.97196262 0.97663551 0.97663551 0.97663551 0.98598131 0.97196262
0.97196262 0.98598131 0.97667878 0.98143821]
mean value: 0.9775874004845967
key: test_jcc
value: [0.55555556 0.71428571 0.47058824 0.53333333 0.41176471 0.625
0.625 0.6 0.71428571 0.6 ]
mean value: 0.5849813258636788
key: train_jcc
value: [0.94642857 0.9537037 0.95412844 0.95412844 0.97247706 0.94545455
0.94594595 0.97272727 0.95535714 0.96363636]
mean value: 0.9563987490707675
MCC on Blind test: 0.4
Accuracy on Blind test: 0.7
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.02090168 0.00918961 0.00905538 0.00885677 0.00880861 0.00887942
0.00888872 0.00904346 0.00889969 0.00889063]
mean value: 0.010141396522521972
key: score_time
value: [0.00995135 0.00894046 0.00866508 0.00853658 0.00849724 0.00857306
0.00858712 0.00854015 0.00854492 0.00852847]
mean value: 0.008736443519592286
key: test_mcc
value: [0.58536941 0.25819889 0.60246408 0.35355339 0.43033148 0.58536941
0.16903085 0.53033009 0.31298622 0.65151515]
mean value: 0.44791489601729073
key: train_mcc
value: [0.47330153 0.50970622 0.46615956 0.49962218 0.50843941 0.49962218
0.51728205 0.45436947 0.44644147 0.48631932]
mean value: 0.4861263378913578
key: test_accuracy
value: [0.79166667 0.625 0.79166667 0.66666667 0.70833333 0.79166667
0.58333333 0.75 0.65217391 0.82608696]
mean value: 0.7186594202898551
key: train_accuracy
value: [0.73364486 0.75233645 0.72897196 0.74766355 0.75233645 0.74766355
0.75700935 0.72429907 0.72093023 0.73953488]
mean value: 0.7404390349923929
key: test_fscore
value: [0.8 0.66666667 0.76190476 0.71428571 0.74074074 0.8
0.61538462 0.78571429 0.66666667 0.83333333]
mean value: 0.7384696784696785
key: train_fscore
value: [0.75324675 0.76855895 0.75213675 0.76315789 0.76651982 0.76315789
0.7699115 0.74458874 0.74137931 0.75862069]
mean value: 0.7581278319624325
key: test_precision
value: [0.76923077 0.6 0.88888889 0.625 0.66666667 0.76923077
0.57142857 0.6875 0.61538462 0.83333333]
mean value: 0.7026663614163614
key: train_precision
value: [0.7016129 0.72131148 0.69291339 0.71900826 0.725 0.71900826
0.73109244 0.69354839 0.69354839 0.704 ]
mean value: 0.7101043504556372
key: test_recall
value: [0.83333333 0.75 0.66666667 0.83333333 0.83333333 0.83333333
0.66666667 0.91666667 0.72727273 0.83333333]
mean value: 0.7893939393939394
key: train_recall
value: [0.81308411 0.82242991 0.82242991 0.81308411 0.81308411 0.81308411
0.81308411 0.80373832 0.7962963 0.82242991]
mean value: 0.8132744894427137
key: test_roc_auc
value: [0.79166667 0.625 0.79166667 0.66666667 0.70833333 0.79166667
0.58333333 0.75 0.65530303 0.82575758]
mean value: 0.7189393939393939
key: train_roc_auc
value: [0.73364486 0.75233645 0.72897196 0.74766355 0.75233645 0.74766355
0.75700935 0.72429907 0.72057805 0.73991866]
mean value: 0.7404421945309796
key: test_jcc
value: [0.66666667 0.5 0.61538462 0.55555556 0.58823529 0.66666667
0.44444444 0.64705882 0.5 0.71428571]
mean value: 0.5898297780650722
key: train_jcc
value: [0.60416667 0.62411348 0.60273973 0.61702128 0.62142857 0.61702128
0.62589928 0.59310345 0.5890411 0.61111111]
mean value: 0.6105645928344353
MCC on Blind test: 0.29
Accuracy on Blind test: 0.67
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01252055 0.01392078 0.01614141 0.01381683 0.0134027 0.01844454
0.01378655 0.01552129 0.01536298 0.01578856]
mean value: 0.014870619773864746
key: score_time
value: [0.00945973 0.01135755 0.01148272 0.01177406 0.01144099 0.01148391
0.01179481 0.01168394 0.01170611 0.01175261]
mean value: 0.01139364242553711
key: test_mcc
value: [0.43033148 0.43033148 0.64168895 0.37796447 0. 0.66666667
0.51298918 0.38490018 0.31252706 0.58002308]
mean value: 0.4337422539152132
key: train_mcc
value: [0.765559 0.66399158 0.81308411 0.56655772 0.45720843 0.74210824
0.50793174 0.66048589 0.67173227 0.66970965]
mean value: 0.6518368627712635
key: test_accuracy
value: [0.70833333 0.70833333 0.79166667 0.625 0.5 0.83333333
0.70833333 0.66666667 0.65217391 0.7826087 ]
mean value: 0.6976449275362319
key: train_accuracy
value: [0.87383178 0.81775701 0.90654206 0.74299065 0.6728972 0.85514019
0.71028037 0.80373832 0.81395349 0.80930233]
mean value: 0.8006433384046946
key: test_fscore
value: [0.74074074 0.74074074 0.82758621 0.4 0.25 0.83333333
0.58823529 0.73333333 0.55555556 0.81481481]
mean value: 0.6484340019532717
key: train_fscore
value: [0.88607595 0.84081633 0.90654206 0.65408805 0.51388889 0.87346939
0.5974026 0.8359375 0.7752809 0.83921569]
mean value: 0.7722717341484435
key: test_precision
value: [0.66666667 0.66666667 0.70588235 1. 0.5 0.83333333
1. 0.61111111 0.71428571 0.73333333]
mean value: 0.7431279178338002
key: train_precision
value: [0.80769231 0.74637681 0.90654206 1. 1. 0.77536232
0.9787234 0.71812081 0.98571429 0.72297297]
mean value: 0.8641504962513562
key: test_recall
value: [0.83333333 0.83333333 1. 0.25 0.16666667 0.83333333
0.41666667 0.91666667 0.45454545 0.91666667]
mean value: 0.6621212121212121
key: train_recall
value: [0.98130841 0.96261682 0.90654206 0.48598131 0.34579439 1.
0.42990654 1. 0.63888889 1. ]
mean value: 0.7751038421599169
key: test_roc_auc
value: [0.70833333 0.70833333 0.79166667 0.625 0.5 0.83333333
0.70833333 0.66666667 0.64393939 0.77651515]
mean value: 0.6962121212121212
key: train_roc_auc
value: [0.87383178 0.81775701 0.90654206 0.74299065 0.6728972 0.85514019
0.71028037 0.80373832 0.81477155 0.81018519]
mean value: 0.8008134302526826
key: test_jcc
value: [0.58823529 0.58823529 0.70588235 0.25 0.14285714 0.71428571
0.41666667 0.57894737 0.38461538 0.6875 ]
mean value: 0.5057225218022432
key: train_jcc
value: [0.79545455 0.72535211 0.82905983 0.48598131 0.34579439 0.77536232
0.42592593 0.71812081 0.63302752 0.72297297]
mean value: 0.6457051734169397
MCC on Blind test: 0.38
Accuracy on Blind test: 0.7
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01544642 0.01680565 0.01688099 0.01658344 0.01468253 0.01595712
0.01408386 0.01546359 0.01462984 0.01517892]
mean value: 0.015571236610412598
key: score_time
value: [0.01171613 0.0115025 0.0115025 0.01150274 0.01146626 0.01148009
0.01147723 0.01166582 0.01146317 0.01148224]
mean value: 0.011525869369506836
key: test_mcc
value: [0.43033148 0.45834925 0.70710678 0.66666667 0.33333333 0.75261781
0.30779351 0.83333333 0.63327851 0.47727273]
mean value: 0.5600083394249354
key: train_mcc
value: [0.63278485 0.83571089 0.84292723 0.86075337 0.80587729 0.73455316
0.69721669 0.78155517 0.60343274 0.83255452]
mean value: 0.7627365913034355
key: test_accuracy
value: [0.70833333 0.70833333 0.83333333 0.83333333 0.66666667 0.875
0.625 0.91666667 0.7826087 0.73913043]
mean value: 0.768840579710145
key: train_accuracy
value: [0.78971963 0.91121495 0.92056075 0.92990654 0.89719626 0.85046729
0.8271028 0.88317757 0.76744186 0.91627907]
mean value: 0.8693066724625081
key: test_fscore
value: [0.66666667 0.63157895 0.85714286 0.83333333 0.66666667 0.88
0.70967742 0.91666667 0.81481481 0.75 ]
mean value: 0.7726547372014265
key: train_fscore
value: [0.73684211 0.9025641 0.92307692 0.93150685 0.8877551 0.8699187
0.85258964 0.87046632 0.81203008 0.91588785]
mean value: 0.8702637669780106
key: test_precision
value: [0.77777778 0.85714286 0.75 0.83333333 0.66666667 0.84615385
0.57894737 0.91666667 0.6875 0.75 ]
mean value: 0.76641885161622
key: train_precision
value: [0.984375 1. 0.89473684 0.91071429 0.97752809 0.76978417
0.74305556 0.97674419 0.6835443 0.91588785]
mean value: 0.8856370286235885
key: test_recall
value: [0.58333333 0.5 1. 0.83333333 0.66666667 0.91666667
0.91666667 0.91666667 1. 0.75 ]
mean value: 0.8083333333333333
key: train_recall
value: [0.58878505 0.82242991 0.95327103 0.95327103 0.81308411 1.
1. 0.78504673 1. 0.91588785]
mean value: 0.8831775700934579
key: test_roc_auc
value: [0.70833333 0.70833333 0.83333333 0.83333333 0.66666667 0.875
0.625 0.91666667 0.79166667 0.73863636]
mean value: 0.7696969696969697
key: train_roc_auc
value: [0.78971963 0.91121495 0.92056075 0.92990654 0.89719626 0.85046729
0.8271028 0.88317757 0.76635514 0.91627726]
mean value: 0.8691978193146417
key: test_jcc
value: [0.5 0.46153846 0.75 0.71428571 0.5 0.78571429
0.55 0.84615385 0.6875 0.6 ]
mean value: 0.6395192307692308
key: train_jcc
value: [0.58333333 0.82242991 0.85714286 0.87179487 0.79816514 0.76978417
0.74305556 0.7706422 0.6835443 0.84482759]
mean value: 0.774471992648445
MCC on Blind test: 0.42
Accuracy on Blind test: 0.72
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.13520432 0.11692595 0.11517644 0.1157937 0.11509657 0.11439133
0.11565018 0.11544991 0.11597919 0.11625528]
mean value: 0.11759228706359863
key: score_time
value: [0.01483297 0.01478171 0.01470661 0.01475096 0.0147438 0.01558471
0.01485181 0.01469231 0.01466393 0.0147202 ]
mean value: 0.014832901954650878
key: test_mcc
value: [0.58536941 0.35355339 0.6761234 0.58536941 0.50709255 0.60246408
0.66666667 0.58536941 0.74047959 0.38932432]
mean value: 0.5691812221593513
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.79166667 0.66666667 0.83333333 0.79166667 0.75 0.79166667
0.83333333 0.79166667 0.86956522 0.69565217]
mean value: 0.7815217391304348
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.7826087 0.6 0.84615385 0.8 0.76923077 0.76190476
0.83333333 0.7826087 0.85714286 0.72 ]
mean value: 0.7752982959069915
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.81818182 0.75 0.78571429 0.76923077 0.71428571 0.88888889
0.83333333 0.81818182 0.9 0.69230769]
mean value: 0.797012432012432
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 0.5 0.91666667 0.83333333 0.83333333 0.66666667
0.83333333 0.75 0.81818182 0.75 ]
mean value: 0.7651515151515151
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.79166667 0.66666667 0.83333333 0.79166667 0.75 0.79166667
0.83333333 0.79166667 0.86742424 0.69318182]
mean value: 0.7810606060606061
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.64285714 0.42857143 0.73333333 0.66666667 0.625 0.61538462
0.71428571 0.64285714 0.75 0.5625 ]
mean value: 0.6381456043956044
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.38
Accuracy on Blind test: 0.7
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.04348755 0.04252839 0.04224372 0.06066871 0.0431087 0.06443048
0.05952501 0.05365348 0.04849029 0.05835891]
mean value: 0.05164952278137207
key: score_time
value: [0.02122927 0.01823187 0.01750135 0.03620076 0.02824354 0.03714728
0.02844954 0.02909589 0.03381586 0.02994609]
mean value: 0.02798614501953125
key: test_mcc
value: [0.58536941 0.35355339 0.58536941 0.53033009 0.0836242 0.35355339
0.75261781 0.3380617 0.65151515 0.47727273]
mean value: 0.47112672717727616
key: train_mcc
value: [0.98147988 0.99069747 0.95431352 0.97200507 0.99069747 1.
0.98147988 0.96329016 0.98156643 0.98156326]
mean value: 0.9797093130637349
key: test_accuracy
value: [0.79166667 0.66666667 0.79166667 0.75 0.54166667 0.66666667
0.875 0.66666667 0.82608696 0.73913043]
mean value: 0.7315217391304347
key: train_accuracy
value: [0.99065421 0.9953271 0.97663551 0.98598131 0.9953271 1.
0.99065421 0.98130841 0.99069767 0.99069767]
mean value: 0.98972831993045
key: test_fscore
value: [0.7826087 0.6 0.8 0.7 0.52173913 0.6
0.86956522 0.63636364 0.81818182 0.75 ]
mean value: 0.7078458498023715
key: train_fscore
value: [0.99056604 0.99530516 0.97607656 0.98591549 0.99530516 1.
0.99056604 0.98095238 0.99065421 0.99056604]
mean value: 0.9895907076387572
key: test_precision
value: [0.81818182 0.75 0.76923077 0.875 0.54545455 0.75
0.90909091 0.7 0.81818182 0.75 ]
mean value: 0.768513986013986
key: train_precision
value: [1. 1. 1. 0.99056604 1. 1.
1. 1. 1. 1. ]
mean value: 0.9990566037735849
key: test_recall
value: [0.75 0.5 0.83333333 0.58333333 0.5 0.5
0.83333333 0.58333333 0.81818182 0.75 ]
mean value: 0.6651515151515152
key: train_recall
value: [0.98130841 0.99065421 0.95327103 0.98130841 0.99065421 1.
0.98130841 0.96261682 0.98148148 0.98130841]
mean value: 0.9803911388023537
key: test_roc_auc
value: [0.79166667 0.66666667 0.79166667 0.75 0.54166667 0.66666667
0.875 0.66666667 0.82575758 0.73863636]
mean value: 0.731439393939394
key: train_roc_auc
value: [0.99065421 0.9953271 0.97663551 0.98598131 0.9953271 1.
0.99065421 0.98130841 0.99074074 0.99065421]
mean value: 0.9897282796815506
key: test_jcc
value: [0.64285714 0.42857143 0.66666667 0.53846154 0.35294118 0.42857143
0.76923077 0.46666667 0.69230769 0.6 ]
mean value: 0.5586274509803921
key: train_jcc
value: [0.98130841 0.99065421 0.95327103 0.97222222 0.99065421 1.
0.98130841 0.96261682 0.98148148 0.98130841]
mean value: 0.9794825199030807
MCC on Blind test: 0.31
Accuracy on Blind test: 0.65
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.02282429 0.02679276 0.02753735 0.02729583 0.02739811 0.06211233
0.07006454 0.06247091 0.065027 0.07080889]
mean value: 0.046233201026916505
key: score_time
value: [0.01267719 0.01270986 0.02094722 0.01265335 0.01270294 0.02475357
0.0230341 0.02312326 0.02105856 0.02447557]
mean value: 0.018813562393188477
key: test_mcc
value: [0.2508726 0.3380617 0.3380617 0.58536941 0.16903085 0.58536941
0.35355339 0.43033148 0.5164589 0.31252706]
mean value: 0.3879636502859699
key: train_mcc
value: [0.99069747 0.99069747 0.99069747 0.99069747 0.99069747 0.99069747
0.98147988 0.99069747 0.99073994 0.99074074]
mean value: 0.98978428681763
key: test_accuracy
value: [0.625 0.66666667 0.66666667 0.79166667 0.58333333 0.79166667
0.66666667 0.70833333 0.73913043 0.65217391]
mean value: 0.6891304347826087
key: train_accuracy
value: [0.9953271 0.9953271 0.9953271 0.9953271 0.9953271 0.9953271
0.99065421 0.9953271 0.99534884 0.99534884]
mean value: 0.994864159965225
key: test_fscore
value: [0.64 0.63636364 0.69230769 0.8 0.61538462 0.7826087
0.71428571 0.74074074 0.76923077 0.71428571]
mean value: 0.7105207578251057
key: train_fscore
value: [0.99534884 0.99534884 0.99534884 0.99534884 0.99534884 0.99534884
0.99074074 0.99534884 0.99539171 0.99534884]
mean value: 0.9948923143484284
key: test_precision
value: [0.61538462 0.7 0.64285714 0.76923077 0.57142857 0.81818182
0.625 0.66666667 0.66666667 0.625 ]
mean value: 0.670041625041625
key: train_precision
value: [0.99074074 0.99074074 0.99074074 0.99074074 0.99074074 0.99074074
0.98165138 0.99074074 0.99082569 0.99074074]
mean value: 0.9898402990146109
key: test_recall
value: [0.66666667 0.58333333 0.75 0.83333333 0.66666667 0.75
0.83333333 0.83333333 0.90909091 0.83333333]
mean value: 0.7659090909090909
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.625 0.66666667 0.66666667 0.79166667 0.58333333 0.79166667
0.66666667 0.70833333 0.74621212 0.64393939]
mean value: 0.6890151515151515
key: train_roc_auc
value: [0.9953271 0.9953271 0.9953271 0.9953271 0.9953271 0.9953271
0.99065421 0.9953271 0.9953271 0.99537037]
mean value: 0.9948641398407753
key: test_jcc
value: [0.47058824 0.46666667 0.52941176 0.66666667 0.44444444 0.64285714
0.55555556 0.58823529 0.625 0.55555556]
mean value: 0.5544981325863679
key: train_jcc
value: [0.99074074 0.99074074 0.99074074 0.99074074 0.99074074 0.99074074
0.98165138 0.99074074 0.99082569 0.99074074]
mean value: 0.9898402990146109
MCC on Blind test: 0.15
Accuracy on Blind test: 0.58
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.39302421 0.38781261 0.38132334 0.38107753 0.38223696 0.38157129
0.38028026 0.38978195 0.38478231 0.38798022]
mean value: 0.38498706817626954
key: score_time
value: [0.00929594 0.00920892 0.00926375 0.00939751 0.00947452 0.00936222
0.00934148 0.00942659 0.00945687 0.00921726]
mean value: 0.00934450626373291
key: test_mcc
value: [0.66666667 0.3380617 0.60246408 0.75261781 0.83333333 0.60246408
0.58536941 0.75261781 0.66414149 0.38932432]
mean value: 0.6187060687615544
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.83333333 0.66666667 0.79166667 0.875 0.91666667 0.79166667
0.79166667 0.875 0.82608696 0.69565217]
mean value: 0.806340579710145
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.83333333 0.63636364 0.81481481 0.88 0.91666667 0.76190476
0.8 0.86956522 0.83333333 0.72 ]
mean value: 0.806598176380785
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.83333333 0.7 0.73333333 0.84615385 0.91666667 0.88888889
0.76923077 0.90909091 0.76923077 0.69230769]
mean value: 0.8058236208236208
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.83333333 0.58333333 0.91666667 0.91666667 0.91666667 0.66666667
0.83333333 0.83333333 0.90909091 0.75 ]
mean value: 0.8159090909090909
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.83333333 0.66666667 0.79166667 0.875 0.91666667 0.79166667
0.79166667 0.875 0.82954545 0.69318182]
mean value: 0.8064393939393939
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.71428571 0.46666667 0.6875 0.78571429 0.84615385 0.61538462
0.66666667 0.76923077 0.71428571 0.5625 ]
mean value: 0.6828388278388279
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.38
Accuracy on Blind test: 0.7
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.02136946 0.02109194 0.02217722 0.02191544 0.02162218 0.02250242
0.0231998 0.03056288 0.02152514 0.02213955]
mean value: 0.022810602188110353
key: score_time
value: [0.01971126 0.01300335 0.01919699 0.01460743 0.01475883 0.01430082
0.01233721 0.01231742 0.01723862 0.01476789]
mean value: 0.015223979949951172
key: test_mcc
value: [0.35355339 0.5 0.41812101 0.25819889 0.3380617 0.66666667
0.43033148 0.3380617 0.3030303 0.04545455]
mean value: 0.3651479687190244
key: train_mcc
value: [1. 0.94541277 0.98147988 0.95431352 0.90197523 1.
0.97234487 1. 0.91088773 0.98156643]
mean value: 0.9647980425405398
key: test_accuracy
value: [0.66666667 0.75 0.70833333 0.625 0.66666667 0.83333333
0.70833333 0.66666667 0.65217391 0.52173913]
mean value: 0.6798913043478261
key: train_accuracy
value: [1. 0.97196262 0.99065421 0.97663551 0.94859813 1.
0.98598131 1. 0.95348837 0.99069767]
mean value: 0.9818017822212562
key: test_fscore
value: [0.71428571 0.75 0.69565217 0.66666667 0.63636364 0.83333333
0.74074074 0.69230769 0.63636364 0.52173913]
mean value: 0.6887452724409246
key: train_fscore
value: [1. 0.97272727 0.99074074 0.97716895 0.95111111 1.
0.98617512 1. 0.95575221 0.99074074]
mean value: 0.9824416142688309
key: test_precision
value: [0.625 0.75 0.72727273 0.6 0.7 0.83333333
0.66666667 0.64285714 0.63636364 0.54545455]
mean value: 0.6726948051948052
key: train_precision
value: [1. 0.94690265 0.98165138 0.95535714 0.90677966 1.
0.97272727 1. 0.91525424 0.98165138]
mean value: 0.9660323721050335
key: test_recall
value: [0.83333333 0.75 0.66666667 0.75 0.58333333 0.83333333
0.83333333 0.75 0.63636364 0.5 ]
mean value: 0.7136363636363636
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.66666667 0.75 0.70833333 0.625 0.66666667 0.83333333
0.70833333 0.66666667 0.65151515 0.52272727]
mean value: 0.6799242424242424
key: train_roc_auc
value: [1. 0.97196262 0.99065421 0.97663551 0.94859813 1.
0.98598131 1. 0.95327103 0.99074074]
mean value: 0.9817843544479058
key: test_jcc
value: [0.55555556 0.6 0.53333333 0.5 0.46666667 0.71428571
0.58823529 0.52941176 0.46666667 0.35294118]
mean value: 0.5307096171802054
key: train_jcc
value: [1. 0.94690265 0.98165138 0.95535714 0.90677966 1.
0.97272727 1. 0.91525424 0.98165138]
mean value: 0.9660323721050335
MCC on Blind test: 0.1
Accuracy on Blind test: 0.56
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02252746 0.03434563 0.03585696 0.03267336 0.03483081 0.03434944
0.03433037 0.02833104 0.03430033 0.03429699]
mean value: 0.03258423805236817
key: score_time
value: [0.01528716 0.02240324 0.02809668 0.02184415 0.0218873 0.02381444
0.02386975 0.02300072 0.020684 0.02241898]
mean value: 0.022330641746520996
key: test_mcc
value: [0.33333333 0.6761234 0.41812101 0.58536941 0.35355339 0.50709255
0.50709255 0.66666667 0.74242424 0.39393939]
mean value: 0.5183715948422453
key: train_mcc
value: [0.87052859 0.89723545 0.87885017 0.8884715 0.88785047 0.88785047
0.87052859 0.88039066 0.84360068 0.86233346]
mean value: 0.876764004971511
key: test_accuracy
value: [0.66666667 0.83333333 0.70833333 0.79166667 0.66666667 0.75
0.75 0.83333333 0.86956522 0.69565217]
mean value: 0.7565217391304347
key: train_accuracy
value: [0.93457944 0.94859813 0.93925234 0.94392523 0.94392523 0.94392523
0.93457944 0.93925234 0.92093023 0.93023256]
mean value: 0.9379200173875245
key: test_fscore
value: [0.66666667 0.81818182 0.72 0.8 0.71428571 0.72727273
0.76923077 0.83333333 0.86956522 0.69565217]
mean value: 0.7614188420275376
key: train_fscore
value: [0.93636364 0.94883721 0.94009217 0.94495413 0.94392523 0.94392523
0.93636364 0.94117647 0.92376682 0.9321267 ]
mean value: 0.9391531227222615
key: test_precision
value: /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_rt.py:175: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_rt.py:178: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
[0.66666667 0.9 0.69230769 0.76923077 0.625 0.8
0.71428571 0.83333333 0.83333333 0.72727273]
mean value: 0.7561430236430237
key: train_precision
value: [0.91150442 0.94444444 0.92727273 0.92792793 0.94392523 0.94392523
0.91150442 0.9122807 0.89565217 0.90350877]
mean value: 0.9221946064089596
key: test_recall
value: [0.66666667 0.75 0.75 0.83333333 0.83333333 0.66666667
0.83333333 0.83333333 0.90909091 0.66666667]
mean value: 0.7742424242424243
key: train_recall
value: [0.96261682 0.95327103 0.95327103 0.96261682 0.94392523 0.94392523
0.96261682 0.97196262 0.9537037 0.96261682]
mean value: 0.9570526133610245
key: test_roc_auc
value: [0.66666667 0.83333333 0.70833333 0.79166667 0.66666667 0.75
0.75 0.83333333 0.87121212 0.6969697 ]
mean value: 0.7568181818181818
key: train_roc_auc
value: [0.93457944 0.94859813 0.93925234 0.94392523 0.94392523 0.94392523
0.93457944 0.93925234 0.92077709 0.93038249]
mean value: 0.9379196953963309
key: test_jcc
value: [0.5 0.69230769 0.5625 0.66666667 0.55555556 0.57142857
0.625 0.71428571 0.76923077 0.53333333]
mean value: 0.6190308302808303
key: train_jcc
value: [0.88034188 0.90265487 0.88695652 0.89565217 0.89380531 0.89380531
0.88034188 0.88888889 0.85833333 0.87288136]
mean value: 0.8853661521216024
MCC on Blind test: 0.41
Accuracy on Blind test: 0.71
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.22179174 0.28168225 0.26826978 0.22097516 0.23485756 0.22735858
0.23563886 0.23156381 0.22941089 0.23866081]
mean value: 0.2390209436416626
key: score_time
value: [0.0200696 0.02238464 0.02189493 0.02158761 0.02371407 0.02026796
0.02035952 0.020437 0.02219439 0.02334285]
mean value: 0.021625256538391112
key: test_mcc
value: [0.33333333 0.66666667 0.6761234 0.75261781 0.60246408 0.43033148
0.53033009 0.60246408 0.74242424 0.58002308]
mean value: 0.5916778251293104
key: train_mcc
value: [0.77207467 0.76181538 0.76033717 0.75032247 0.77692337 0.76908054
0.77043718 0.74300512 0.77789466 0.78889274]
mean value: 0.7670783288690242
key: test_accuracy
value: [0.66666667 0.83333333 0.83333333 0.875 0.79166667 0.70833333
0.75 0.79166667 0.86956522 0.7826087 ]
mean value: 0.7902173913043479
key: train_accuracy
value: [0.88317757 0.87850467 0.87850467 0.87383178 0.88785047 0.88317757
0.88317757 0.86915888 0.88837209 0.89302326]
mean value: 0.8818778526407303
key: test_fscore
value: [0.66666667 0.83333333 0.84615385 0.86956522 0.81481481 0.66666667
0.78571429 0.81481481 0.86956522 0.81481481]
mean value: 0.7982109677761852
key: train_fscore
value: [0.88986784 0.88495575 0.88392857 0.87892377 0.89090909 0.88789238
0.88888889 0.87610619 0.89189189 0.89686099]
mean value: 0.8870225361475632
key: test_precision
value: [0.66666667 0.83333333 0.78571429 0.90909091 0.73333333 0.77777778
0.6875 0.73333333 0.83333333 0.73333333]
mean value: 0.7693416305916305
key: train_precision
value: [0.84166667 0.84033613 0.84615385 0.84482759 0.86725664 0.85344828
0.84745763 0.83193277 0.86842105 0.86206897]
mean value: 0.8503569564888109
key: test_recall
value: [0.66666667 0.83333333 0.91666667 0.83333333 0.91666667 0.58333333
0.91666667 0.91666667 0.90909091 0.91666667]
mean value: 0.8409090909090909
key: train_recall
value: [0.94392523 0.93457944 0.92523364 0.91588785 0.91588785 0.92523364
0.93457944 0.92523364 0.91666667 0.93457944]
mean value: 0.9271806853582555
key: test_roc_auc
value: [0.66666667 0.83333333 0.83333333 0.875 0.79166667 0.70833333
0.75 0.79166667 0.87121212 0.77651515]
mean value: 0.7897727272727273
key: train_roc_auc
value: [0.88317757 0.87850467 0.87850467 0.87383178 0.88785047 0.88317757
0.88317757 0.86915888 0.88823988 0.89321565]
mean value: 0.8818838698511595
key: test_jcc
value: [0.5 0.71428571 0.73333333 0.76923077 0.6875 0.5
0.64705882 0.6875 0.76923077 0.6875 ]
mean value: 0.6695639409609998
key: train_jcc
value: [0.8015873 0.79365079 0.792 0.784 0.80327869 0.7983871
0.8 0.77952756 0.80487805 0.81300813]
mean value: 0.7970317618453786
MCC on Blind test: 0.38
Accuracy on Blind test: 0.7
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.030931 0.0320158 0.02927685 0.03117085 0.03132129 0.02582955
0.0295136 0.03285122 0.03214002 0.02977347]
mean value: 0.0304823637008667
key: score_time
value: [0.01595521 0.01361537 0.01179814 0.01176691 0.01180816 0.0117135
0.01171279 0.01357198 0.01349711 0.01170993]
mean value: 0.012714910507202148
key: test_mcc
value: [0.58536941 0.60246408 0.70710678 0.53033009 0.70710678 0.2508726
0.77459667 0.75261781 0.75261781 0.5 ]
mean value: 0.6163082021601053
key: train_mcc
value: [0.74278135 0.81607516 0.78869542 0.81537425 0.77992042 0.80725296
0.82495863 0.81607516 0.81537425 0.79848995]
mean value: 0.8004997563631071
key: test_accuracy
value: [0.79166667 0.79166667 0.83333333 0.75 0.83333333 0.625
0.875 0.875 0.875 0.75 ]
mean value: 0.8
key: train_accuracy
value: [0.87037037 0.90740741 0.89351852 0.90740741 0.88888889 0.90277778
0.91203704 0.90740741 0.90740741 0.89814815]
mean value: 0.899537037037037
key: test_fscore
value: [0.7826087 0.76190476 0.85714286 0.78571429 0.85714286 0.64
0.85714286 0.86956522 0.88 0.75 ]
mean value: 0.8041221532091097
key: train_fscore
value: [0.875 0.90990991 0.89686099 0.90909091 0.89285714 0.9058296
0.91402715 0.90990991 0.90909091 0.90178571]
mean value: 0.9024362227425403
key: test_precision
value: [0.81818182 0.88888889 0.75 0.6875 0.75 0.61538462
1. 0.90909091 0.84615385 0.75 ]
mean value: 0.8015200077700078
key: train_precision
value: [0.84482759 0.88596491 0.86956522 0.89285714 0.86206897 0.87826087
0.89380531 0.88596491 0.89285714 0.87068966]
mean value: 0.8776861713863275
key: test_recall
value: [0.75 0.66666667 1. 0.91666667 1. 0.66666667
0.75 0.83333333 0.91666667 0.75 ]
mean value: 0.825
key: train_recall
value: [0.90740741 0.93518519 0.92592593 0.92592593 0.92592593 0.93518519
0.93518519 0.93518519 0.92592593 0.93518519]
mean value: 0.9287037037037037
key: test_roc_auc
value: [0.79166667 0.79166667 0.83333333 0.75 0.83333333 0.625
0.875 0.875 0.875 0.75 ]
mean value: 0.7999999999999999
key: train_roc_auc
value: [0.87037037 0.90740741 0.89351852 0.90740741 0.88888889 0.90277778
0.91203704 0.90740741 0.90740741 0.89814815]
mean value: 0.899537037037037
key: test_jcc
value: [0.64285714 0.61538462 0.75 0.64705882 0.75 0.47058824
0.75 0.76923077 0.78571429 0.6 ]
mean value: 0.6780833872010342
key: train_jcc
value: [0.77777778 0.83471074 0.81300813 0.83333333 0.80645161 0.82786885
0.84166667 0.83471074 0.83333333 0.82113821]
mean value: 0.8223999405540073
MCC on Blind test: 0.36
Accuracy on Blind test: 0.69
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.73983574 0.90101838 0.73272038 0.73073149 0.88028049 0.73427486
0.73257947 0.83858824 0.72452283 0.72177792]
mean value: 0.7736329793930053
key: score_time
value: [0.01196504 0.01196766 0.01193738 0.012146 0.01205468 0.01202273
0.01198864 0.01195478 0.01201153 0.01198506]
mean value: 0.012003350257873534
key: test_mcc
value: [0.66666667 0.75261781 0.70710678 0.53033009 0.70710678 0.25819889
0.6761234 0.66666667 0.75261781 0.6761234 ]
mean value: 0.639355829692189
key: train_mcc
value: [0.74535599 0.74393663 0.76253505 0.78788184 0.72421182 0.73403465
0.85243671 0.71818485 0.79684302 0.73403465]
mean value: 0.7599455220153282
key: test_accuracy
value: [0.83333333 0.875 0.83333333 0.75 0.83333333 0.625
0.83333333 0.83333333 0.875 0.83333333]
mean value: 0.8125
key: train_accuracy
value: [0.87037037 0.87037037 0.87962963 0.89351852 0.86111111 0.86574074
0.92592593 0.85648148 0.89814815 0.86574074]
mean value: 0.8787037037037037
key: test_fscore
value: [0.83333333 0.86956522 0.85714286 0.78571429 0.85714286 0.66666667
0.81818182 0.83333333 0.88 0.84615385]
mean value: 0.8247234215060302
key: train_fscore
value: [0.87719298 0.87610619 0.88495575 0.8959276 0.86607143 0.87111111
0.92727273 0.86462882 0.9 0.87111111]
mean value: 0.8834377730195826
key: test_precision
value: [0.83333333 0.90909091 0.75 0.6875 0.75 0.6
0.9 0.83333333 0.84615385 0.78571429]
mean value: 0.7895125707625708
key: train_precision
value: [0.83333333 0.83898305 0.84745763 0.87610619 0.8362069 0.83760684
0.91071429 0.81818182 0.88392857 0.83760684]
mean value: 0.8520125453079775
key: test_recall
value: [0.83333333 0.83333333 1. 0.91666667 1. 0.75
0.75 0.83333333 0.91666667 0.91666667]
mean value: 0.875
key: train_recall
value: [0.92592593 0.91666667 0.92592593 0.91666667 0.89814815 0.90740741
0.94444444 0.91666667 0.91666667 0.90740741]
mean value: 0.9175925925925926
key: test_roc_auc
value: [0.83333333 0.875 0.83333333 0.75 0.83333333 0.625
0.83333333 0.83333333 0.875 0.83333333]
mean value: 0.8125
key: train_roc_auc
value: [0.87037037 0.87037037 0.87962963 0.89351852 0.86111111 0.86574074
0.92592593 0.85648148 0.89814815 0.86574074]
mean value: 0.8787037037037037
key: test_jcc
value: [0.71428571 0.76923077 0.75 0.64705882 0.75 0.5
0.69230769 0.71428571 0.78571429 0.73333333]
mean value: 0.7056216332686921
key: train_jcc
value: [0.78125 0.77952756 0.79365079 0.81147541 0.76377953 0.77165354
0.86440678 0.76153846 0.81818182 0.77165354]
mean value: 0.7917117436096502
MCC on Blind test: 0.37
Accuracy on Blind test: 0.7
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01284027 0.01234365 0.00941205 0.00910616 0.00889063 0.009027
0.00890779 0.00902915 0.00880742 0.00880718]
mean value: 0.009717130661010742
key: score_time
value: [0.01170111 0.00989246 0.00911975 0.00862837 0.00865746 0.00871444
0.00870013 0.00863981 0.00864148 0.00862479]
mean value: 0.009131979942321778
key: test_mcc
value: [0.53033009 0.43033148 0.30779351 0.51298918 0.51298918 0.3380617
0.43033148 0.51298918 0.53033009 0.60246408]
mean value: 0.47086099493250855
key: train_mcc
value: [0.49840764 0.50566876 0.53033009 0.53831098 0.5267854 0.51043405
0.53045108 0.49458912 0.49840764 0.48770901]
mean value: 0.5121093761721703
key: test_accuracy
value: [0.75 0.70833333 0.625 0.70833333 0.70833333 0.66666667
0.70833333 0.70833333 0.75 0.79166667]
mean value: 0.7125
key: train_accuracy
value: [0.73148148 0.71759259 0.75 0.75462963 0.75 0.73611111
0.74537037 0.73148148 0.73148148 0.73611111]
mean value: 0.7384259259259259
key: test_fscore
value: [0.78571429 0.74074074 0.70967742 0.77419355 0.77419355 0.69230769
0.74074074 0.77419355 0.78571429 0.81481481]
mean value: 0.7592290624548689
key: train_fscore
value: [0.7734375 0.77490775 0.78571429 0.78884462 0.784 0.77821012
0.78599222 0.77165354 0.7734375 0.7654321 ]
mean value: 0.778162963300859
key: test_precision
value: [0.6875 0.66666667 0.57894737 0.63157895 0.63157895 0.64285714
0.66666667 0.63157895 0.6875 0.73333333]
mean value: 0.6558208020050125
key: train_precision
value: [0.66891892 0.64417178 0.6875 0.69230769 0.69014085 0.67114094
0.67785235 0.67123288 0.66891892 0.68888889]
mean value: 0.6761073208548879
key: test_recall
value: [0.91666667 0.83333333 0.91666667 1. 1. 0.75
0.83333333 1. 0.91666667 0.91666667]
mean value: 0.9083333333333333
key: train_recall
value: [0.91666667 0.97222222 0.91666667 0.91666667 0.90740741 0.92592593
0.93518519 0.90740741 0.91666667 0.86111111]
mean value: 0.9175925925925926
key: test_roc_auc
value: [0.75 0.70833333 0.625 0.70833333 0.70833333 0.66666667
0.70833333 0.70833333 0.75 0.79166667]
mean value: 0.7124999999999999
key: train_roc_auc
value: [0.73148148 0.71759259 0.75 0.75462963 0.75 0.73611111
0.74537037 0.73148148 0.73148148 0.73611111]
mean value: 0.7384259259259259
key: test_jcc
value: [0.64705882 0.58823529 0.55 0.63157895 0.63157895 0.52941176
0.58823529 0.63157895 0.64705882 0.6875 ]
mean value: 0.6132236842105263
key: train_jcc
value: [0.63057325 0.63253012 0.64705882 0.65131579 0.64473684 0.63694268
0.6474359 0.62820513 0.63057325 0.62 ]
mean value: 0.6369371773205834
MCC on Blind test: 0.19
Accuracy on Blind test: 0.64
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.00939465 0.00916624 0.0092895 0.00931859 0.00914979 0.00933838
0.0092392 0.0092268 0.00913119 0.00918865]
mean value: 0.009244298934936524
key: score_time
value: [0.00876069 0.00885034 0.00866103 0.00876975 0.0086937 0.00873065
0.0087502 0.00870895 0.00870633 0.00871468]
mean value: 0.008734631538391113
key: test_mcc
value: [0.58536941 0.43033148 0.58536941 0.57735027 0.2508726 0.3380617
0.58536941 0.5 0.60246408 0.6761234 ]
mean value: 0.5131311757869508
key: train_mcc
value: [0.56542109 0.63957467 0.64111887 0.62103628 0.65366344 0.64023511
0.58638277 0.63355259 0.60187765 0.62361342]
mean value: 0.6206475899027868
key: test_accuracy
value: [0.79166667 0.70833333 0.79166667 0.75 0.625 0.66666667
0.79166667 0.75 0.79166667 0.83333333]
mean value: 0.75
key: train_accuracy
value: [0.78240741 0.81944444 0.81944444 0.81018519 0.82407407 0.81944444
0.79166667 0.81481481 0.80092593 0.81018519]
mean value: 0.8092592592592592
key: test_fscore
value: [0.7826087 0.66666667 0.8 0.8 0.64 0.69230769
0.7826087 0.75 0.81481481 0.84615385]
mean value: 0.7575160411247368
key: train_fscore
value: [0.78733032 0.82352941 0.82666667 0.81447964 0.83478261 0.82511211
0.80176211 0.8245614 0.8 0.81938326]
mean value: 0.8157607527459586
key: test_precision
value: [0.81818182 0.77777778 0.76923077 0.66666667 0.61538462 0.64285714
0.81818182 0.75 0.73333333 0.78571429]
mean value: 0.7377328227328227
key: train_precision
value: [0.7699115 0.80530973 0.79487179 0.79646018 0.78688525 0.8
0.76470588 0.78333333 0.80373832 0.78151261]
mean value: 0.7886728595187938
key: test_recall
value: [0.75 0.58333333 0.83333333 1. 0.66666667 0.75
0.75 0.75 0.91666667 0.91666667]
mean value: 0.7916666666666666
key: train_recall
value: [0.80555556 0.84259259 0.86111111 0.83333333 0.88888889 0.85185185
0.84259259 0.87037037 0.7962963 0.86111111]
mean value: 0.8453703703703703
key: test_roc_auc
value: [0.79166667 0.70833333 0.79166667 0.75 0.625 0.66666667
0.79166667 0.75 0.79166667 0.83333333]
mean value: 0.75
key: train_roc_auc
value: [0.78240741 0.81944444 0.81944444 0.81018519 0.82407407 0.81944444
0.79166667 0.81481481 0.80092593 0.81018519]
mean value: 0.8092592592592592
key: test_jcc
value: [0.64285714 0.5 0.66666667 0.66666667 0.47058824 0.52941176
0.64285714 0.6 0.6875 0.73333333]
mean value: 0.6139880952380953
key: train_jcc
value: [0.64925373 0.7 0.70454545 0.6870229 0.71641791 0.70229008
0.66911765 0.70149254 0.66666667 0.69402985]
mean value: 0.6890836775220928
MCC on Blind test: 0.23
Accuracy on Blind test: 0.63
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.01033831 0.00852084 0.00854516 0.00849104 0.00850368 0.00855517
0.00873184 0.00868368 0.00860906 0.00962782]
mean value: 0.0088606595993042
key: score_time
value: [0.01552606 0.01441646 0.01456785 0.01397634 0.01450419 0.01429963
0.01512551 0.01440215 0.01423144 0.00980425]
mean value: 0.01408538818359375
key: test_mcc
value: [0.75261781 0.41812101 0.33333333 0.09166985 0.50709255 0.27500955
0.16903085 0.5 0.66666667 0.2508726 ]
mean value: 0.3964414219606152
key: train_mcc
value: [0.58760578 0.61491869 0.61138741 0.64825931 0.59763515 0.62361342
0.6094494 0.60395256 0.63355259 0.61631125]
mean value: 0.614668557893251
key: test_accuracy
value: [0.875 0.70833333 0.66666667 0.54166667 0.75 0.625
0.58333333 0.75 0.83333333 0.625 ]
mean value: 0.6958333333333333
key: train_accuracy
value: [0.79166667 0.80555556 0.80092593 0.82407407 0.7962963 0.81018519
0.80092593 0.80092593 0.81481481 0.80555556]
mean value: 0.8050925925925926
key: test_fscore
value: [0.86956522 0.72 0.66666667 0.62068966 0.76923077 0.68965517
0.61538462 0.75 0.83333333 0.64 ]
mean value: 0.7174525429592896
key: train_fscore
value: [0.80349345 0.81578947 0.81702128 0.82568807 0.80869565 0.81938326
0.81545064 0.80888889 0.8245614 0.8173913 ]
mean value: 0.8156363426064228
key: test_precision
value: [0.90909091 0.69230769 0.66666667 0.52941176 0.71428571 0.58823529
0.57142857 0.75 0.83333333 0.61538462]
mean value: 0.6870144561321032
key: train_precision
value: [0.76033058 0.775 0.75590551 0.81818182 0.76229508 0.78151261
0.76 0.77777778 0.78333333 0.7704918 ]
mean value: 0.7744828509904268
key: test_recall
value: [0.83333333 0.75 0.66666667 0.75 0.83333333 0.83333333
0.66666667 0.75 0.83333333 0.66666667]
mean value: 0.7583333333333333
key: train_recall
value: [0.85185185 0.86111111 0.88888889 0.83333333 0.86111111 0.86111111
0.87962963 0.84259259 0.87037037 0.87037037]
mean value: 0.862037037037037
key: test_roc_auc
value: [0.875 0.70833333 0.66666667 0.54166667 0.75 0.625
0.58333333 0.75 0.83333333 0.625 ]
mean value: 0.6958333333333333
key: train_roc_auc
value: [0.79166667 0.80555556 0.80092593 0.82407407 0.7962963 0.81018519
0.80092593 0.80092593 0.81481481 0.80555556]
mean value: 0.8050925925925926
key: test_jcc
value: [0.76923077 0.5625 0.5 0.45 0.625 0.52631579
0.44444444 0.6 0.71428571 0.47058824]
mean value: 0.566236495272873
key: train_jcc
value: [0.67153285 0.68888889 0.69064748 0.703125 0.67883212 0.69402985
0.6884058 0.67910448 0.70149254 0.69117647]
mean value: 0.6887235467768253
MCC on Blind test: 0.08
Accuracy on Blind test: 0.56
Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01374173 0.01213717 0.01173711 0.01166201 0.01164699 0.01172185
0.01182199 0.01179361 0.01172495 0.01177144]
mean value: 0.011975884437561035
key: score_time
value: [0.01068091 0.00942492 0.00945401 0.00938773 0.00937343 0.0093987
0.00945997 0.00949836 0.00942206 0.00950742]
mean value: 0.009560751914978027
key: test_mcc
value: [0.83333333 0.75261781 0.60246408 0.51298918 0.50709255 0.35355339
0.6761234 0.6761234 0.66666667 0.6761234 ]
mean value: 0.6257087215904491
key: train_mcc
value: [0.77822 0.81145561 0.78262379 0.81705949 0.8183303 0.79280145
0.74571454 0.78262379 0.78978412 0.79115136]
mean value: 0.7909764449231979
key: test_accuracy
value: [0.91666667 0.875 0.79166667 0.70833333 0.75 0.66666667
0.83333333 0.83333333 0.83333333 0.83333333]
mean value: 0.8041666666666667
key: train_accuracy
value: [0.88425926 0.90277778 0.88888889 0.90740741 0.90740741 0.89351852
0.86574074 0.88888889 0.89351852 0.89351852]
mean value: 0.8925925925925926
key: test_fscore
value: [0.91666667 0.86956522 0.81481481 0.77419355 0.76923077 0.71428571
0.84615385 0.81818182 0.83333333 0.84615385]
mean value: 0.820257957459921
key: train_fscore
value: [0.89270386 0.90829694 0.89473684 0.91071429 0.91150442 0.89956332
0.87763713 0.89473684 0.89777778 0.89867841]
mean value: 0.8986349842049632
key: test_precision
value: [0.91666667 0.90909091 0.73333333 0.63157895 0.71428571 0.625
0.78571429 0.9 0.83333333 0.78571429]
mean value: 0.7834717475506949
key: train_precision
value: [0.832 0.85950413 0.85 0.87931034 0.87288136 0.85123967
0.80620155 0.85 0.86324786 0.85714286]
mean value: 0.8521527773191
key: test_recall
value: [0.91666667 0.83333333 0.91666667 1. 0.83333333 0.83333333
0.91666667 0.75 0.83333333 0.91666667]
mean value: 0.875
key: train_recall
value: [0.96296296 0.96296296 0.94444444 0.94444444 0.9537037 0.9537037
0.96296296 0.94444444 0.93518519 0.94444444]
mean value: 0.950925925925926
key: test_roc_auc
value: [0.91666667 0.875 0.79166667 0.70833333 0.75 0.66666667
0.83333333 0.83333333 0.83333333 0.83333333]
mean value: 0.8041666666666667
key: train_roc_auc
value: [0.88425926 0.90277778 0.88888889 0.90740741 0.90740741 0.89351852
0.86574074 0.88888889 0.89351852 0.89351852]
mean value: 0.8925925925925926
key: test_jcc
value: [0.84615385 0.76923077 0.6875 0.63157895 0.625 0.55555556
0.73333333 0.69230769 0.71428571 0.73333333]
mean value: 0.6988279191568665
key: train_jcc
value: [0.80620155 0.832 0.80952381 0.83606557 0.83739837 0.81746032
0.78195489 0.80952381 0.81451613 0.816 ]
mean value: 0.8160644450900069
MCC on Blind test: 0.31
Accuracy on Blind test: 0.68
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.1305542 1.02247667 1.16441059 1.02352166 1.15711308 0.49866986
1.0611999 0.99131846 0.8503859 1.16561484]
mean value: 1.006526517868042
key: score_time
value: [0.01458883 0.01414919 0.01413441 0.01450205 0.01472497 0.01218414
0.01464009 0.01218653 0.0121913 0.02671266]
mean value: 0.015001416206359863
key: test_mcc
value: [0.60246408 0.45834925 0.53033009 0.35355339 0.50709255 0.60246408
0.6761234 0.66666667 0.6761234 0.50709255]
mean value: 0.5580259457057222
key: train_mcc
value: [0.94444444 0.96312812 0.95374459 0.95407186 0.98164982 0.82174833
0.93522528 0.90284331 0.90004066 0.95374459]
mean value: 0.9310640995379719
key: test_accuracy
value: [0.79166667 0.70833333 0.75 0.66666667 0.75 0.79166667
0.83333333 0.83333333 0.83333333 0.75 ]
mean value: 0.7708333333333334
key: train_accuracy
value: [0.97222222 0.98148148 0.97685185 0.97685185 0.99074074 0.90740741
0.96759259 0.94907407 0.94907407 0.97685185]
mean value: 0.9648148148148148
key: test_fscore
value: [0.76190476 0.63157895 0.78571429 0.71428571 0.76923077 0.76190476
0.81818182 0.83333333 0.84615385 0.72727273]
mean value: 0.7649560965350439
key: train_fscore
value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
[0.97222222 0.98165138 0.97695853 0.97716895 0.99065421 0.9009901
0.96744186 0.95154185 0.95067265 0.97674419]
mean value: 0.9646045920575503
key: test_precision
value: [0.88888889 0.85714286 0.6875 0.625 0.71428571 0.88888889
0.9 0.83333333 0.78571429 0.8 ]
mean value: 0.7980753968253969
key: train_precision
value: [0.97222222 0.97272727 0.97247706 0.96396396 1. 0.96808511
0.97196262 0.90756303 0.92173913 0.98130841]
mean value: 0.9632048813198871
key: test_recall
value: [0.66666667 0.5 0.91666667 0.83333333 0.83333333 0.66666667
0.75 0.83333333 0.91666667 0.66666667]
mean value: 0.7583333333333333
key: train_recall
value: [0.97222222 0.99074074 0.98148148 0.99074074 0.98148148 0.84259259
0.96296296 1. 0.98148148 0.97222222]
mean value: 0.9675925925925926
key: test_roc_auc
value: [0.79166667 0.70833333 0.75 0.66666667 0.75 0.79166667
0.83333333 0.83333333 0.83333333 0.75 ]
mean value: 0.7708333333333334
key: train_roc_auc
value: [0.97222222 0.98148148 0.97685185 0.97685185 0.99074074 0.90740741
0.96759259 0.94907407 0.94907407 0.97685185]
mean value: 0.9648148148148148
key: test_jcc
value: [0.61538462 0.46153846 0.64705882 0.55555556 0.625 0.61538462
0.69230769 0.71428571 0.73333333 0.57142857]
mean value: 0.6231277382747971
key: train_jcc
value: [0.94594595 0.96396396 0.95495495 0.95535714 0.98148148 0.81981982
0.93693694 0.90756303 0.90598291 0.95454545]
mean value: 0.9326551631698691
MCC on Blind test: 0.38
Accuracy on Blind test: 0.68
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.02105474 0.0174222 0.0158999 0.01779056 0.01491666 0.01658893
0.01581645 0.01523876 0.0158267 0.01972294]
mean value: 0.017027783393859863
key: score_time
value: [0.0117085 0.00898194 0.00898838 0.00868154 0.00867081 0.0085578
0.00858188 0.0085516 0.0085907 0.00888038]
mean value: 0.0090193510055542
key: test_mcc
value: [0.58536941 0.84515425 0.77459667 0.43033148 0.0860663 0.43033148
0.41812101 0.77459667 0.66666667 0.25819889]
mean value: 0.5269432824040078
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.79166667 0.91666667 0.875 0.70833333 0.54166667 0.70833333
0.70833333 0.875 0.83333333 0.625 ]
mean value: 0.7583333333333333
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.7826087 0.90909091 0.88888889 0.66666667 0.59259259 0.74074074
0.72 0.85714286 0.83333333 0.66666667]
mean value: 0.7657731350774829
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.81818182 1. 0.8 0.77777778 0.53333333 0.66666667
0.69230769 1. 0.83333333 0.6 ]
mean value: 0.7721600621600622
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 0.83333333 1. 0.58333333 0.66666667 0.83333333
0.75 0.75 0.83333333 0.75 ]
mean value: 0.775
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.79166667 0.91666667 0.875 0.70833333 0.54166667 0.70833333
0.70833333 0.875 0.83333333 0.625 ]
mean value: 0.7583333333333334
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.64285714 0.83333333 0.8 0.5 0.42105263 0.58823529
0.5625 0.75 0.71428571 0.5 ]
mean value: 0.6312264116172785
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.27
Accuracy on Blind test: 0.66
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.09728909 0.0968554 0.10155797 0.09642196 0.09648037 0.0966258
0.09753299 0.09809136 0.09761524 0.09700441]
mean value: 0.09754745960235596
key: score_time
value: [0.01724434 0.01726818 0.01727366 0.01733685 0.01723623 0.01715851
0.01735926 0.01729465 0.01727057 0.01740146]
mean value: 0.017284369468688963
key: test_mcc
value: [0.75261781 0.45834925 0.6761234 0.27500955 0.41812101 0.53033009
0.77459667 0.58536941 0.91986621 0.41812101]
mean value: 0.5808504393563012
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.875 0.70833333 0.83333333 0.625 0.70833333 0.75
0.875 0.79166667 0.95833333 0.70833333]
mean value: 0.7833333333333333
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.86956522 0.63157895 0.84615385 0.68965517 0.72 0.78571429
0.85714286 0.8 0.95652174 0.72 ]
mean value: 0.7876332065314942
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.90909091 0.85714286 0.78571429 0.58823529 0.69230769 0.6875
1. 0.76923077 1. 0.69230769]
mean value: 0.7981529499911852
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.83333333 0.5 0.91666667 0.83333333 0.75 0.91666667
0.75 0.83333333 0.91666667 0.75 ]
mean value: 0.8
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.875 0.70833333 0.83333333 0.625 0.70833333 0.75
0.875 0.79166667 0.95833333 0.70833333]
mean value: 0.7833333333333333
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.76923077 0.46153846 0.73333333 0.52631579 0.5625 0.64705882
0.75 0.66666667 0.91666667 0.5625 ]
mean value: 0.6595810510438993
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.24
Accuracy on Blind test: 0.63
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01035261 0.00976276 0.00918221 0.00890946 0.00899911 0.0090673
0.00910568 0.00909877 0.00917673 0.00915146]
mean value: 0.00928061008453369
key: score_time
value: [0.00932956 0.00861287 0.00856042 0.00861096 0.00864172 0.00856161
0.00855303 0.00860214 0.00854778 0.00869441]
mean value: 0.008671450614929199
key: test_mcc
value: [ 0.2508726 0.43033148 0.35355339 -0.0836242 0.3380617 0.41812101
0.3380617 0.33333333 0.38490018 0.3380617 ]
mean value: 0.3101672898977476
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.625 0.70833333 0.66666667 0.45833333 0.66666667 0.70833333
0.66666667 0.66666667 0.66666667 0.66666667]
mean value: 0.65
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.64 0.74074074 0.6 0.48 0.69230769 0.69565217
0.63636364 0.66666667 0.55555556 0.69230769]
mean value: 0.6399594157855027
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.61538462 0.66666667 0.75 0.46153846 0.64285714 0.72727273
0.7 0.66666667 0.83333333 0.64285714]
mean value: 0.6706576756576756
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 0.83333333 0.5 0.5 0.75 0.66666667
0.58333333 0.66666667 0.41666667 0.75 ]
mean value: 0.6333333333333333
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.625 0.70833333 0.66666667 0.45833333 0.66666667 0.70833333
0.66666667 0.66666667 0.66666667 0.66666667]
mean value: 0.65
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.47058824 0.58823529 0.42857143 0.31578947 0.52941176 0.53333333
0.46666667 0.5 0.38461538 0.52941176]
mean value: 0.47466233456945534
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.09
Accuracy on Blind test: 0.56
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.31476283 1.30447841 1.46462393 1.34692645 1.30058432 1.30353808
1.35730481 1.30286336 1.30742908 1.2984364 ]
mean value: 1.3300947666168212
key: score_time
value: [0.08968663 0.0898366 0.09236312 0.09146023 0.08972764 0.0894835
0.09211373 0.08971739 0.09021425 0.09686351]
mean value: 0.09114665985107422
key: test_mcc
value: [0.77459667 0.45834925 0.64168895 0.43033148 0.60246408 0.2508726
0.70710678 0.66666667 0.91986621 0.33333333]
mean value: 0.5785276019860456
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.875 0.70833333 0.79166667 0.70833333 0.79166667 0.625
0.83333333 0.83333333 0.95833333 0.66666667]
mean value: 0.7791666666666667
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.85714286 0.63157895 0.82758621 0.74074074 0.81481481 0.60869565
0.8 0.83333333 0.95652174 0.66666667]
mean value: 0.7737080958267734
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.85714286 0.70588235 0.66666667 0.73333333 0.63636364
1. 0.83333333 1. 0.66666667]
mean value: 0.809938884644767
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 0.5 1. 0.83333333 0.91666667 0.58333333
0.66666667 0.83333333 0.91666667 0.66666667]
mean value: 0.7666666666666666
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.875 0.70833333 0.79166667 0.70833333 0.79166667 0.625
0.83333333 0.83333333 0.95833333 0.66666667]
mean value: 0.7791666666666667
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
[0.75 0.46153846 0.70588235 0.58823529 0.6875 0.4375
0.66666667 0.71428571 0.91666667 0.5 ]
mean value: 0.6428275156216333
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.31
Accuracy on Blind test: 0.66
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.90877461 0.96970248 0.89637876 0.99821568 0.90704012 0.97669077
0.92202234 0.96597815 0.96081018 0.92725229]
mean value: 0.9432865381240845
key: score_time
value: [0.23616505 0.23439336 0.24036169 0.15945172 0.21677756 0.21420693
0.23975849 0.24806404 0.22143126 0.23800039]
mean value: 0.2248610496520996
key: test_mcc
value: [0.77459667 0.53033009 0.64168895 0.50709255 0.6761234 0.3380617
0.77459667 0.75261781 0.84515425 0.50709255]
mean value: 0.6347354647375963
key: train_mcc
value: [0.90803041 0.89849486 0.89849486 0.90756304 0.95374459 0.89818665
0.89911222 0.91702052 0.89911222 0.90756304]
mean value: 0.9087322406621408
key: test_accuracy
value: [0.875 0.75 0.79166667 0.75 0.83333333 0.66666667
0.875 0.875 0.91666667 0.75 ]
mean value: 0.8083333333333333
key: train_accuracy
value: [0.9537037 0.94907407 0.94907407 0.9537037 0.97685185 0.94907407
0.94907407 0.95833333 0.94907407 0.9537037 ]
mean value: 0.9541666666666667
key: test_fscore
value: [0.85714286 0.7 0.82758621 0.76923077 0.84615385 0.63636364
0.85714286 0.86956522 0.90909091 0.76923077]
mean value: 0.8041507068643501
key: train_fscore
value: [0.95454545 0.94977169 0.94977169 0.95412844 0.97674419 0.94930876
0.95022624 0.95890411 0.95022624 0.95412844]
mean value: 0.9547755254358538
key: test_precision
value: [1. 0.875 0.70588235 0.71428571 0.78571429 0.7
1. 0.90909091 1. 0.71428571]
mean value: 0.84042589763178
key: train_precision
value: [0.9375 0.93693694 0.93693694 0.94545455 0.98130841 0.94495413
0.92920354 0.94594595 0.92920354 0.94545455]
mean value: 0.9432898530030248
key: test_recall
value: [0.75 0.58333333 1. 0.83333333 0.91666667 0.58333333
0.75 0.83333333 0.83333333 0.83333333]
mean value: 0.7916666666666667
key: train_recall
value: [0.97222222 0.96296296 0.96296296 0.96296296 0.97222222 0.9537037
0.97222222 0.97222222 0.97222222 0.96296296]
mean value: 0.9666666666666667
key: test_roc_auc
value: [0.875 0.75 0.79166667 0.75 0.83333333 0.66666667
0.875 0.875 0.91666667 0.75 ]
mean value: 0.8083333333333333
key: train_roc_auc
value: [0.9537037 0.94907407 0.94907407 0.9537037 0.97685185 0.94907407
0.94907407 0.95833333 0.94907407 0.9537037 ]
mean value: 0.9541666666666666
key: test_jcc
value: [0.75 0.53846154 0.70588235 0.625 0.73333333 0.46666667
0.75 0.76923077 0.83333333 0.625 ]
mean value: 0.6796907993966818
key: train_jcc
value: [0.91304348 0.90434783 0.90434783 0.9122807 0.95454545 0.90350877
0.90517241 0.92105263 0.90517241 0.9122807 ]
mean value: 0.9135752219583988
MCC on Blind test: 0.36
Accuracy on Blind test: 0.68
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01139498 0.01070619 0.01048851 0.01043177 0.01021743 0.00961018
0.0094676 0.01010823 0.00943756 0.0102725 ]
mean value: 0.010213494300842285
key: score_time
value: [0.00988126 0.00973415 0.00977278 0.00937963 0.00891113 0.00978112
0.00912976 0.00887871 0.00883675 0.0089314 ]
mean value: 0.009323668479919434
key: test_mcc
value: [0.58536941 0.43033148 0.58536941 0.57735027 0.2508726 0.3380617
0.58536941 0.5 0.60246408 0.6761234 ]
mean value: 0.5131311757869508
key: train_mcc
value: [0.56542109 0.63957467 0.64111887 0.62103628 0.65366344 0.64023511
0.58638277 0.63355259 0.60187765 0.62361342]
mean value: 0.6206475899027868
key: test_accuracy
value: [0.79166667 0.70833333 0.79166667 0.75 0.625 0.66666667
0.79166667 0.75 0.79166667 0.83333333]
mean value: 0.75
key: train_accuracy
value: [0.78240741 0.81944444 0.81944444 0.81018519 0.82407407 0.81944444
0.79166667 0.81481481 0.80092593 0.81018519]
mean value: 0.8092592592592592
key: test_fscore
value: [0.7826087 0.66666667 0.8 0.8 0.64 0.69230769
0.7826087 0.75 0.81481481 0.84615385]
mean value: 0.7575160411247368
key: train_fscore
value: [0.78733032 0.82352941 0.82666667 0.81447964 0.83478261 0.82511211
0.80176211 0.8245614 0.8 0.81938326]
mean value: 0.8157607527459586
key: test_precision
value: [0.81818182 0.77777778 0.76923077 0.66666667 0.61538462 0.64285714
0.81818182 0.75 0.73333333 0.78571429]
mean value: 0.7377328227328227
key: train_precision
value: [0.7699115 0.80530973 0.79487179 0.79646018 0.78688525 0.8
0.76470588 0.78333333 0.80373832 0.78151261]
mean value: 0.7886728595187938
key: test_recall
value: [0.75 0.58333333 0.83333333 1. 0.66666667 0.75
0.75 0.75 0.91666667 0.91666667]
mean value: 0.7916666666666666
key: train_recall
value: [0.80555556 0.84259259 0.86111111 0.83333333 0.88888889 0.85185185
0.84259259 0.87037037 0.7962963 0.86111111]
mean value: 0.8453703703703703
key: test_roc_auc
value: [0.79166667 0.70833333 0.79166667 0.75 0.625 0.66666667
0.79166667 0.75 0.79166667 0.83333333]
mean value: 0.75
key: train_roc_auc
value: [0.78240741 0.81944444 0.81944444 0.81018519 0.82407407 0.81944444
0.79166667 0.81481481 0.80092593 0.81018519]
mean value: 0.8092592592592592
key: test_jcc
value: [0.64285714 0.5 0.66666667 0.66666667 0.47058824 0.52941176
0.64285714 0.6 0.6875 0.73333333]
mean value: 0.6139880952380953
key: train_jcc
value: [0.64925373 0.7 0.70454545 0.6870229 0.71641791 0.70229008
0.66911765 0.70149254 0.66666667 0.69402985]
mean value: 0.6890836775220928
MCC on Blind test: 0.23
Accuracy on Blind test: 0.63
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.09627533 0.07803988 0.06252694 0.06555963 0.06782389 0.06903291
0.24227285 0.05894995 0.06242561 0.07130003]
mean value: 0.08742070198059082
key: score_time
value: [0.01102471 0.01105189 0.01042008 0.01050401 0.01191854 0.01149631
0.01142263 0.01128268 0.01066589 0.01059556]
mean value: 0.01103823184967041
key: test_mcc
value: [0.84515425 0.53033009 0.64168895 0.53033009 0.6761234 0.41812101
0.75261781 0.58536941 0.83333333 0.5 ]
mean value: 0.6313068332559123
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.91666667 0.75 0.79166667 0.75 0.83333333 0.70833333
0.875 0.79166667 0.91666667 0.75 ]
mean value: 0.8083333333333333
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.90909091 0.7 0.82758621 0.78571429 0.84615385 0.72
0.86956522 0.7826087 0.91666667 0.75 ]
mean value: 0.8107385827565737
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.875 0.70588235 0.6875 0.78571429 0.69230769
0.90909091 0.81818182 0.91666667 0.75 ]
mean value: 0.8140343724902548
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.83333333 0.58333333 1. 0.91666667 0.91666667 0.75
0.83333333 0.75 0.91666667 0.75 ]
mean value: 0.825
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.91666667 0.75 0.79166667 0.75 0.83333333 0.70833333
0.875 0.79166667 0.91666667 0.75 ]
mean value: 0.8083333333333333
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.83333333 0.53846154 0.70588235 0.64705882 0.73333333 0.5625
0.76923077 0.64285714 0.84615385 0.6 ]
mean value: 0.6878811139840552
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.42
Accuracy on Blind test: 0.72
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.04071808 0.06085467 0.0598805 0.06004024 0.06288719 0.059587
0.05953264 0.0594027 0.0682199 0.05782318]
mean value: 0.05889461040496826
key: score_time
value: [0.02352667 0.02167249 0.02408361 0.02246499 0.02260733 0.02096868
0.02389741 0.02272534 0.01970649 0.0183506 ]
mean value: 0.0220003604888916
key: test_mcc
value: [ 0.58536941 0.5 0.50709255 0.41812101 -0.0836242 0.3380617
0.60246408 0.66666667 0.6761234 0.5 ]
mean value: 0.47102746122625055
key: train_mcc
value: [0.94460643 0.95374459 0.93554619 0.96362411 0.98164982 0.93522528
0.95407186 0.95374459 0.96296296 0.97259753]
mean value: 0.9557773347439162
key: test_accuracy
value: [0.79166667 0.75 0.75 0.70833333 0.45833333 0.66666667
0.79166667 0.83333333 0.83333333 0.75 ]
mean value: 0.7333333333333333
key: train_accuracy
value: [0.97222222 0.97685185 0.96759259 0.98148148 0.99074074 0.96759259
0.97685185 0.97685185 0.98148148 0.98611111]
mean value: 0.9777777777777777
key: test_fscore
value: [0.8 0.75 0.76923077 0.72 0.43478261 0.69230769
0.81481481 0.83333333 0.84615385 0.75 ]
mean value: 0.7410623064536108
key: train_fscore
value: [0.97247706 0.97674419 0.96803653 0.98181818 0.99082569 0.96774194
0.97716895 0.97695853 0.98148148 0.98630137]
mean value: 0.9779553911784314
key: test_precision
value: [0.76923077 0.75 0.71428571 0.69230769 0.45454545 0.64285714
0.73333333 0.83333333 0.78571429 0.75 ]
mean value: 0.7125607725607725
key: train_precision
value: [0.96363636 0.98130841 0.95495495 0.96428571 0.98181818 0.96330275
0.96396396 0.97247706 0.98148148 0.97297297]
mean value: 0.9700201860842348
key: test_recall
value: [0.83333333 0.75 0.83333333 0.75 0.41666667 0.75
0.91666667 0.83333333 0.91666667 0.75 ]
mean value: 0.775
key: train_recall
value: [0.98148148 0.97222222 0.98148148 1. 1. 0.97222222
0.99074074 0.98148148 0.98148148 1. ]
mean value: 0.9861111111111112
key: test_roc_auc
value: [0.79166667 0.75 0.75 0.70833333 0.45833333 0.66666667
0.79166667 0.83333333 0.83333333 0.75 ]
mean value: 0.7333333333333334
key: train_roc_auc
value: [0.97222222 0.97685185 0.96759259 0.98148148 0.99074074 0.96759259
0.97685185 0.97685185 0.98148148 0.98611111]
mean value: 0.9777777777777779
key: test_jcc
value: [0.66666667 0.6 0.625 0.5625 0.27777778 0.52941176
0.6875 0.71428571 0.73333333 0.6 ]
mean value: 0.5996475256769375
key: train_jcc
value: [0.94642857 0.95454545 0.9380531 0.96428571 0.98181818 0.9375
0.95535714 0.95495495 0.96363636 0.97297297]
mean value: 0.956955245384449
MCC on Blind test: 0.37
Accuracy on Blind test: 0.68
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01308775 0.01544952 0.01027894 0.00918627 0.0089798 0.00915718
0.00889015 0.00921917 0.00905752 0.00887823]
mean value: 0.010218453407287598
key: score_time
value: [0.0117414 0.00944948 0.00892258 0.00871181 0.008641 0.00863171
0.00862813 0.0086782 0.00863409 0.00874138]
mean value: 0.009077978134155274
key: test_mcc
value: [0.43033148 0.58536941 0.41812101 0.45834925 0.38490018 0.58536941
0.5 0.35355339 0.3380617 0.66666667]
mean value: 0.4720722489050611
key: train_mcc
value: [0.4472136 0.50557897 0.50709255 0.47111148 0.52174919 0.46812868
0.47684381 0.49433502 0.4406788 0.48685383]
mean value: 0.4819585924926295
key: test_accuracy
value: [0.70833333 0.79166667 0.70833333 0.70833333 0.66666667 0.79166667
0.75 0.66666667 0.66666667 0.83333333]
mean value: 0.7291666666666666
key: train_accuracy
value: [0.72222222 0.75 0.75 0.73148148 0.75925926 0.73148148
0.73611111 0.74537037 0.71759259 0.74074074]
mean value: 0.7384259259259259
key: test_fscore
value: [0.74074074 0.7826087 0.72 0.75862069 0.73333333 0.8
0.75 0.71428571 0.63636364 0.83333333]
mean value: 0.7469286143364104
key: train_fscore
value: [0.73684211 0.76724138 0.76923077 0.75423729 0.77192982 0.75
0.75324675 0.75982533 0.73819742 0.75862069]
mean value: 0.7559371561806815
key: test_precision
value: [0.66666667 0.81818182 0.69230769 0.64705882 0.61111111 0.76923077
0.75 0.625 0.7 0.83333333]
mean value: 0.7112890214360803
key: train_precision
value: [0.7 0.71774194 0.71428571 0.6953125 0.73333333 0.7016129
0.70731707 0.71900826 0.688 0.70967742]
mean value: 0.7086289143317105
key: test_recall
value: [0.83333333 0.75 0.75 0.91666667 0.91666667 0.83333333
0.75 0.83333333 0.58333333 0.83333333]
mean value: 0.8
key: train_recall
value: [0.77777778 0.82407407 0.83333333 0.82407407 0.81481481 0.80555556
0.80555556 0.80555556 0.7962963 0.81481481]
mean value: 0.8101851851851852
key: test_roc_auc
value: [0.70833333 0.79166667 0.70833333 0.70833333 0.66666667 0.79166667
0.75 0.66666667 0.66666667 0.83333333]
mean value: 0.7291666666666666
key: train_roc_auc
value: [0.72222222 0.75 0.75 0.73148148 0.75925926 0.73148148
0.73611111 0.74537037 0.71759259 0.74074074]
mean value: 0.7384259259259259
key: test_jcc
value: [0.58823529 0.64285714 0.5625 0.61111111 0.57894737 0.66666667
0.6 0.55555556 0.46666667 0.71428571]
mean value: 0.5986825519681557
key: train_jcc
value: [0.58333333 0.62237762 0.625 0.60544218 0.62857143 0.6
0.60416667 0.61267606 0.58503401 0.61111111]
mean value: 0.6077712408874381
MCC on Blind test: 0.29
Accuracy on Blind test: 0.67
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01085949 0.01415586 0.01646209 0.01394892 0.0151155 0.01408863
0.01640916 0.0158937 0.01625681 0.01627421]
mean value: 0.014946436882019043
key: score_time
value: [0.00876379 0.01124716 0.01204467 0.01177979 0.01172805 0.01171541
0.01189089 0.01158404 0.01160598 0.01158595]
mean value: 0.011394572257995606
key: test_mcc
value: [0.60246408 0.58536941 0.53033009 0.4472136 0.35355339 0.2508726
0.5 0.60246408 0.58536941 0.50709255]
mean value: 0.4964729193985725
key: train_mcc
value: [0.81145561 0.71739923 0.76459339 0.64168895 0.66332496 0.77898084
0.80235109 0.70238053 0.77253603 0.87996919]
mean value: 0.7534679801942501
key: test_accuracy
value: [0.79166667 0.79166667 0.75 0.66666667 0.66666667 0.625
0.75 0.79166667 0.79166667 0.75 ]
mean value: 0.7374999999999999
key: train_accuracy
value: [0.90277778 0.84259259 0.875 0.79166667 0.80555556 0.88888889
0.89351852 0.83333333 0.88425926 0.93981481]
mean value: 0.8657407407407407
key: test_fscore
value: [0.76190476 0.7826087 0.78571429 0.75 0.6 0.64
0.75 0.76190476 0.8 0.72727273]
mean value: 0.735940523244871
key: train_fscore
value: [0.89655172 0.86290323 0.86153846 0.82758621 0.75862069 0.89189189
0.90295359 0.8021978 0.87804878 0.93896714]
mean value: 0.8621259505260193
key: test_precision
value: [0.88888889 0.81818182 0.6875 0.6 0.75 0.61538462
0.75 0.88888889 0.76923077 0.8 ]
mean value: 0.756807498057498
key: train_precision
value: [0.95789474 0.76428571 0.96551724 0.70588235 1. 0.86842105
0.82945736 0.98648649 0.92783505 0.95238095]
mean value: 0.8958160952834802
key: test_recall
value: [0.66666667 0.75 0.91666667 1. 0.5 0.66666667
0.75 0.66666667 0.83333333 0.66666667]
mean value: 0.7416666666666667
key: train_recall
value: [0.84259259 0.99074074 0.77777778 1. 0.61111111 0.91666667
0.99074074 0.67592593 0.83333333 0.92592593]
mean value: 0.8564814814814815
key: test_roc_auc
value: [0.79166667 0.79166667 0.75 0.66666667 0.66666667 0.625
0.75 0.79166667 0.79166667 0.75 ]
mean value: 0.7374999999999999
key: train_roc_auc
value: [0.90277778 0.84259259 0.875 0.79166667 0.80555556 0.88888889
0.89351852 0.83333333 0.88425926 0.93981481]
mean value: 0.8657407407407407
key: test_jcc
value: [0.61538462 0.64285714 0.64705882 0.6 0.42857143 0.47058824
0.6 0.61538462 0.66666667 0.57142857]
mean value: 0.585794009911657
key: train_jcc
value: [0.8125 0.75886525 0.75675676 0.70588235 0.61111111 0.80487805
0.82307692 0.66972477 0.7826087 0.88495575]
mean value: 0.7610359659400171
MCC on Blind test: 0.42
Accuracy on Blind test: 0.73
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01570725 0.01456785 0.01438451 0.01643419 0.01524568 0.01557255
0.01395392 0.01653457 0.01462817 0.01384068]
mean value: 0.015086936950683593
key: score_time
value: [0.01181507 0.01158404 0.01152992 0.01155901 0.01153064 0.01153803
0.01155543 0.01160192 0.01161337 0.01156521]
mean value: 0.011589264869689942
key: test_mcc
value: [0.57735027 0.6761234 0.64168895 0.53033009 0.43033148 0.33333333
0.64168895 0.4472136 0.77459667 0.50709255]
mean value: 0.5559749288525665
key: train_mcc
value: [0.62017367 0.72861674 0.54167626 0.89911222 0.74428277 0.85243671
0.75734016 0.42465029 0.79697229 0.6824715 ]
mean value: 0.7047732619322473
key: test_accuracy
value: [0.75 0.83333333 0.79166667 0.75 0.70833333 0.66666667
0.79166667 0.66666667 0.875 0.75 ]
mean value: 0.7583333333333333
key: train_accuracy
value: [0.77777778 0.85185185 0.72685185 0.94907407 0.85648148 0.92592593
0.875 0.65277778 0.89351852 0.8287037 ]
mean value: 0.8337962962962963
key: test_fscore
value: [0.8 0.81818182 0.73684211 0.78571429 0.66666667 0.66666667
0.73684211 0.5 0.88888889 0.72727273]
mean value: 0.7327075263917369
key: train_fscore
value: [0.81818182 0.86885246 0.62420382 0.95022624 0.83243243 0.92727273
0.86567164 0.46808511 0.90128755 0.80213904]
mean value: 0.805835284215856
key: test_precision
value: [0.66666667 0.9 1. 0.6875 0.77777778 0.66666667
1. 1. 0.8 0.8 ]
mean value: 0.8298611111111112
key: train_precision
value: [0.69230769 0.77941176 1. 0.92920354 1. 0.91071429
0.93548387 1. 0.84 0.94936709]
mean value: 0.9036488242126206
key: test_recall
value: [1. 0.75 0.58333333 0.91666667 0.58333333 0.66666667
0.58333333 0.33333333 1. 0.66666667]
mean value: 0.7083333333333334
key: train_recall
value: [1. 0.98148148 0.4537037 0.97222222 0.71296296 0.94444444
0.80555556 0.30555556 0.97222222 0.69444444]
mean value: 0.7842592592592592
key: test_roc_auc
value: [0.75 0.83333333 0.79166667 0.75 0.70833333 0.66666667
0.79166667 0.66666667 0.875 0.75 ]
mean value: 0.7583333333333333
key: train_roc_auc
value: [0.77777778 0.85185185 0.72685185 0.94907407 0.85648148 0.92592593
0.875 0.65277778 0.89351852 0.8287037 ]
mean value: 0.8337962962962963
key: test_jcc
value: [0.66666667 0.69230769 0.58333333 0.64705882 0.5 0.5
0.58333333 0.33333333 0.8 0.57142857]
mean value: 0.5877461753932343
key: train_jcc
value: [0.69230769 0.76811594 0.4537037 0.90517241 0.71296296 0.86440678
0.76315789 0.30555556 0.8203125 0.66964286]
mean value: 0.695533830189272
MCC on Blind test: 0.37
Accuracy on Blind test: 0.71
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.133183 0.11564565 0.11664176 0.11746097 0.11651921 0.11830211
0.11635089 0.11619401 0.11829281 0.11623812]
mean value: 0.1184828519821167
key: score_time
value: [0.01493979 0.01472211 0.0150187 0.01492405 0.01475787 0.0148735
0.01482558 0.01505041 0.01577067 0.01480699]
mean value: 0.014968967437744141
key: test_mcc
value: [0.66666667 0.57735027 0.70710678 0.3380617 0.3380617 0.43033148
0.6761234 0.58536941 0.75261781 0.50709255]
mean value: 0.5578781776368856
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.83333333 0.75 0.83333333 0.66666667 0.66666667 0.70833333
0.83333333 0.79166667 0.875 0.75 ]
mean value: 0.7708333333333334
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.83333333 0.66666667 0.85714286 0.69230769 0.69230769 0.74074074
0.81818182 0.7826087 0.88 0.72727273]
mean value: 0.7690562223605701
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.83333333 1. 0.75 0.64285714 0.64285714 0.66666667
0.9 0.81818182 0.84615385 0.8 ]
mean value: 0.790004995004995
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.83333333 0.5 1. 0.75 0.75 0.83333333
0.75 0.75 0.91666667 0.66666667]
mean value: 0.775
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.83333333 0.75 0.83333333 0.66666667 0.66666667 0.70833333
0.83333333 0.79166667 0.875 0.75 ]
mean value: 0.7708333333333334
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.71428571 0.5 0.75 0.52941176 0.52941176 0.58823529
0.69230769 0.64285714 0.78571429 0.57142857]
mean value: 0.6303652230122818
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.37
Accuracy on Blind test: 0.7
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.04627037 0.04262686 0.04359245 0.05944014 0.0541997 0.05315614
0.05596638 0.05864263 0.05465889 0.06446767]
mean value: 0.05330212116241455
key: score_time
value: [0.01904321 0.0231092 0.02482772 0.02747893 0.02421951 0.02360153
0.02202749 0.03683686 0.02723002 0.04126835]
mean value: 0.026964282989501952
key: test_mcc
value: [0.60246408 0.43033148 0.50709255 0.58536941 0.3380617 0.16666667
0.43033148 0.60246408 0.91986621 0.41812101]
mean value: 0.5000768662388781
key: train_mcc
value: [0.97259753 0.98164982 0.98164982 0.96362411 0.98164982 0.96312812
0.98164982 0.97259753 0.98164982 0.99078321]
mean value: 0.9770979584094492
key: test_accuracy
value: [0.79166667 0.70833333 0.75 0.79166667 0.66666667 0.58333333
0.70833333 0.79166667 0.95833333 0.70833333]
mean value: 0.7458333333333333
key: train_accuracy
value: [0.98611111 0.99074074 0.99074074 0.98148148 0.99074074 0.98148148
0.99074074 0.98611111 0.99074074 0.99537037]
mean value: 0.9884259259259259
key: test_fscore
value: [0.76190476 0.66666667 0.76923077 0.8 0.63636364 0.58333333
0.66666667 0.76190476 0.95652174 0.69565217]
mean value: 0.7298244509114075
key: train_fscore
value: [0.98591549 0.99065421 0.99065421 0.98113208 0.99065421 0.98130841
0.99065421 0.98591549 0.99065421 0.99534884]
mean value: 0.988289133784883
key: test_precision
value: [0.88888889 0.77777778 0.71428571 0.76923077 0.7 0.58333333
0.77777778 0.88888889 1. 0.72727273]
mean value: 0.7827455877455878
key: train_precision
value: [1. 1. 1. 1. 1. 0.99056604
1. 1. 1. 1. ]
mean value: 0.9990566037735849
key: test_recall
value: [0.66666667 0.58333333 0.83333333 0.83333333 0.58333333 0.58333333
0.58333333 0.66666667 0.91666667 0.66666667]
mean value: 0.6916666666666667
key: train_recall
value: [0.97222222 0.98148148 0.98148148 0.96296296 0.98148148 0.97222222
0.98148148 0.97222222 0.98148148 0.99074074]
mean value: 0.9777777777777777
key: test_roc_auc
value: [0.79166667 0.70833333 0.75 0.79166667 0.66666667 0.58333333
0.70833333 0.79166667 0.95833333 0.70833333]
mean value: 0.7458333333333333
key: train_roc_auc
value: [0.98611111 0.99074074 0.99074074 0.98148148 0.99074074 0.98148148
0.99074074 0.98611111 0.99074074 0.99537037]
mean value: 0.9884259259259259
key: test_jcc
value: [0.61538462 0.5 0.625 0.66666667 0.46666667 0.41176471
0.5 0.61538462 0.91666667 0.53333333]
mean value: 0.5850867269984917
key: train_jcc
value: [0.97222222 0.98148148 0.98148148 0.96296296 0.98148148 0.96330275
0.98148148 0.97222222 0.98148148 0.99074074]
mean value: 0.9768858307849133
MCC on Blind test: 0.37
Accuracy on Blind test: 0.68
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.04280233 0.0681746 0.06288624 0.06788301 0.042377 0.02782488
0.0268805 0.04071689 0.02718902 0.06667638]
mean value: 0.04734108448028564
key: score_time
value: [0.02413058 0.02410936 0.02109694 0.02472234 0.01284194 0.01278472
0.01273441 0.01275396 0.01282382 0.01282883]
mean value: 0.017082691192626953
key: test_mcc
value: [0.58536941 0.41812101 0.41812101 0.1767767 0.3380617 0.1767767
0.5 0.66666667 0.6761234 0.41812101]
mean value: 0.437413758494976
key: train_mcc
value: [0.99078321 0.99078321 0.99078321 0.99078321 1. 0.99078321
0.99078321 0.99078321 0.99078321 0.98164982]
mean value: 0.9907915525187407
key: test_accuracy
value: [0.79166667 0.70833333 0.70833333 0.58333333 0.66666667 0.58333333
0.75 0.83333333 0.83333333 0.70833333]
mean value: 0.7166666666666667
key: train_accuracy
value: [0.99537037 0.99537037 0.99537037 0.99537037 1. 0.99537037
0.99537037 0.99537037 0.99537037 0.99074074]
mean value: 0.9953703703703703
key: test_fscore
value: [0.7826087 0.69565217 0.69565217 0.64285714 0.69230769 0.64285714
0.75 0.83333333 0.84615385 0.72 ]
mean value: 0.7301422200987419
key: train_fscore
value: [0.99539171 0.99539171 0.99539171 0.99539171 1. 0.99539171
0.99539171 0.99539171 0.99539171 0.99082569]
mean value: 0.9953959328626389
key: test_precision
value: [0.81818182 0.72727273 0.72727273 0.5625 0.64285714 0.5625
0.75 0.83333333 0.78571429 0.69230769]
mean value: 0.7101939726939727
key: train_precision
value: [0.99082569 0.99082569 0.99082569 0.99082569 1. 0.99082569
0.99082569 0.99082569 0.99082569 0.98181818]
mean value: 0.9908423686405339
key: test_recall
value: [0.75 0.66666667 0.66666667 0.75 0.75 0.75
0.75 0.83333333 0.91666667 0.75 ]
mean value: 0.7583333333333333
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.79166667 0.70833333 0.70833333 0.58333333 0.66666667 0.58333333
0.75 0.83333333 0.83333333 0.70833333]
mean value: 0.7166666666666667
key: train_roc_auc
value: [0.99537037 0.99537037 0.99537037 0.99537037 1. 0.99537037
0.99537037 0.99537037 0.99537037 0.99074074]
mean value: 0.9953703703703703
key: test_jcc
value: [0.64285714 0.53333333 0.53333333 0.47368421 0.52941176 0.47368421
0.6 0.71428571 0.73333333 0.5625 ]
mean value: 0.5796423042901371
key: train_jcc
value: [0.99082569 0.99082569 0.99082569 0.99082569 1. 0.99082569
0.99082569 0.99082569 0.99082569 0.98181818]
mean value: 0.9908423686405339
MCC on Blind test: 0.16
Accuracy on Blind test: 0.59
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.40733719 0.3867228 0.38469768 0.3909452 0.38922071 0.39397931
0.38517714 0.38699746 0.38558316 0.38554788]
mean value: 0.38962085247039796
key: score_time
value: [0.0094502 0.00933933 0.00938606 0.00928497 0.00938368 0.01016212
0.00926542 0.00927734 0.00943732 0.00922227]
mean value: 0.00942087173461914
key: test_mcc
value: [0.83333333 0.53033009 0.64168895 0.6761234 0.58536941 0.5
0.75261781 0.6761234 0.91986621 0.58536941]
mean value: 0.6700822008732727
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.91666667 0.75 0.79166667 0.83333333 0.79166667 0.75
0.875 0.83333333 0.95833333 0.79166667]
mean value: 0.8291666666666666
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.91666667 0.7 0.82758621 0.84615385 0.8 0.75
0.86956522 0.81818182 0.96 0.7826087 ]
mean value: 0.8270762450942362
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.91666667 0.875 0.70588235 0.78571429 0.76923077 0.75
0.90909091 0.9 0.92307692 0.81818182]
mean value: 0.8352843724902549
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.91666667 0.58333333 1. 0.91666667 0.83333333 0.75
0.83333333 0.75 1. 0.75 ]
mean value: 0.8333333333333334
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.91666667 0.75 0.79166667 0.83333333 0.79166667 0.75
0.875 0.83333333 0.95833333 0.79166667]
mean value: 0.8291666666666666
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.84615385 0.53846154 0.70588235 0.73333333 0.66666667 0.6
0.76923077 0.69230769 0.92307692 0.64285714]
mean value: 0.7117970265029089
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.4
Accuracy on Blind test: 0.71
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.02279496 0.02264142 0.02134824 0.02143693 0.02124429 0.0212903
0.02151275 0.02106762 0.02139115 0.02206588]
mean value: 0.02167935371398926
key: score_time
value: [0.01800084 0.01225257 0.01478553 0.01450801 0.0146749 0.015486
0.0145545 0.01485181 0.01477313 0.01217294]
mean value: 0.014606022834777832
key: test_mcc
value: [ 0.41812101 0.41812101 0.43033148 0. -0.0860663 0.66666667
0.41812101 0.2508726 0.6761234 0.43033148]
mean value: 0.36226223577037264
key: train_mcc
value: [1. 0.91986621 0.87777662 1. 0.89442719 1.
0.92847669 0.97259753 0.95472741 0.95472741]
mean value: 0.9502599060234108
key: test_accuracy
value: [0.70833333 0.70833333 0.70833333 0.5 0.45833333 0.83333333
0.70833333 0.625 0.83333333 0.70833333]
mean value: 0.6791666666666667
key: train_accuracy
value: [1. 0.95833333 0.93518519 1. 0.94444444 1.
0.96296296 0.98611111 0.97685185 0.97685185]
mean value: 0.9740740740740741
key: test_fscore
value: [0.72 0.69565217 0.66666667 0.625 0.51851852 0.83333333
0.72 0.64 0.81818182 0.66666667]
mean value: 0.6904019177280046
key: train_fscore
value: [1. 0.96 0.93913043 1. 0.94736842 1.
0.96428571 0.98630137 0.97737557 0.97737557]
mean value: 0.9751837071205688
key: test_precision
value: [0.69230769 0.72727273 0.77777778 0.5 0.46666667 0.83333333
0.69230769 0.61538462 0.9 0.77777778]
mean value: 0.6982828282828283
key: train_precision
value: [1. 0.92307692 0.8852459 1. 0.9 1.
0.93103448 0.97297297 0.95575221 0.95575221]
mean value: 0.9523834705226623
key: test_recall
value: [0.75 0.66666667 0.58333333 0.83333333 0.58333333 0.83333333
0.75 0.66666667 0.75 0.58333333]
mean value: 0.7
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.70833333 0.70833333 0.70833333 0.5 0.45833333 0.83333333
0.70833333 0.625 0.83333333 0.70833333]
mean value: 0.6791666666666667
key: train_roc_auc
value: [1. 0.95833333 0.93518519 1. 0.94444444 1.
0.96296296 0.98611111 0.97685185 0.97685185]
mean value: 0.9740740740740741
key: test_jcc
value: [0.5625 0.53333333 0.5 0.45454545 0.35 0.71428571
0.5625 0.47058824 0.69230769 0.5 ]
mean value: 0.5340060429766312
key: train_jcc
value: [1. 0.92307692 0.8852459 1. 0.9 1.
0.93103448 0.97297297 0.95575221 0.95575221]
mean value: 0.9523834705226623
MCC on Blind test: 0.1
Accuracy on Blind test: 0.56
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02238536 0.03210616 0.03460526 0.03438401 0.03436708 0.03419876
0.03430533 0.03438544 0.03449702 0.03423023]
mean value: 0.03294646739959717
key: score_time
value: [0.01790905 0.02034116 0.0216105 0.02313852 0.02082276 0.02208257
0.02374363 0.02120662 0.02306151 0.02310467]
mean value: 0.021702098846435546
key: test_mcc
value: [0.6761234 0.60246408 0.60246408 0.43033148 0.2508726 0.3380617
0.6761234 0.6761234 0.60246408 0.58536941]
mean value: 0.5440397634389
key: train_mcc
value: [0.83390548 0.89849486 0.86203543 0.88057382 0.86144352 0.87996919
0.86292558 0.89133762 0.88904134 0.89026381]
mean value: 0.8749990658638971
key: test_accuracy
value: [0.83333333 0.79166667 0.79166667 0.70833333 0.625 0.66666667
0.83333333 0.83333333 0.79166667 0.79166667]
mean value: 0.7666666666666666
key: train_accuracy
value: [0.91666667 0.94907407 0.93055556 0.93981481 0.93055556 0.93981481
0.93055556 0.94444444 0.94444444 0.94444444]
mean value: 0.937037037037037
key: test_fscore
value: [0.81818182 0.76190476 0.81481481 0.74074074 0.64 0.69230769
0.81818182 0.81818182 0.81481481 0.7826087 ]
mean value: 0.7701736974780453
key: train_fscore
value: [0.91818182 0.94977169 0.9321267 0.94117647 0.93150685 0.94063927
0.93273543 0.94642857 0.94495413 0.94594595]
mean value: 0.9383466865645663
key: test_precision
value: /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_rt.py:195: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_rt.py:198: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
[0.9 0.88888889 0.73333333 0.66666667 0.61538462 0.64285714
0.9 0.9 0.73333333 0.81818182]
mean value: 0.7798645798645799
key: train_precision
value: [0.90178571 0.93693694 0.91150442 0.92035398 0.91891892 0.92792793
0.90434783 0.9137931 0.93636364 0.92105263]
mean value: 0.919298510262696
key: test_recall
value: [0.75 0.66666667 0.91666667 0.83333333 0.66666667 0.75
0.75 0.75 0.91666667 0.75 ]
mean value: 0.775
key: train_recall
value: [0.93518519 0.96296296 0.9537037 0.96296296 0.94444444 0.9537037
0.96296296 0.98148148 0.9537037 0.97222222]
mean value: 0.9583333333333334
key: test_roc_auc
value: [0.83333333 0.79166667 0.79166667 0.70833333 0.625 0.66666667
0.83333333 0.83333333 0.79166667 0.79166667]
mean value: 0.7666666666666666
key: train_roc_auc
value: [0.91666667 0.94907407 0.93055556 0.93981481 0.93055556 0.93981481
0.93055556 0.94444444 0.94444444 0.94444444]
mean value: 0.937037037037037
key: test_jcc
value: [0.69230769 0.61538462 0.6875 0.58823529 0.47058824 0.52941176
0.69230769 0.69230769 0.6875 0.64285714]
mean value: 0.6298400129282482
key: train_jcc
value: [0.8487395 0.90434783 0.87288136 0.88888889 0.87179487 0.88793103
0.87394958 0.89830508 0.89565217 0.8974359 ]
mean value: 0.8839926208910636
MCC on Blind test: 0.42
Accuracy on Blind test: 0.72
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.23147917 0.30225873 0.2818923 0.12417746 0.19498205 0.23902297
0.23224235 0.23423767 0.19799876 0.15895414]
mean value: 0.21972455978393554
key: score_time
value: [0.02185488 0.02039647 0.02102041 0.01203465 0.02082801 0.02240419
0.02174664 0.02311754 0.0120647 0.02047658]
mean value: 0.019594407081604003
key: test_mcc
value: [0.75261781 0.60246408 0.70710678 0.53033009 0.16666667 0.33333333
0.77459667 0.75261781 0.75261781 0.6761234 ]
mean value: 0.6048474443196609
key: train_mcc
value: [0.73864041 0.78978412 0.77120096 0.78869542 0.98148148 0.77013788
0.75392071 0.78262379 0.77898084 0.77013788]
mean value: 0.7925603488837074
key: test_accuracy
value: [0.875 0.79166667 0.83333333 0.75 0.58333333 0.66666667
0.875 0.875 0.875 0.83333333]
mean value: 0.7958333333333334
key: train_accuracy
value: [0.86574074 0.89351852 0.88425926 0.89351852 0.99074074 0.88425926
0.875 0.88888889 0.88888889 0.88425926]
mean value: 0.8949074074074074
key: test_fscore
value: [0.86956522 0.76190476 0.85714286 0.78571429 0.58333333 0.66666667
0.85714286 0.86956522 0.88 0.84615385]
mean value: 0.7977189042841216
key: train_fscore
value: [0.87445887 0.89777778 0.88888889 0.89686099 0.99074074 0.88789238
0.88105727 0.89473684 0.89189189 0.88789238]
mean value: 0.8992198024496217
key: test_precision
value: [0.90909091 0.88888889 0.75 0.6875 0.58333333 0.66666667
1. 0.90909091 0.84615385 0.78571429]
mean value: 0.8026438838938839
key: train_precision
value: [0.82113821 0.86324786 0.85470085 0.86956522 0.99074074 0.86086957
0.84033613 0.85 0.86842105 0.86086957]
mean value: 0.867988920498302
key: test_recall
value: [0.83333333 0.66666667 1. 0.91666667 0.58333333 0.66666667
0.75 0.83333333 0.91666667 0.91666667]
mean value: 0.8083333333333333
key: train_recall
value: [0.93518519 0.93518519 0.92592593 0.92592593 0.99074074 0.91666667
0.92592593 0.94444444 0.91666667 0.91666667]
mean value: 0.9333333333333333
key: test_roc_auc
value: [0.875 0.79166667 0.83333333 0.75 0.58333333 0.66666667
0.875 0.875 0.875 0.83333333]
mean value: 0.7958333333333333
key: train_roc_auc
value: [0.86574074 0.89351852 0.88425926 0.89351852 0.99074074 0.88425926
0.875 0.88888889 0.88888889 0.88425926]
mean value: 0.8949074074074074
key: test_jcc
value: [0.76923077 0.61538462 0.75 0.64705882 0.41176471 0.5
0.75 0.76923077 0.78571429 0.73333333]
mean value: 0.6731717302305538
key: train_jcc
value: [0.77692308 0.81451613 0.8 0.81300813 0.98165138 0.7983871
0.78740157 0.80952381 0.80487805 0.7983871 ]
mean value: 0.8184676338839258
MCC on Blind test: 0.37
Accuracy on Blind test: 0.7