LSHTM_analysis/scripts/ml/log_rpob_rt.txt

/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data_rt.py:550: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
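The in-place sort above runs on a slice of a larger DataFrame, which is what triggers the SettingWithCopyWarning. A minimal, hypothetical sketch of the usual remedy (not the ml_data_rt.py code itself): take an explicit .copy() of the slice before sorting.

import pandas as pd

# Toy frame standing in for the real data; 'keep' marks the rows the mask selects.
df = pd.DataFrame({"ligand_distance": [5.2, 1.3, 3.8], "keep": [True, True, False]})

# An explicit copy makes mask_check its own object, so the in-place sort
# can no longer write through to a view of df and the warning disappears.
mask_check = df[df["keep"]].copy()
mask_check.sort_values(by=["ligand_distance"], ascending=True, inplace=True)
print(mask_check)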
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
from pandas import MultiIndex, Int64Index
1.22.4
1.4.1
aaindex_df contains non-numerical data
Total no. of non-numerical columns: 2
Selecting numerical data only
PASS: successfully selected numerical columns only for aaindex_df
Now checking for NA in the remaining aaindex_cols
Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127
Revised df ncols: 123
Checking NA in revised df...
PASS: cols with NA successfully dropped from aaindex_df
Proceeding with combining aa_df with other features_df
PASS: ncols match
Expected ncols: 123
Got: 123
Total no. of columns in clean aa_df: 123
Proceeding to merge, expected nrows in merged_df: 1133
PASS: my_features_df and aa_df successfully combined
nrows: 1133
ncols: 274
count of NULL values before imputation
or_mychisq 339
log10_or_mychisq 339
dtype: int64
count of NULL values AFTER imputation
mutationinformation 0
or_rawI 0
logorI 0
dtype: int64
PASS: OR values imputed, data ready for ML
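A minimal sketch of the kind of imputation the lines above report, assuming a median fill of the odds-ratio column followed by recomputing its log10 counterpart; the actual script may impute differently.

import numpy as np
import pandas as pd

# Toy odds-ratio column with missing values, standing in for or_mychisq.
df = pd.DataFrame({"or_mychisq": [2.5, np.nan, 0.8, np.nan, 4.1]})

# Median fill, then recompute the log10 column so both end up NULL-free.
df["or_mychisq"] = df["or_mychisq"].fillna(df["or_mychisq"].median())
df["log10_or_mychisq"] = np.log10(df["or_mychisq"])

print(df.isna().sum())  # both columns should report 0 NULLs, as in the log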
Total no. of features for aaindex: 123
No. of numerical features: 169
No. of categorical features: 7
index: 0
ind: 1
Mask count check: True
index: 1
ind: 2
Mask count check: True
index: 2
ind: 3
Mask count check: True
Original Data
Counter({0: 545, 1: 30}) Data dim: (575, 176)
-------------------------------------------------------------
Successfully split data: REVERSE training
imputed values: training set
actual values: blind test set
Train data size: (575, 176)
Test data size: (557, 176)
y_train numbers: Counter({0: 545, 1: 30})
y_train ratio: 18.166666666666668
y_test numbers: Counter({0: 282, 1: 275})
y_test ratio: 1.0254545454545454
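The two ratio lines appear to be majority/minority quotients of the Counter outputs; a quick check of that arithmetic, using the counts printed above:

from collections import Counter

y_train_counts = Counter({0: 545, 1: 30})
y_test_counts = Counter({0: 282, 1: 275})

# 545 / 30 = 18.166..., 282 / 275 = 1.0254..., matching the logged ratios.
print("y_train ratio:", y_train_counts[0] / y_train_counts[1])
print("y_test ratio:", y_test_counts[0] / y_test_counts[1])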
-------------------------------------------------------------
Simple Random OverSampling
Counter({0: 545, 1: 545})
(1090, 176)
Simple Random UnderSampling
Counter({0: 30, 1: 30})
(60, 176)
Simple Combined Over and UnderSampling
Counter({0: 545, 1: 545})
(1090, 176)
SMOTE_NC OverSampling
Counter({0: 545, 1: 545})
(1090, 176)
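A minimal sketch of how class counts like these can be produced with imbalanced-learn; the toy matrix, the categorical column index and the parameter choices are assumptions, not taken from the repository (the combined over-and-under sampler is omitted here).

import numpy as np
from collections import Counter
from imblearn.over_sampling import RandomOverSampler, SMOTENC
from imblearn.under_sampling import RandomUnderSampler

# Toy stand-in for the 575 x 176 training matrix; the last column is categorical.
rng = np.random.default_rng(42)
X = np.hstack([rng.normal(size=(100, 3)), rng.integers(0, 3, size=(100, 1))])
y = np.array([0] * 90 + [1] * 10)

# Random oversampling duplicates minority rows up to the majority count.
X_ros, y_ros = RandomOverSampler(random_state=42).fit_resample(X, y)
print("OverSampling:", Counter(y_ros))

# Random undersampling discards majority rows down to the minority count.
X_rus, y_rus = RandomUnderSampler(random_state=42).fit_resample(X, y)
print("UnderSampling:", Counter(y_rus))

# SMOTE-NC synthesises minority rows while respecting categorical columns.
X_sm, y_sm = SMOTENC(categorical_features=[3], random_state=42).fit_resample(X, y)
print("SMOTE_NC:", Counter(y_sm))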
#####################################################################
Running ML analysis: REVERSE training
Gene name: rpoB
Drug name: rifampicin
Output directory: /home/tanu/git/Data/rifampicin/output/ml/tts_rt/
Sanity checks:
Total input features: 176
Training data size: (575, 176)
Test data size: (557, 176)
Target feature numbers (training data): Counter({0: 545, 1: 30})
Target features ratio (training data): 18.166666666666668
Target feature numbers (test data): Counter({0: 282, 1: 275})
Target features ratio (test data): 1.0254545454545454
#####################################################################
================================================================
Structural features (n): 37
These are:
Common stability features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================
AAindex features (n): 123
These are:
['ALTS910101', 'AZAE970101', 'AZAE970102', 'BASU010101', 'BENS940101', 'BENS940102', 'BENS940103', 'BENS940104', 'BETM990101', 'BLAJ010101', 'BONM030101', 'BONM030102', 'BONM030103', 'BONM030104', 'BONM030105', 'BONM030106', 'BRYS930101', 'CROG050101', 'CSEM940101', 'DAYM780301', 'DAYM780302', 'DOSZ010101', 'DOSZ010102', 'DOSZ010103', 'DOSZ010104', 'FEND850101', 'FITW660101', 'GEOD900101', 'GIAG010101', 'GONG920101', 'GRAR740104', 'HENS920101', 'HENS920102', 'HENS920103', 'HENS920104', 'JOHM930101', 'JOND920103', 'JOND940101', 'KANM000101', 'KAPO950101', 'KESO980101', 'KESO980102', 'KOLA920101', 'KOLA930101', 'KOSJ950100_RSA_SST', 'KOSJ950100_SST', 'KOSJ950110_RSA', 'KOSJ950115', 'LEVJ860101', 'LINK010101', 'LIWA970101', 'LUTR910101', 'LUTR910102', 'LUTR910103', 'LUTR910104', 'LUTR910105', 'LUTR910106', 'LUTR910107', 'LUTR910108', 'LUTR910109', 'MCLA710101', 'MCLA720101', 'MEHP950102', 'MICC010101', 'MIRL960101', 'MIYS850102', 'MIYS850103', 'MIYS930101', 'MIYS960101', 'MIYS960102', 'MIYS960103', 'MIYS990106', 'MIYS990107', 'MIYT790101', 'MOHR870101', 'MOOG990101', 'MUET010101', 'MUET020101', 'MUET020102', 'NAOD960101', 'NGPC000101', 'NIEK910101', 'NIEK910102', 'OGAK980101', 'OVEJ920100_RSA', 'OVEJ920101', 'OVEJ920102', 'OVEJ920103', 'PRLA000101', 'PRLA000102', 'QUIB020101', 'QU_C930101', 'QU_C930102', 'QU_C930103', 'RIER950101', 'RISJ880101', 'RUSR970101', 'RUSR970102', 'RUSR970103', 'SIMK990101', 'SIMK990102', 'SIMK990103', 'SIMK990104', 'SIMK990105', 'SKOJ000101', 'SKOJ000102', 'SKOJ970101', 'TANS760101', 'TANS760102', 'THOP960101', 'TOBD000101', 'TOBD000102', 'TUDE900101', 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106']
================================================================
Evolutionary features (n): 3
These are:
['consurf_score', 'snap2_score', 'provean_score']
================================================================
Genomic features (n): 6
These are:
['maf', 'logorI']
['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================
Categorical features (n): 7
These are:
['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================
Pass: No. of features match
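The 'features match' line boils down to the group counts above summing to the totals in the sanity checks; the arithmetic spelled out below is not the script's actual assertion, only a check of the logged numbers.

# 37 structural + 123 AAindex + 3 evolutionary + 6 genomic = 169 numerical features,
# and 169 numerical + 7 categorical = 176 total input features.
n_structural, n_aaindex, n_evolutionary, n_genomic, n_categorical = 37, 123, 3, 6, 7

n_numerical = n_structural + n_aaindex + n_evolutionary + n_genomic
assert n_numerical == 169
assert n_numerical + n_categorical == 176
print("Pass: No. of features match")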
#####################################################################
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
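The lbfgs ConvergenceWarning above suggests raising max_iter or scaling the data; a minimal sketch of that remedy, with max_iter=1000 as an assumed value rather than anything taken from the script (the MinMaxScaler already in the pipeline covers the scaling half of the advice):

from sklearn.linear_model import LogisticRegression, LogisticRegressionCV

# A larger iteration budget gives lbfgs room to converge on this feature set.
lr = LogisticRegression(random_state=42, max_iter=1000)
lr_cv = LogisticRegressionCV(random_state=42, max_iter=1000)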
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
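A minimal reconstruction of the preprocessing-plus-model pipeline printed above, with short illustrative column lists standing in for the full 169 numerical and 7 categorical names:

from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Truncated stand-ins for the real column lists shown in the log.
num_cols = ["ligand_distance", "ligand_affinity_change", "duet_stability_change"]
cat_cols = ["ss_class", "aa_prop_change", "active_site"]

prep = ColumnTransformer(
    transformers=[("num", MinMaxScaler(), num_cols),
                  ("cat", OneHotEncoder(), cat_cols)],
    remainder="passthrough",
)
pipe = Pipeline(steps=[("prep", prep),
                       ("model", LogisticRegression(random_state=42))])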
key: fit_time
value: [0.03620291 0.03169894 0.03200722 0.03302526 0.03463435 0.03355765
0.03698945 0.03275728 0.03071117 0.02976274]
mean value: 0.03313469886779785
key: score_time
value: [0.01243091 0.0119307 0.01226759 0.01270723 0.01222587 0.01198983
0.01438379 0.01209283 0.01194143 0.01193285]
mean value: 0.01239030361175537
key: test_mcc
value: [0.64848485 0.56713087 0.56713087 0. 0.38251843 0.56694671
0.2962963 0.38204659 0.80903983 0.64814815]
mean value: 0.4867742597878121
key: train_mcc
value: [0.78551705 0.78551705 0.78589709 0.8321053 0.80909988 0.78553305
0.76169209 0.80911474 0.78553305 0.80911474]
mean value: 0.7949124038845846
key: test_accuracy
value: [0.96551724 0.96551724 0.96551724 0.94827586 0.94827586 0.96491228
0.92982456 0.94736842 0.98245614 0.96491228]
mean value: 0.9582577132486388
key: train_accuracy
value: [0.98065764 0.98065764 0.98065764 0.98452611 0.98259188 0.98069498
0.97876448 0.98262548 0.98069498 0.98262548]
mean value: 0.9814496314496315
key: test_fscore
value: [0.66666667 0.5 0.5 0. 0.4 0.5
0.33333333 0.4 0.8 0.66666667]
mean value: 0.4766666666666667
key: train_fscore
value: [0.77272727 0.77272727 0.7826087 0.82608696 0.8 0.77272727
0.75555556 0.8 0.77272727 0.8 ]
mean value: 0.7855160298638559
key: test_precision
value: [0.66666667 1. 1. 0. 0.5 1.
0.33333333 0.5 1. 0.66666667]
mean value: 0.6666666666666666
key: train_precision
value: [1. 1. 0.94736842 1. 1. 1.
0.94444444 1. 1. 1. ]
mean value: 0.9891812865497076
key: test_recall
value: [0.66666667 0.33333333 0.33333333 0. 0.33333333 0.33333333
0.33333333 0.33333333 0.66666667 0.66666667]
mean value: 0.39999999999999997
key: train_recall
value: [0.62962963 0.62962963 0.66666667 0.7037037 0.66666667 0.62962963
0.62962963 0.66666667 0.62962963 0.66666667]
mean value: 0.6518518518518519
key: test_roc_auc
value: [0.82424242 0.66666667 0.66666667 0.5 0.65757576 0.66666667
0.64814815 0.65740741 0.83333333 0.82407407]
mean value: 0.6944781144781145
key: train_roc_auc
value: [0.81481481 0.81481481 0.83231293 0.85185185 0.83333333 0.81481481
0.81379648 0.83333333 0.81481481 0.83333333]
mean value: 0.8257220521157094
key: test_jcc
value: [0.5 0.33333333 0.33333333 0. 0.25 0.33333333
0.2 0.25 0.66666667 0.5 ]
mean value: 0.33666666666666667
key: train_jcc
value: [0.62962963 0.62962963 0.64285714 0.7037037 0.66666667 0.62962963
0.60714286 0.66666667 0.62962963 0.66666667]
mean value: 0.6472222222222223
MCC on Blind test: 0.2
Accuracy on Blind test: 0.55
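A minimal sketch of how per-fold dictionaries like the one above and the blind-test scores could be produced with scikit-learn's cross_validate; the toy data, the scoring dictionary and the 10-fold setting are assumptions, not the repository's configuration:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import cross_validate

# Toy imbalanced training set and a separate blind-test set.
X_train, y_train = make_classification(n_samples=200, weights=[0.9], random_state=42)
X_bts, y_bts = make_classification(n_samples=100, random_state=0)

scoring = {"mcc": make_scorer(matthews_corrcoef), "accuracy": "accuracy"}
clf = LogisticRegression(random_state=42, max_iter=1000)

# cross_validate returns the same kind of dict the log walks through:
# fit_time, score_time and test_/train_ arrays with one entry per fold.
cv_out = cross_validate(clf, X_train, y_train, cv=10, scoring=scoring,
                        return_train_score=True)
for key, value in cv_out.items():
    print("key:", key)
    print("mean value:", value.mean())

# Blind-test metrics come from refitting on the full training set.
clf.fit(X_train, y_train)
y_pred = clf.predict(X_bts)
print("MCC on Blind test:", round(matthews_corrcoef(y_bts, y_pred), 2))
print("Accuracy on Blind test:", round(accuracy_score(y_bts, y_pred), 2))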
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[the ConvergenceWarning above is repeated verbatim for the remaining LogisticRegressionCV fits; one further UndefinedMetricWarning about ill-defined precision is also emitted]
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.88869524 0.74637437 0.73213053 0.80213022 0.75085831 0.79953766
0.74605346 0.7229135 0.95076203 0.73621321]
mean value: 0.78756685256958
key: score_time
value: [0.01215458 0.01214933 0.01236701 0.01245666 0.01223445 0.01911211
0.01208758 0.01212478 0.01212263 0.01623821]
mean value: 0.013304734230041504
key: test_mcc
value: [ 0.80917359 0.56713087 0.56713087 0. -0.03093441 0.56694671
0.2962963 0.38204659 0.80903983 0.2962963 ]
mean value: 0.4263126653890895
key: train_mcc
value: [0.5982906 0.62811222 0.89810042 0.65669104 0.68418345 0.68420281
0.65717642 0.62813245 0.62813245 0.98030899]
mean value: 0.7043330842580746
key: test_accuracy
value: [0.98275862 0.96551724 0.96551724 0.94827586 0.93103448 0.96491228
0.92982456 0.94736842 0.98245614 0.92982456]
mean value: 0.9547489413188143
key: train_accuracy
value: [0.96711799 0.96905222 0.99032882 0.97098646 0.9729207 0.97297297
0.97104247 0.96911197 0.96911197 0.9980695 ]
mean value: 0.9750715069864005
key: test_fscore
value: [0.8 0.5 0.5 0. 0. 0.5
0.33333333 0.4 0.8 0.33333333]
mean value: 0.4166666666666667
key: train_fscore
value: [0.54054054 0.57894737 0.89795918 0.61538462 0.65 0.65
0.63414634 0.57894737 0.57894737 0.98113208]
mean value: 0.6706004861796896
key: test_precision
value: [1. 1. 1. 0. 0. 1.
0.33333333 0.5 1. 0.33333333]
mean value: 0.6166666666666667
key: train_precision
value: [1. 1. 1. 1. 1. 1.
0.92857143 1. 1. 1. ]
mean value: 0.9928571428571429
key: test_recall
value: [0.66666667 0.33333333 0.33333333 0. 0. 0.33333333
0.33333333 0.33333333 0.66666667 0.33333333]
mean value: 0.3333333333333333
key: train_recall
value: [0.37037037 0.40740741 0.81481481 0.44444444 0.48148148 0.48148148
0.48148148 0.40740741 0.40740741 0.96296296]
mean value: 0.5259259259259259
key: test_roc_auc
value: [0.83333333 0.66666667 0.66666667 0.5 0.49090909 0.66666667
0.64814815 0.65740741 0.83333333 0.64814815]
mean value: 0.6611279461279461
key: train_roc_auc
value: [0.68518519 0.7037037 0.90740741 0.72222222 0.74074074 0.74074074
0.73972241 0.7037037 0.7037037 0.98148148]
mean value: 0.762861129969073
key: test_jcc
value: [0.66666667 0.33333333 0.33333333 0. 0. 0.33333333
0.2 0.25 0.66666667 0.2 ]
mean value: 0.29833333333333334
key: train_jcc
value: [0.37037037 0.40740741 0.81481481 0.44444444 0.48148148 0.48148148
0.46428571 0.40740741 0.40740741 0.96296296]
mean value: 0.5242063492063492
MCC on Blind test: 0.17
Accuracy on Blind test: 0.54
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01631737 0.01177216 0.01084805 0.01043868 0.0100987 0.0101819
0.01021171 0.01049137 0.01006961 0.01019645]
mean value: 0.01106259822845459
key: score_time
value: [0.01252532 0.00943542 0.00935078 0.00916195 0.00890803 0.00936007
0.00899339 0.00960541 0.00887513 0.00927067]
mean value: 0.009548616409301759
key: test_mcc
value: [0.43452409 0.51168172 0.1762953 0.28417826 0.45726459 0.28291258
0.51099032 0.30441977 0.68718427 0.51099032]
mean value: 0.41604412240331595
key: train_mcc
value: [0.42281221 0.43803842 0.60914572 0.45594583 0.46618385 0.43552025
0.44078566 0.44401007 0.42989307 0.49523539]
mean value: 0.4637570449954174
key: test_accuracy
value: [0.82758621 0.87931034 0.87931034 0.82758621 0.84482759 0.8245614
0.87719298 0.84210526 0.94736842 0.87719298]
mean value: 0.8627041742286751
key: train_accuracy
value: [0.82978723 0.84139265 0.92843327 0.86460348 0.86073501 0.83976834
0.84362934 0.85714286 0.84749035 0.87837838]
mean value: 0.8591360910509847
key: test_fscore
value: [0.375 0.46153846 0.22222222 0.28571429 0.4 0.28571429
0.46153846 0.30769231 0.66666667 0.46153846]
mean value: 0.39276251526251527
key: train_fscore
value: [0.37142857 0.3880597 0.58426966 0.41666667 0.41935484 0.38518519
0.39097744 0.40322581 0.3875969 0.45217391]
mean value: 0.4198938688732906
key: test_precision
value: [0.23076923 0.3 0.16666667 0.18181818 0.25 0.18181818
0.3 0.2 0.5 0.3 ]
mean value: 0.2611072261072261
key: train_precision
value: [0.2300885 0.24299065 0.41935484 0.2688172 0.26804124 0.24074074
0.24528302 0.25773196 0.24509804 0.29545455]
mean value: 0.2713600732946767
key: test_recall
value: [1. 1. 0.33333333 0.66666667 1. 0.66666667
1. 0.66666667 1. 1. ]
mean value: 0.8333333333333334
key: train_recall
value: [0.96296296 0.96296296 0.96296296 0.92592593 0.96296296 0.96296296
0.96296296 0.92592593 0.92592593 0.96296296]
mean value: 0.9518518518518518
key: test_roc_auc
value: [0.90909091 0.93636364 0.62121212 0.75151515 0.91818182 0.75
0.93518519 0.75925926 0.97222222 0.93518519]
mean value: 0.8488215488215488
key: train_roc_auc
value: [0.89270597 0.89882842 0.94474679 0.89357521 0.9090325 0.89797843
0.90001509 0.88964321 0.88455156 0.91834503]
mean value: 0.9029422192049483
key: test_jcc
value: [0.23076923 0.3 0.125 0.16666667 0.25 0.16666667
0.3 0.18181818 0.5 0.3 ]
mean value: 0.2520920745920746
key: train_jcc
value: [0.22807018 0.24074074 0.41269841 0.26315789 0.26530612 0.23853211
0.24299065 0.25252525 0.24038462 0.29213483]
mean value: 0.2676540809731464
MCC on Blind test: 0.43
Accuracy on Blind test: 0.69
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01049566 0.01031566 0.01039696 0.01031137 0.01143765 0.01037836
0.0102942 0.01034856 0.01027584 0.01027036]
mean value: 0.010452461242675782
key: score_time
value: [0.00897431 0.00893593 0.00913882 0.00890827 0.00900865 0.00888872
0.00890207 0.00890517 0.0088551 0.00887251]
mean value: 0.008938956260681152
key: test_mcc
value: [ 0.5508895 0.56713087 0.38251843 -0.04413674 -0.05454545 0.38204659
-0.03149704 0.38204659 0.76011695 0.80903983]
mean value: 0.37036095219850856
key: train_mcc
value: [0.58249009 0.5427084 0.62227177 0.55500996 0.60935992 0.62292056
0.58514128 0.58191044 0.58514128 0.5955795 ]
mean value: 0.5882533192252368
key: test_accuracy
value: [0.94827586 0.96551724 0.94827586 0.9137931 0.89655172 0.94736842
0.92982456 0.94736842 0.96491228 0.98245614]
mean value: 0.9444343617664852
key: train_accuracy
value: [0.95938104 0.95551257 0.96324952 0.95744681 0.96324952 0.96138996
0.95752896 0.96138996 0.95752896 0.96138996]
mean value: 0.9598067257641726
key: test_fscore
value: [0.57142857 0.5 0.4 0. 0. 0.4
0. 0.4 0.75 0.8 ]
mean value: 0.3821428571428572
key: train_fscore
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
[warning repeated for the other affected CV folds]
value: [0.60377358 0.56603774 0.64150943 0.57692308 0.62745098 0.64285714
0.60714286 0.6 0.60714286 0.61538462]
mean value: 0.6088222284559688
key: test_precision
value: [0.5 1. 0.5 0. 0. 0.5 0. 0.5 0.6 1. ]
mean value: 0.46
key: train_precision
value: [0.61538462 0.57692308 0.65384615 0.6 0.66666667 0.62068966
0.5862069 0.65217391 0.5862069 0.64 ]
mean value: 0.6198097874139853
key: test_recall
value: [0.66666667 0.33333333 0.33333333 0. 0. 0.33333333
0. 0.33333333 1. 0.66666667]
mean value: 0.36666666666666664
key: train_recall
value: [0.59259259 0.55555556 0.62962963 0.55555556 0.59259259 0.66666667
0.62962963 0.55555556 0.62962963 0.59259259]
mean value: 0.6
key: test_roc_auc
value: [0.81515152 0.66666667 0.65757576 0.48181818 0.47272727 0.65740741
0.49074074 0.65740741 0.98148148 0.83333333]
mean value: 0.6714309764309764
key: train_roc_auc
value: [0.78609221 0.76655329 0.80563114 0.7675737 0.78813303 0.8221317
0.80259486 0.76963114 0.80259486 0.78713133]
mean value: 0.7898067251340455
key: test_jcc
value: [0.4 0.33333333 0.25 0. 0. 0.25
0. 0.25 0.6 0.66666667]
mean value: 0.275
key: train_jcc
value: [0.43243243 0.39473684 0.47222222 0.40540541 0.45714286 0.47368421
0.43589744 0.42857143 0.43589744 0.44444444]
mean value: 0.4380434714645241
MCC on Blind test: 0.2
Accuracy on Blind test: 0.56
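The pipeline printed above MinMax-scales the numerical columns, one-hot encodes the categorical columns and passes any remaining columns through unchanged before handing the data to the model. A minimal sketch of assembling such a prep + model pipeline (the column lists here are an illustrative subset, not the full 169 numerical and 7 categorical features):

# Sketch only: scale numerics, one-hot encode categoricals, pass the rest through.
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.naive_bayes import BernoulliNB

numeric_cols = ['ligand_distance', 'ligand_affinity_change']       # illustrative subset
categorical_cols = ['ss_class', 'aa_prop_change', 'active_site']   # illustrative subset

prep = ColumnTransformer(
    transformers=[('num', MinMaxScaler(), numeric_cols),
                  ('cat', OneHotEncoder(), categorical_cols)],
    remainder='passthrough')

pipe = Pipeline(steps=[('prep', prep), ('model', BernoulliNB())])
# pipe.fit(X_train, y_train) would then be scored fold by fold as in the log.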
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00958323 0.01032209 0.01076818 0.01059055 0.01070786 0.0109334
0.00970054 0.0109117 0.01076508 0.00970888]
mean value: 0.010399150848388671
key: score_time
value: [0.07445121 0.01275253 0.0135901 0.0133009 0.01288939 0.01299
0.01248527 0.01255918 0.01344275 0.01351595]
mean value: 0.019197726249694826
key: test_mcc
value: [0.80917359 0.56713087 0.56713087 0. 0. 0.
0. 0.56694671 0.56694671 0. ]
mean value: 0.30773287583715697
key: train_mcc
value: [0.4990914 0.46161651 0.4990914 0.5982906 0.56702937 0.59831103
0.59831103 0.53409532 0.53409532 0.53409532]
mean value: 0.5424027293731967
key: test_accuracy
value: [0.98275862 0.96551724 0.96551724 0.94827586 0.94827586 0.94736842
0.94736842 0.96491228 0.96491228 0.94736842]
mean value: 0.958227465214761
key: train_accuracy
value: [0.96131528 0.95938104 0.96131528 0.96711799 0.96518375 0.96718147
0.96718147 0.96332046 0.96332046 0.96332046]
mean value: 0.9638637670552564
key: test_fscore
value: [0.8 0.5 0.5 0. 0. 0. 0. 0.5 0.5 0. ]
mean value: 0.28
key: train_fscore
value: [0.41176471 0.36363636 0.41176471 0.54054054 0.5 0.54054054
0.54054054 0.45714286 0.45714286 0.45714286]
mean value: 0.46802159684512623
key: test_precision
value: [1. 1. 1. 0. 0. 0. 0. 1. 1. 0.]
mean value: 0.5
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 0.33333333 0.33333333 0. 0. 0.
0. 0.33333333 0.33333333 0. ]
mean value: 0.19999999999999998
key: train_recall
value: [0.25925926 0.22222222 0.25925926 0.37037037 0.33333333 0.37037037
0.37037037 0.2962963 0.2962963 0.2962963 ]
mean value: 0.3074074074074074
key: test_roc_auc
value: [0.83333333 0.66666667 0.66666667 0.5 0.5 0.5
0.5 0.66666667 0.66666667 0.5 ]
mean value: 0.6
key: train_roc_auc
value: [0.62962963 0.61111111 0.62962963 0.68518519 0.66666667 0.68518519
0.68518519 0.64814815 0.64814815 0.64814815]
mean value: 0.6537037037037037
key: test_jcc
value: [0.66666667 0.33333333 0.33333333 0. 0. 0.
0. 0.33333333 0.33333333 0. ]
mean value: 0.19999999999999998
key: train_jcc
value: [0.25925926 0.22222222 0.25925926 0.37037037 0.33333333 0.37037037
0.37037037 0.2962963 0.2962963 0.2962963 ]
mean value: 0.3074074074074074
MCC on Blind test: 0.12
Accuracy on Blind test: 0.52
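Each model block above reports per-fold fit_time, score_time and paired test_*/train_* metrics followed by their means. A minimal sketch, assuming these blocks come from something like sklearn's cross_validate with a custom scorer dictionary; the exact scorer setup used by the script is not shown in this log:

# Sketch only: 10-fold cross-validation with the metric names seen in the log.
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.metrics import (make_scorer, matthews_corrcoef, f1_score,
                             precision_score, recall_score, jaccard_score)

scoring = {'mcc'      : make_scorer(matthews_corrcoef),
           'accuracy' : 'accuracy',
           'fscore'   : make_scorer(f1_score),
           'precision': make_scorer(precision_score),
           'recall'   : make_scorer(recall_score),
           'roc_auc'  : 'roc_auc',
           'jcc'      : make_scorer(jaccard_score)}

# scores = cross_validate(pipe, X_train, y_train, cv=10,
#                         scoring=scoring, return_train_score=True)
# for key, value in scores.items():   # keys: fit_time, score_time, test_mcc, train_mcc, ...
#     print('key:', key, '\nvalue:', value, '\nmean value:', np.mean(value))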
Model_name: SVM
Model func: SVC(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
List of models:
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01754379 0.01406407 0.01333332 0.01329112 0.01354933 0.0141921
0.01471233 0.01526451 0.01424551 0.0159626 ]
mean value: 0.014615869522094727
key: score_time
value: [0.01249647 0.00990486 0.0098834 0.00987887 0.0101161 0.00993848
0.00975871 0.01059246 0.0107305 0.0111804 ]
mean value: 0.010448026657104491
key: test_mcc
value: [ 0. 0.56713087 0.56713087 0. 0. 0.
-0.03149704 0.56694671 0. 0.56694671]
mean value: 0.22366581252414472
key: train_mcc
value: [0.53407501 0.53407501 0.56702937 0.5982906 0.65669104 0.53409532
0.62813245 0.62813245 0.53409532 0.53409532]
mean value: 0.5748711879647856
key: test_accuracy
value: [0.94827586 0.96551724 0.96551724 0.94827586 0.94827586 0.94736842
0.92982456 0.96491228 0.94736842 0.96491228]
mean value: 0.9530248033877798
key: train_accuracy
value: [0.96324952 0.96324952 0.96518375 0.96711799 0.97098646 0.96332046
0.96911197 0.96911197 0.96332046 0.96332046]
mean value: 0.9657972562227881
key: test_fscore
value: [0. 0.5 0.5 0. 0. 0. 0. 0.5 0. 0.5]
mean value: 0.2
key: train_fscore
value: [0.45714286 0.45714286 0.5 0.54054054 0.61538462 0.45714286
0.57894737 0.57894737 0.45714286 0.45714286]
mean value: 0.5099534178481546
key: test_precision
value: [0. 1. 1. 0. 0. 0. 0. 1. 0. 1.]
mean value: 0.4
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0. 0.33333333 0.33333333 0. 0. 0.
0. 0.33333333 0. 0.33333333]
mean value: 0.13333333333333333
key: train_recall
value: [0.2962963 0.2962963 0.33333333 0.37037037 0.44444444 0.2962963
0.40740741 0.40740741 0.2962963 0.2962963 ]
mean value: 0.34444444444444444
key: test_roc_auc
value: [0.5 0.66666667 0.66666667 0.5 0.5 0.5
0.49074074 0.66666667 0.5 0.66666667]
mean value: 0.5657407407407408
key: train_roc_auc
value: [0.64814815 0.64814815 0.66666667 0.68518519 0.72222222 0.64814815
0.7037037 0.7037037 0.64814815 0.64814815]
mean value: 0.6722222222222222
key: test_jcc
value: [0. 0.33333333 0.33333333 0. 0. 0.
0. 0.33333333 0. 0.33333333]
mean value: 0.13333333333333333
key: train_jcc
value: [0.2962963 0.2962963 0.33333333 0.37037037 0.44444444 0.2962963
0.40740741 0.40740741 0.2962963 0.2962963 ]
mean value: 0.34444444444444444
MCC on Blind test: 0.11
Accuracy on Blind test: 0.52
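The UndefinedMetricWarning repeated above is raised whenever a fold predicts no positive samples, so precision (and hence F1) is undefined and silently set to 0.0. A minimal sketch of making that behaviour explicit with the `zero_division` parameter the warning itself points to; this is an option, not necessarily what this script does:

# Sketch only: scorers that report 0.0 without warning when no positives are predicted.
from sklearn.metrics import make_scorer, precision_score, f1_score

precision_0 = make_scorer(precision_score, zero_division=0)
f1_0        = make_scorer(f1_score, zero_division=0)
# These scorers can replace the precision/F1 entries in the cross-validation scoring dict.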
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
Running model pipeline:
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.86374784 2.09029508 1.63843369 2.03394628 1.90827298 1.78310966
1.99839234 2.06304955 1.93170047 1.81367326]
mean value: 1.9124621152877808
key: score_time
value: [0.01261926 0.01259279 0.01266718 0.01298261 0.01273131 0.01226902
0.01244545 0.01506019 0.01257229 0.01240182]
mean value: 0.01283419132232666
key: test_mcc
value: [0.76038268 0.5508895 0.38251843 0. 0.2969697 0.56694671
0.68718427 0.38204659 0.80903983 0.38204659]
mean value: 0.4818024290736844
key: train_mcc
value: [1. 1. 1. 1. 1. 0.9609263
0.98030899 1. 0.9609263 1. ]
mean value: 0.990216159887024
key: test_accuracy
value: [0.96551724 0.94827586 0.94827586 0.94827586 0.93103448 0.96491228
0.94736842 0.94736842 0.98245614 0.94736842]
mean value: 0.9530852994555353
key: train_accuracy
value: [1. 1. 1. 1. 1. 0.996139 0.9980695
1. 0.996139 1. ]
mean value: 0.9990347490347491
key: test_fscore
value: [0.75 0.57142857 0.4 0. 0.33333333 0.5
0.66666667 0.4 0.8 0.4 ]
mean value: 0.48214285714285715
key: train_fscore
value: [1. 1. 1. 1. 1. 0.96296296
0.98113208 1. 0.96296296 1. ]
mean value: 0.9907058001397624
key: test_precision
value: [0.6 0.5 0.5 0. 0.33333333 1.
0.5 0.5 1. 0.5 ]
mean value: 0.5433333333333333
key: train_precision
value: [1. 1. 1. 1. 1. 0.96296296
1. 1. 0.96296296 1. ]
mean value: 0.9925925925925926
key: test_recall
value: [1. 0.66666667 0.33333333 0. 0.33333333 0.33333333
1. 0.33333333 0.66666667 0.33333333]
mean value: 0.5
key: train_recall
value: [1. 1. 1. 1. 1. 0.96296296
0.96296296 1. 0.96296296 1. ]
mean value: 0.9888888888888889
key: test_roc_auc
value: [0.98181818 0.81515152 0.65757576 0.5 0.64848485 0.66666667
0.97222222 0.65740741 0.83333333 0.65740741]
mean value: 0.739006734006734
key: train_roc_auc
value: [1. 1. 1. 1. 1. 0.98046315
0.98148148 1. 0.98046315 1. ]
mean value: 0.9942407784566644
key: test_jcc
value: [0.6 0.4 0.25 0. 0.2 0.33333333
0.5 0.25 0.66666667 0.25 ]
mean value: 0.345
key: train_jcc
value: [1. 1. 1. 1. 1. 0.92857143
0.96296296 1. 0.92857143 1. ]
mean value: 0.982010582010582
MCC on Blind test: 0.24
Accuracy on Blind test: 0.57
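The ConvergenceWarning logged before this block indicates the MLP stopped at its max_iter=500 cap without converging. A minimal sketch of two standard remedies; the specific values are assumptions, not the project's settings:

# Sketch only: either allow more iterations, or stop on a validation plateau.
from sklearn.neural_network import MLPClassifier

mlp_more_iter  = MLPClassifier(max_iter=2000, random_state=42)
mlp_early_stop = MLPClassifier(max_iter=500, early_stopping=True,
                               n_iter_no_change=20, random_state=42)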
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.02618551 0.02589345 0.02173519 0.02467251 0.02685261 0.02534461
0.0224154 0.02488923 0.02453709 0.01868343]
mean value: 0.024120903015136717
key: score_time
value: [0.01264358 0.00964165 0.00909233 0.00906181 0.00916147 0.00893998
0.00907493 0.00904655 0.00892591 0.00914407]
mean value: 0.009473228454589843
key: test_mcc
value: [0.5508895 0.64848485 0.56713087 0.56713087 0.2969697 0.76011695
0.55039532 0.64814815 0.80903983 0.2962963 ]
mean value: 0.5694602338563187
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94827586 0.96551724 0.96551724 0.96551724 0.93103448 0.96491228
0.94736842 0.96491228 0.98245614 0.92982456]
mean value: 0.9565335753176043
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.57142857 0.66666667 0.5 0.5 0.33333333 0.75
0.57142857 0.66666667 0.8 0.33333333]
mean value: 0.5692857142857143
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.5 0.66666667 1. 1. 0.33333333 0.6
0.5 0.66666667 1. 0.33333333]
mean value: 0.66
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 0.66666667 0.33333333 0.33333333 0.33333333 1.
0.66666667 0.66666667 0.66666667 0.33333333]
mean value: 0.5666666666666667
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.81515152 0.82424242 0.66666667 0.66666667 0.64848485 0.98148148
0.81481481 0.82407407 0.83333333 0.64814815]
mean value: 0.7723063973063973
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.4 0.5 0.33333333 0.33333333 0.2 0.6
0.4 0.5 0.66666667 0.2 ]
mean value: 0.41333333333333333
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.25
Accuracy on Blind test: 0.58
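The unconstrained decision tree reaches 1.0 on every train-fold metric while the mean test MCC sits around 0.57, which points to the tree memorising the training folds. A minimal sketch of constraining it; the values are illustrative, not tuned for this data:

# Sketch only: limit depth and leaf size to reduce the train/test gap.
from sklearn.tree import DecisionTreeClassifier

dt_pruned = DecisionTreeClassifier(max_depth=4, min_samples_leaf=5, random_state=42)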
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.10347772 0.10542083 0.1026392 0.10236788 0.1086688 0.10966682
0.10839295 0.10804629 0.10971022 0.10941029]
mean value: 0.10678009986877442
key: score_time
value: [0.01768446 0.01904488 0.01783371 0.01776528 0.01919889 0.01932764
0.01858878 0.0196681 0.01852608 0.01737475]
mean value: 0.01850125789642334
key: test_mcc
value: [0.80917359 0.56713087 0.56713087 0. 0.56713087 0.56694671
0.38204659 0.38204659 0.80903983 0.64814815]
mean value: 0.5298794082235709
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98275862 0.96551724 0.96551724 0.94827586 0.96551724 0.96491228
0.94736842 0.94736842 0.98245614 0.96491228]
mean value: 0.9634603750756201
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.8 0.5 0.5 0. 0.5 0.5
0.4 0.4 0.8 0.66666667]
mean value: 0.5066666666666667
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 1. 0. 1. 1.
0.5 0.5 1. 0.66666667]
mean value: 0.7666666666666666
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 0.33333333 0.33333333 0. 0.33333333 0.33333333
0.33333333 0.33333333 0.66666667 0.66666667]
mean value: 0.39999999999999997
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.83333333 0.66666667 0.66666667 0.5 0.66666667 0.66666667
0.65740741 0.65740741 0.83333333 0.82407407]
mean value: 0.6972222222222222
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.66666667 0.33333333 0.33333333 0. 0.33333333 0.33333333
0.25 0.25 0.66666667 0.5 ]
mean value: 0.36666666666666664
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.16
Accuracy on Blind test: 0.54
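A minimal sketch of how the "MCC on Blind test" and "Accuracy on Blind test" lines can be produced, assuming the fitted pipeline is scored on the held-out blind test split; the variable names X_bts and y_bts are assumed, not taken from the script:

# Sketch only: refit on the training split, score the blind test split.
from sklearn.metrics import matthews_corrcoef, accuracy_score

# pipe.fit(X_train, y_train)
# y_pred = pipe.predict(X_bts)
# print('MCC on Blind test:', round(matthews_corrcoef(y_bts, y_pred), 2))
# print('Accuracy on Blind test:', round(accuracy_score(y_bts, y_pred), 2))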
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
List of models:
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01089549 0.00997782 0.01006174 0.00988054 0.01028514 0.00988078
0.01003456 0.00997853 0.01007175 0.01044822]
mean value: 0.01015145778656006
key: score_time
value: [0.0088129 0.0086453 0.00865483 0.00858808 0.00892234 0.00854993
0.00863981 0.00871801 0.00876451 0.00892496]
mean value: 0.008722066879272461
key: test_mcc
value: [ 0.48301038 0.38251843 0.2969697 -0.04413674 -0.03093441 0.56694671
0.20464687 0.24282147 0. 0.17516462]
mean value: 0.22770070155826885
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.93103448 0.94827586 0.93103448 0.9137931 0.93103448 0.96491228
0.89473684 0.9122807 0.94736842 0.87719298]
mean value: 0.9251663641863279
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.5 0.4 0.33333333 0. 0. 0.5
0.25 0.28571429 0. 0.22222222]
mean value: 0.24912698412698414
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.4 0.5 0.33333333 0. 0. 1.
0.2 0.25 0. 0.16666667]
mean value: 0.285
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 0.33333333 0.33333333 0. 0. 0.33333333
0.33333333 0.33333333 0. 0.33333333]
mean value: 0.26666666666666666
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.80606061 0.65757576 0.64848485 0.48181818 0.49090909 0.66666667
0.62962963 0.63888889 0.5 0.62037037]
mean value: 0.614040404040404
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.33333333 0.25 0.2 0. 0. 0.33333333
0.14285714 0.16666667 0. 0.125 ]
mean value: 0.1551190476190476
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.22
Accuracy on Blind test: 0.57
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Running model pipeline:
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.54438972 1.56727147 1.54567933 1.55488396 1.54870415 1.54974079
1.54354572 1.55656648 1.52714944 1.55768704]
mean value: 1.5495618104934692
key: score_time
value: [0.0920825 0.08973002 0.08945179 0.08959389 0.08978343 0.08988523
0.08968854 0.08932042 0.09021282 0.08947134]
mean value: 0.08992199897766114
key: test_mcc
value: [0.80917359 0.56713087 0.56713087 0. 0. 0.56694671
0.38204659 0.38204659 0.56694671 0.80903983]
mean value: 0.46504617707858015
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98275862 0.96551724 0.96551724 0.94827586 0.94827586 0.96491228
0.94736842 0.94736842 0.96491228 0.98245614]
mean value: 0.9617362371445856
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.8 0.5 0.5 0. 0. 0.5 0.4 0.4 0.5 0.8]
mean value: 0.44
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 1. 0. 0. 1. 0.5 0.5 1. 1. ]
mean value: 0.7
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 0.33333333 0.33333333 0. 0. 0.33333333
0.33333333 0.33333333 0.33333333 0.66666667]
mean value: 0.3333333333333333
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.83333333 0.66666667 0.66666667 0.5 0.5 0.66666667
0.65740741 0.65740741 0.66666667 0.83333333]
mean value: 0.6648148148148147
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.66666667 0.33333333 0.33333333 0. 0. 0.33333333
0.25 0.25 0.33333333 0.66666667]
mean value: 0.31666666666666665
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.16
Accuracy on Blind test: 0.54
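Several test folds above score 0.0 precision and recall, i.e. the forest never predicts the minority class in those folds. A minimal sketch of one common counter-measure, class weighting inside the forest; this is an option, not the configuration used in this log:

# Sketch only: re-weight classes per bootstrap sample.
from sklearn.ensemble import RandomForestClassifier

rf_weighted = RandomForestClassifier(n_estimators=1000, class_weight='balanced_subsample',
                                     random_state=42, n_jobs=10)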
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...05', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
key: fit_time
value: [1.79947305 0.99364781 0.97343969 1.00802183 0.97355986 0.97136927
0.97683573 0.94764853 0.94387889 0.96559119]
mean value: 1.0553465843200684
key: score_time
value: [0.22360277 0.19479609 0.35481954 0.11889696 0.24737167 0.24671912
0.25966692 0.24125576 0.23207593 0.26727366]
mean value: 0.23864784240722656
key: test_mcc
value: [0.80917359 0.56713087 0.56713087 0. 0. 0.56694671
0.38204659 0. 0. 0.56694671]
mean value: 0.34593753471007405
key: train_mcc
value: [0.73639347 0.65669104 0.76130255 0.71071615 0.73639347 0.68420281
0.65671091 0.68420281 0.68420281 0.76131958]
mean value: 0.7072135583787191
key: test_accuracy
value: [0.98275862 0.96551724 0.96551724 0.94827586 0.94827586 0.96491228
0.94736842 0.94736842 0.94736842 0.96491228]
mean value: 0.958227465214761
key: train_accuracy
value: [0.97678917 0.97098646 0.9787234 0.97485493 0.97678917 0.97297297
0.97104247 0.97297297 0.97297297 0.97876448]
mean value: 0.9746869002188151
key: test_fscore
value: [0.8 0.5 0.5 0. 0. 0.5 0.4 0. 0. 0.5]
mean value: 0.32
key: train_fscore
value: [0.71428571 0.61538462 0.74418605 0.68292683 0.71428571 0.65
0.61538462 0.65 0.65 0.74418605]
mean value: 0.6780639581632207
key: test_precision
value: [1. 1. 1. 0. 0. 1. 0.5 0. 0. 1. ]
mean value: 0.55
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 0.33333333 0.33333333 0. 0. 0.33333333
0.33333333 0. 0. 0.33333333]
mean value: 0.2333333333333333
key: train_recall
value: [0.55555556 0.44444444 0.59259259 0.51851852 0.55555556 0.48148148
0.44444444 0.48148148 0.48148148 0.59259259]
mean value: 0.5148148148148148
key: test_roc_auc
value: [0.83333333 0.66666667 0.66666667 0.5 0.5 0.66666667
0.65740741 0.5 0.5 0.66666667]
mean value: 0.6157407407407407
key: train_roc_auc
value: [0.77777778 0.72222222 0.7962963 0.75925926 0.77777778 0.74074074
0.72222222 0.74074074 0.74074074 0.7962963 ]
mean value: 0.7574074074074074
key: test_jcc
value: [0.66666667 0.33333333 0.33333333 0. 0. 0.33333333
0.25 0. 0. 0.33333333]
mean value: 0.22499999999999998
key: train_jcc
value: [0.55555556 0.44444444 0.59259259 0.51851852 0.55555556 0.48148148
0.44444444 0.48148148 0.48148148 0.59259259]
mean value: 0.5148148148148148
MCC on Blind test: 0.13
Accuracy on Blind test: 0.52
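The FutureWarning emitted while fitting this model notes that max_features='auto' is deprecated for classifiers and will be removed in scikit-learn 1.3. A minimal sketch of the replacement the warning itself suggests, keeping the other logged parameters:

# Sketch only: same "Random Forest2" settings with the non-deprecated max_features.
from sklearn.ensemble import RandomForestClassifier

rf2 = RandomForestClassifier(max_features='sqrt', min_samples_leaf=5, n_estimators=1000,
                             n_jobs=10, oob_score=True, random_state=42)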
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02531791 0.01029682 0.01034307 0.01032424 0.01029944 0.01025152
0.01029849 0.01025391 0.01028061 0.01024294]
mean value: 0.011790895462036132
key: score_time
value: [0.01097441 0.008816 0.00896859 0.00880456 0.00884557 0.00885034
0.00882292 0.00883365 0.00881696 0.00878763]
mean value: 0.009052062034606933
key: test_mcc
value: [ 0.5508895 0.56713087 0.38251843 -0.04413674 -0.05454545 0.38204659
-0.03149704 0.38204659 0.76011695 0.80903983]
mean value: 0.37036095219850856
key: train_mcc
value: [0.58249009 0.5427084 0.62227177 0.55500996 0.60935992 0.62292056
0.58514128 0.58191044 0.58514128 0.5955795 ]
mean value: 0.5882533192252368
key: test_accuracy
value: [0.94827586 0.96551724 0.94827586 0.9137931 0.89655172 0.94736842
0.92982456 0.94736842 0.96491228 0.98245614]
mean value: 0.9444343617664852
key: train_accuracy
value: [0.95938104 0.95551257 0.96324952 0.95744681 0.96324952 0.96138996
0.95752896 0.96138996 0.95752896 0.96138996]
mean value: 0.9598067257641726
key: test_fscore
value: [0.57142857 0.5 0.4 0. 0. 0.4
0. 0.4 0.75 0.8 ]
mean value: 0.3821428571428572
key: train_fscore
value: [0.60377358 0.56603774 0.64150943 0.57692308 0.62745098 0.64285714
0.60714286 0.6 0.60714286 0.61538462]
mean value: 0.6088222284559688
key: test_precision
value: [0.5 1. 0.5 0. 0. 0.5 0. 0.5 0.6 1. ]
mean value: 0.46
key: train_precision
value: [0.61538462 0.57692308 0.65384615 0.6 0.66666667 0.62068966
0.5862069 0.65217391 0.5862069 0.64 ]
mean value: 0.6198097874139853
key: test_recall
value: [0.66666667 0.33333333 0.33333333 0. 0. 0.33333333
0. 0.33333333 1. 0.66666667]
mean value: 0.36666666666666664
key: train_recall
value: [0.59259259 0.55555556 0.62962963 0.55555556 0.59259259 0.66666667
0.62962963 0.55555556 0.62962963 0.59259259]
mean value: 0.6
key: test_roc_auc
value: [0.81515152 0.66666667 0.65757576 0.48181818 0.47272727 0.65740741
0.49074074 0.65740741 0.98148148 0.83333333]
mean value: 0.6714309764309764
key: train_roc_auc
value: [0.78609221 0.76655329 0.80563114 0.7675737 0.78813303 0.8221317
0.80259486 0.76963114 0.80259486 0.78713133]
mean value: 0.7898067251340455
key: test_jcc
value: [0.4 0.33333333 0.25 0. 0. 0.25
0. 0.25 0.6 0.66666667]
mean value: 0.275
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
key: train_jcc
value: [0.43243243 0.39473684 0.47222222 0.40540541 0.45714286 0.47368421
0.43589744 0.42857143 0.43589744 0.44444444]
mean value: 0.4380434714645241
MCC on Blind test: 0.2
Accuracy on Blind test: 0.56
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.10296011 0.07609224 0.06126785 0.07784891 0.08386159 0.0641036
0.06421399 0.07174659 0.06452918 0.06336784]
mean value: 0.07299919128417968
key: score_time
value: [0.01094532 0.01224589 0.01145506 0.01107168 0.01102376 0.01049256
0.01052618 0.01079702 0.01050138 0.01053238]
mean value: 0.010959124565124512
key: test_mcc
value: [ 0.64848485 0.80917359 0.56713087 0. -0.03093441 0.56694671
0.80903983 0.64814815 1. 0.64814815]
mean value: 0.5666137744534676
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96551724 0.98275862 0.96551724 0.94827586 0.93103448 0.96491228
0.98245614 0.96491228 1. 0.96491228]
mean value: 0.9670296430732003
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.66666667 0.8 0.5 0. 0. 0.5
0.8 0.66666667 1. 0.66666667]
mean value: 0.56
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.66666667 1. 1. 0. 0. 1.
1. 0.66666667 1. 0.66666667]
mean value: 0.7
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 0.66666667 0.33333333 0. 0. 0.33333333
0.66666667 0.66666667 1. 0.66666667]
mean value: 0.5
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.82424242 0.83333333 0.66666667 0.5 0.49090909 0.66666667
0.83333333 0.82407407 1. 0.82407407]
mean value: 0.7463299663299663
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.5 0.66666667 0.33333333 0. 0. 0.33333333
0.66666667 0.5 1. 0.5 ]
mean value: 0.45
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.26
Accuracy on Blind test: 0.58
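
For orientation, the pipeline printed above (MinMaxScaler on the numeric columns, OneHotEncoder on the categorical ones, XGBClassifier as the model, 10-fold CV) can be reproduced in miniature roughly as follows. The toy data, the two column names and the scorer list are illustrative assumptions, not the project's actual ml_data_rt.py configuration:

# Minimal sketch of a MinMaxScaler + OneHotEncoder + XGBoost pipeline under CV.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from xgboost import XGBClassifier

X = pd.DataFrame({
    'ligand_distance': [1.2, 3.4, 2.2, 5.1, 0.7, 4.4, 2.9, 3.8],
    'deepddg':         [-0.5, 0.3, 1.1, -1.2, 0.0, 0.8, -0.3, 0.6],
    'ss_class':        ['H', 'E', 'C', 'H', 'C', 'E', 'H', 'C'],
})
y = [1, 0, 0, 1, 0, 1, 0, 1]

prep = ColumnTransformer(
    transformers=[('num', MinMaxScaler(), ['ligand_distance', 'deepddg']),
                  ('cat', OneHotEncoder(handle_unknown='ignore'), ['ss_class'])],
    remainder='passthrough')

pipe = Pipeline([('prep', prep),
                 ('model', XGBClassifier(random_state=42, verbosity=0))])

scores = cross_validate(pipe, X, y, cv=4, return_train_score=True,
                        scoring=['accuracy', 'f1', 'precision', 'recall',
                                 'roc_auc', 'matthews_corrcoef', 'jaccard'])
print(scores['test_matthews_corrcoef'].mean())

cross_validate with return_train_score=True yields the same kind of fit_time / score_time / test_* / train_* arrays that are logged per fold above; the mcc/jcc labels in this log suggest custom scorer names rather than sklearn's defaults.
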
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.0533421 0.05394554 0.07271242 0.05558991 0.0737896 0.08503461
0.06888843 0.09786415 0.09248352 0.04690504]
mean value: 0.07005553245544434
key: score_time
value: [0.01226044 0.01864839 0.01218653 0.01870155 0.01248217 0.0122149
0.01872897 0.03266191 0.02536178 0.0123961 ]
mean value: 0.017564272880554198
key: test_mcc
value: [0.76038268 0.32993527 0.24366266 0.38251843 0.38251843 0.56694671
0.39056329 0.35714286 0.56694671 0.39056329]
mean value: 0.4371180313243308
key: train_mcc
value: [0.84368859 0.8348779 0.90074358 0.8609619 0.87923612 0.83872426
0.86097638 0.87924838 0.86097638 0.83872426]
mean value: 0.859815774084354
key: test_accuracy
value: [0.96551724 0.86206897 0.9137931 0.94827586 0.94827586 0.96491228
0.89473684 0.87719298 0.96491228 0.89473684]
mean value: 0.9234422262552934
key: train_accuracy
value: [0.98452611 0.98452611 0.99032882 0.98646035 0.98839458 0.98455598
0.98648649 0.98841699 0.98648649 0.98455598]
mean value: 0.9864737907291099
key: test_fscore
value: [0.75 0.33333333 0.28571429 0.4 0.4 0.5
0.4 0.36363636 0.5 0.4 ]
mean value: 0.43326839826839825
key: train_fscore
value: [0.85185185 0.84 0.90566038 0.86792453 0.88461538 0.84615385
0.86792453 0.88461538 0.86792453 0.84615385]
mean value: 0.8662824275654464
key: test_precision
value: [0.6 0.22222222 0.25 0.5 0.5 1.
0.28571429 0.25 1. 0.28571429]
mean value: 0.48936507936507934
key: train_precision
value: [0.85185185 0.91304348 0.92307692 0.88461538 0.92 0.88
0.88461538 0.92 0.88461538 0.88 ]
mean value: 0.8941818407035799
key: test_recall
value: [1. 0.66666667 0.33333333 0.33333333 0.33333333 0.33333333
0.66666667 0.66666667 0.33333333 0.66666667]
mean value: 0.5333333333333333
key: train_recall
value: [0.85185185 0.77777778 0.88888889 0.85185185 0.85185185 0.81481481
0.85185185 0.85185185 0.85185185 0.81481481]
mean value: 0.8407407407407408
key: test_roc_auc
value: [0.98181818 0.76969697 0.63939394 0.65757576 0.65757576 0.66666667
0.78703704 0.77777778 0.66666667 0.78703704]
mean value: 0.7391245791245791
key: train_roc_auc
value: [0.92184429 0.88684807 0.94240363 0.9228647 0.92388511 0.90435242
0.92287094 0.92388927 0.92287094 0.90435242]
mean value: 0.9176181778436652
key: test_jcc
value: [0.6 0.2 0.16666667 0.25 0.25 0.33333333
0.25 0.22222222 0.33333333 0.25 ]
mean value: 0.28555555555555556
key: train_jcc
value: [0.74193548 0.72413793 0.82758621 0.76666667 0.79310345 0.73333333
0.76666667 0.79310345 0.76666667 0.73333333]
mean value: 0.7646533185020393
MCC on Blind test: 0.24
Accuracy on Blind test: 0.58
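
The test_mcc/train_mcc rows above are the metric that stays meaningful despite the high baseline accuracy, because MCC uses all four confusion-matrix cells. A small worked example with made-up, fold-sized labels (not this run's data):

# Sketch only: how a reported MCC value relates to a confusion matrix.
from sklearn.metrics import confusion_matrix, matthews_corrcoef

y_true = [0]*54 + [1]*3                  # rare positive class, as in one CV fold
y_pred = [0]*53 + [1] + [1]*2 + [0]      # one false positive, one missed positive

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
mcc = (tp*tn - fp*fn) / (((tp+fp)*(tp+fn)*(tn+fp)*(tn+fn)) ** 0.5)
print(round(mcc, 3), round(matthews_corrcoef(y_true, y_pred), 3))   # both ~0.648
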
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01456141 0.01192737 0.01130104 0.01136208 0.01116967 0.01007843
0.0100646 0.01123548 0.00998092 0.01085925]
mean value: 0.011254024505615235
key: score_time
value: [0.01082349 0.00998855 0.00960755 0.00958705 0.00879359 0.00871301
0.0093348 0.00888252 0.00888968 0.00979233]
mean value: 0.009441256523132324
key: test_mcc
value: [ 0.85811633 0.48301038 0.38251843 -0.03093441 0.20563808 0.64814815
0.24282147 0.2962963 0.39056329 0.55039532]
mean value: 0.40265733298043693
key: train_mcc
value: [0.472191 0.46384474 0.49044321 0.47807824 0.55306158 0.49897952
0.48527524 0.51651884 0.42130112 0.48236557]
mean value: 0.4862059055721764
key: test_accuracy
value: [0.98275862 0.93103448 0.94827586 0.93103448 0.89655172 0.96491228
0.9122807 0.92982456 0.89473684 0.94736842]
mean value: 0.9338777979431336
key: train_accuracy
value: [0.94197292 0.94003868 0.94197292 0.93423598 0.9516441 0.94401544
0.93629344 0.94401544 0.93436293 0.94015444]
mean value: 0.9408706302323324
key: test_fscore
value: [0.85714286 0.5 0.4 0. 0.25 0.66666667
0.28571429 0.33333333 0.4 0.57142857]
mean value: 0.42642857142857143
key: train_fscore
value: [0.5 0.49180328 0.51612903 0.5 0.57627119 0.52459016
0.50746269 0.53968254 0.4516129 0.50793651]
mean value: 0.5115488298733711
key: test_precision
value: [0.75 0.4 0.5 0. 0.2 0.66666667
0.25 0.33333333 0.28571429 0.5 ]
mean value: 0.38857142857142857
key: train_precision
value: [0.45454545 0.44117647 0.45714286 0.41463415 0.53125 0.47058824
0.425 0.47222222 0.4 0.44444444]
mean value: 0.4511003830578795
key: test_recall
value: [1. 0.66666667 0.33333333 0. 0.33333333 0.66666667
0.33333333 0.33333333 0.66666667 0.66666667]
mean value: 0.5
key: train_recall
value: [0.55555556 0.55555556 0.59259259 0.62962963 0.62962963 0.59259259
0.62962963 0.62962963 0.51851852 0.59259259]
mean value: 0.5925925925925926
key: test_roc_auc
value: [0.99090909 0.80606061 0.65757576 0.49090909 0.63030303 0.82407407
0.63888889 0.64814815 0.78703704 0.81481481]
mean value: 0.7288720538720539
key: train_roc_auc
value: [0.75941043 0.75839002 0.77690854 0.79032502 0.79950869 0.77796636
0.79139323 0.79546655 0.73787433 0.7759297 ]
mean value: 0.7763172863623838
key: test_jcc
value: [0.75 0.33333333 0.25 0. 0.14285714 0.5
0.16666667 0.2 0.25 0.4 ]
mean value: 0.29928571428571427
key: train_jcc
value: [0.33333333 0.32608696 0.34782609 0.33333333 0.4047619 0.35555556
0.34 0.36956522 0.29166667 0.34042553]
mean value: 0.34425545864352525
MCC on Blind test: 0.27
Accuracy on Blind test: 0.59
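
One detail worth noting for the Multinomial block: MultinomialNB only accepts non-negative feature values, so the MinMax scaling step in the pipeline above is presumably what makes ddG-style columns (which can be negative) usable at all. A toy sketch with made-up values, not the project's features:

# Toy sketch: MultinomialNB rejects negative inputs, so scale to [0, 1] first.
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import MinMaxScaler

X = np.array([[-1.3, 0.2], [0.5, -0.7], [2.0, 1.1], [1.4, 0.0]])   # contains negatives
y = [0, 0, 1, 1]

X_scaled = MinMaxScaler().fit_transform(X)   # all values now lie in [0, 1]
clf = MultinomialNB().fit(X_scaled, y)       # fitting raw X would raise ValueError
print(clf.predict(X_scaled))
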
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01444721 0.01985002 0.02024388 0.02129292 0.02039051 0.01963663
0.01987505 0.01990676 0.02227616 0.02153397]
mean value: 0.019945311546325683
key: score_time
value: [0.01014018 0.01180124 0.01179695 0.01186323 0.01182628 0.01188421
0.01184916 0.01185966 0.01190591 0.01182818]
mean value: 0.011675500869750976
key: test_mcc
value: [0.76038268 0.56713087 0.38251843 0.43192347 0.5508895 0.56694671
0.55039532 0.38204659 0.56694671 0.48238191]
mean value: 0.5241562187130722
key: train_mcc
value: [0.87704592 0.76167305 0.98030696 0.56113172 0.89861406 0.7364114
0.83254034 0.87924838 0.87705763 0.8711254 ]
mean value: 0.8275154861185016
key: test_accuracy
value: [0.96551724 0.96551724 0.94827586 0.9137931 0.94827586 0.96491228
0.94736842 0.94736842 0.96491228 0.92982456]
mean value: 0.9495765275257109
key: train_accuracy
value: [0.98839458 0.9787234 0.99806576 0.9032882 0.99032882 0.97683398
0.98455598 0.98841699 0.98841699 0.98455598]
mean value: 0.9781580696474313
key: test_fscore
value: [0.75 0.5 0.4 0.44444444 0.57142857 0.5
0.57142857 0.4 0.5 0.5 ]
mean value: 0.5137301587301587
key: train_fscore
value: [0.88 0.75555556 0.98113208 0.51923077 0.90196078 0.71428571
0.83333333 0.88461538 0.88 0.87096774]
mean value: 0.8221081358741665
key: test_precision
value: [0.6 1. 0.5 0.33333333 0.5 1.
0.5 0.5 1. 0.4 ]
mean value: 0.6333333333333333
key: train_precision
value: [0.95652174 0.94444444 1. 0.35064935 0.95833333 1.
0.95238095 0.92 0.95652174 0.77142857]
mean value: 0.8810280130497522
key: test_recall
value: [1. 0.33333333 0.33333333 0.66666667 0.66666667 0.33333333
0.66666667 0.33333333 0.33333333 0.66666667]
mean value: 0.5333333333333333
key: train_recall
value: [0.81481481 0.62962963 0.96296296 1. 0.85185185 0.55555556
0.74074074 0.85185185 0.81481481 1. ]
mean value: 0.8222222222222222
key: test_roc_auc
value: [0.98181818 0.66666667 0.65757576 0.7969697 0.81515152 0.66666667
0.81481481 0.65740741 0.66666667 0.80555556]
mean value: 0.7529292929292929
key: train_roc_auc
value: [0.906387 0.81379441 0.98148148 0.94897959 0.92490552 0.77777778
0.86935204 0.92388927 0.90638908 0.99185336]
mean value: 0.9044809519191248
key: test_jcc
value: [0.6 0.33333333 0.25 0.28571429 0.4 0.33333333
0.4 0.25 0.33333333 0.33333333]
mean value: 0.3519047619047619
key: train_jcc
value: [0.78571429 0.60714286 0.96296296 0.35064935 0.82142857 0.55555556
0.71428571 0.79310345 0.78571429 0.77142857]
mean value: 0.7147985603158017
MCC on Blind test: 0.3
Accuracy on Blind test: 0.6
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01839423 0.01934814 0.01817775 0.01901317 0.01911378 0.01753163
0.01846671 0.0182879 0.01776218 0.01677585]
mean value: 0.018287134170532227
key: score_time
value: [0.01187444 0.01188636 0.01183009 0.011832 0.01181626 0.01186419
0.01186252 0.01177549 0.01186895 0.01181245]
mean value: 0.01184227466583252
key: test_mcc
value: [0.76038268 0.45726459 0.56713087 0.38251843 0.2969697 0.56694671
0.55039532 0.64814815 0.76011695 0.24638016]
mean value: 0.5236253563585851
key: train_mcc
value: [0.88276644 0.4252731 0.73639347 0.82784768 0.92184429 0.85459272
0.80911474 0.78890378 0.65218543 0.49165645]
mean value: 0.7390578100781762
key: test_accuracy
value: [0.96551724 0.84482759 0.96551724 0.94827586 0.93103448 0.96491228
0.94736842 0.96491228 0.96491228 0.78947368]
mean value: 0.9286751361161525
key: train_accuracy
value: [0.98839458 0.83172147 0.97678917 0.98065764 0.99226306 0.98648649
0.98262548 0.97297297 0.94208494 0.86679537]
mean value: 0.9520791169727341
key: test_fscore
value: [0.75 0.4 0.5 0.4 0.33333333 0.5
0.57142857 0.66666667 0.75 0.25 ]
mean value: 0.5121428571428571
key: train_fscore
value: [0.88888889 0.37410072 0.71428571 0.83333333 0.92592593 0.85106383
0.8 0.78787879 0.63414634 0.43902439]
mean value: 0.7248647931231662
key: test_precision
value: [0.6 0.25 1. 0.5 0.33333333 1.
0.5 0.66666667 0.6 0.15384615]
mean value: 0.5603846153846154
key: train_precision
value: [0.88888889 0.23214286 1. 0.75757576 0.92592593 1.
1. 0.66666667 0.47272727 0.28125 ]
mean value: 0.7225177368927369
key: test_recall
value: [1. 1. 0.33333333 0.33333333 0.33333333 0.33333333
0.66666667 0.66666667 1. 0.66666667]
mean value: 0.6333333333333333
key: train_recall
value: [0.88888889 0.96296296 0.55555556 0.92592593 0.92592593 0.74074074
0.66666667 0.96296296 0.96296296 1. ]
mean value: 0.8592592592592593
key: test_roc_auc
value: [0.98181818 0.91818182 0.66666667 0.65757576 0.64848485 0.66666667
0.81481481 0.82407407 0.98148148 0.73148148]
mean value: 0.7891245791245791
key: train_roc_auc
value: [0.94138322 0.89372638 0.77777778 0.9547997 0.96092215 0.87037037
0.83333333 0.96824319 0.95194991 0.92973523]
mean value: 0.9082241264915109
key: test_jcc
value: [0.6 0.25 0.33333333 0.25 0.2 0.33333333
0.4 0.5 0.6 0.14285714]
mean value: 0.36095238095238097
key: train_jcc
value: [0.8 0.2300885 0.55555556 0.71428571 0.86206897 0.74074074
0.66666667 0.65 0.46428571 0.28125 ]
mean value: 0.5964941852626854
MCC on Blind test: 0.33
Accuracy on Blind test: 0.61
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
List of models:
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.20579267 0.20070958 0.19566107 0.1949234 0.19549155 0.1920774
0.19445324 0.19372702 0.19367528 0.19807458]
mean value: 0.1964585781097412
key: score_time
value: [0.01597142 0.01675367 0.01633692 0.0162847 0.01555514 0.01612735
0.01671433 0.01590085 0.01547456 0.01589966]
mean value: 0.016101861000061037
key: test_mcc
value: [0.43192347 0.76038268 0.56713087 0. 0.56713087 0.
0.38204659 0.55039532 0.80903983 0.2962963 ]
mean value: 0.4364345939424672
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.9137931 0.96551724 0.96551724 0.94827586 0.96551724 0.94736842
0.94736842 0.94736842 0.98245614 0.92982456]
mean value: 0.9513006654567453
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.44444444 0.75 0.5 0. 0.5 0.
0.4 0.57142857 0.8 0.33333333]
mean value: 0.4299206349206349
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.33333333 0.6 1. 0. 1. 0.
0.5 0.5 1. 0.33333333]
mean value: 0.5266666666666666
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 1. 0.33333333 0. 0.33333333 0.
0.33333333 0.66666667 0.66666667 0.33333333]
mean value: 0.4333333333333333
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.7969697 0.98181818 0.66666667 0.5 0.66666667 0.5
0.65740741 0.81481481 0.83333333 0.64814815]
mean value: 0.7065824915824915
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.28571429 0.6 0.33333333 0. 0.33333333 0.
0.25 0.4 0.66666667 0.2 ]
mean value: 0.3069047619047619
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.26
Accuracy on Blind test: 0.58
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
  oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.05620265 0.06906557 0.07550597 0.0792644 0.08082414 0.07877135
0.06620526 0.07880783 0.06887841 0.0823257 ]
mean value: 0.07358512878417969
key: score_time
value: [0.02078819 0.02542567 0.02489471 0.03681445 0.02469873 0.0268507
0.02675176 0.02348495 0.02814078 0.03065825]
mean value: 0.02685081958770752
key: test_mcc
value: [ 0.64848485 0.80917359 0.56713087 0.56713087 -0.03093441 0.80903983
0.38204659 0.80903983 0.56694671 0.38204659]
mean value: 0.5510105333468213
key: train_mcc
value: [0.93993608 0.93993608 0.98030696 0.98030696 0.89861406 0.98030899
0.96029664 0.96029664 0.94053145 0.98030899]
mean value: 0.9560842846865666
key: test_accuracy
value: [0.96551724 0.98275862 0.96551724 0.96551724 0.93103448 0.98245614
0.94736842 0.98245614 0.96491228 0.94736842]
mean value: 0.9634906231094978
key: train_accuracy
value: [0.99419729 0.99419729 0.99806576 0.99806576 0.99032882 0.9980695
0.996139 0.996139 0.99420849 0.9980695 ]
mean value: 0.9957480414927223
key: test_fscore
value: [0.66666667 0.8 0.5 0.5 0. 0.8
0.4 0.8 0.5 0.4 ]
mean value: 0.5366666666666666
key: train_fscore
value: [0.94117647 0.94117647 0.98113208 0.98113208 0.90196078 0.98113208
0.96153846 0.96153846 0.94339623 0.98113208]
mean value: 0.9575315176869006
key: test_precision
value: [0.66666667 1. 1. 1. 0. 1.
0.5 1. 1. 0.5 ]
mean value: 0.7666666666666666
key: train_precision
value: [1. 1. 1. 1. 0.95833333 1.
1. 1. 0.96153846 1. ]
mean value: 0.9919871794871795
key: test_recall
value: [0.66666667 0.66666667 0.33333333 0.33333333 0. 0.66666667
0.33333333 0.66666667 0.33333333 0.33333333]
mean value: 0.4333333333333333
key: train_recall
value: [0.88888889 0.88888889 0.96296296 0.96296296 0.85185185 0.96296296
0.92592593 0.92592593 0.92592593 0.96296296]
mean value: 0.9259259259259259
key: test_roc_auc
value: [0.82424242 0.83333333 0.66666667 0.66666667 0.49090909 0.83333333
0.65740741 0.83333333 0.66666667 0.65740741]
mean value: 0.712996632996633
key: train_roc_auc
value: [0.94444444 0.94444444 0.98148148 0.98148148 0.92490552 0.98148148
0.96296296 0.96296296 0.96194463 0.98148148]
mean value: 0.9627590891527464
key: test_jcc
value: [0.5 0.66666667 0.33333333 0.33333333 0. 0.66666667
0.25 0.66666667 0.33333333 0.25 ]
mean value: 0.39999999999999997
key: train_jcc
value: [0.88888889 0.88888889 0.96296296 0.96296296 0.82142857 0.96296296
0.92592593 0.92592593 0.89285714 0.96296296]
mean value: 0.9195767195767195
MCC on Blind test: 0.22
Accuracy on Blind test: 0.56
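
The "Some inputs do not have OOB scores" / "invalid value encountered in true_divide" warnings emitted while this Bagging Classifier block ran come from the out-of-bag estimate: with only the default 10 estimators, some training rows are never left out of any bootstrap sample. A hedged sketch of the effect on toy data; the estimator counts are illustrative, not the project's settings:

# Sketch: raising n_estimators usually gives every sample at least one out-of-bag
# prediction, avoiding the OOB warnings seen above.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=200, n_features=20, weights=[0.95], random_state=42)

few  = BaggingClassifier(n_estimators=10,  oob_score=True, random_state=42).fit(X, y)   # may warn
many = BaggingClassifier(n_estimators=200, oob_score=True, random_state=42).fit(X, y)   # OOB well defined
print(few.oob_score_, many.oob_score_)
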
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.1794157 0.19253588 0.17404079 0.15018892 0.12815928 0.18164134
0.12826395 0.17812467 0.17865038 0.18234944]
mean value: 0.1673370361328125
key: score_time
value: [0.02564001 0.02518463 0.01993012 0.0287869 0.02011561 0.01578426
0.02864242 0.02854109 0.02530122 0.02950859]
mean value: 0.024743485450744628
key: test_mcc
value: [0.56713087 0.56713087 0. 0. 0. 0.
0. 0. 0.56694671 0. ]
mean value: 0.17012084551450418
key: train_mcc
value: [0.93993608 0.91921394 0.96029266 0.91921394 0.93993608 0.9399419
0.91922152 0.91922152 0.91922152 0.91922152]
mean value: 0.9295420672150851
key: test_accuracy
value: [0.96551724 0.96551724 0.94827586 0.94827586 0.94827586 0.94736842
0.94736842 0.94736842 0.96491228 0.94736842]
mean value: 0.9530248033877797
key: train_accuracy
value: [0.99419729 0.99226306 0.99613153 0.99226306 0.99419729 0.99420849
0.99227799 0.99227799 0.99227799 0.99227799]
mean value: 0.9932372687691837
key: test_fscore
value: [0.5 0.5 0. 0. 0. 0. 0. 0. 0.5 0. ]
mean value: 0.15
key: train_fscore
value: [0.94117647 0.92 0.96153846 0.92 0.94117647 0.94117647
0.92 0.92 0.92 0.92 ]
mean value: 0.9305067873303168
key: test_precision
value: [1. 1. 0. 0. 0. 0. 0. 0. 1. 0.]
mean value: 0.3
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.33333333 0.33333333 0. 0. 0. 0.
0. 0. 0.33333333 0. ]
mean value: 0.09999999999999999
key: train_recall
value: [0.88888889 0.85185185 0.92592593 0.85185185 0.88888889 0.88888889
0.85185185 0.85185185 0.85185185 0.85185185]
mean value: 0.8703703703703703
key: test_roc_auc
value: [0.66666667 0.66666667 0.5 0.5 0.5 0.5
0.5 0.5 0.66666667 0.5 ]
mean value: 0.55
key: train_roc_auc
value: [0.94444444 0.92592593 0.96296296 0.92592593 0.94444444 0.94444444
0.92592593 0.92592593 0.92592593 0.92592593]
mean value: 0.9351851851851852
key: test_jcc
value: [0.33333333 0.33333333 0. 0. 0. 0.
0. 0. 0.33333333 0. ]
mean value: 0.09999999999999999
key: train_jcc
value: [0.88888889 0.85185185 0.92592593 0.85185185 0.88888889 0.88888889
0.85185185 0.85185185 0.85185185 0.85185185]
mean value: 0.8703703703703703
MCC on Blind test: 0.09
Accuracy on Blind test: 0.52
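
The Gaussian Process fold scores above (accuracy around 0.95 but recall near 0.1, with several all-zero folds) are close to what a majority-class baseline achieves on labels this imbalanced, which is why the MCC, F1 and JCC rows are the more informative ones in this log. A small baseline sketch with made-up labels, not this run's split:

# Sketch: a majority-class baseline scores ~0.95 accuracy on toy imbalanced labels
# while its F1 and MCC are 0, mirroring the degenerate folds above.
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef

y_true = [0]*54 + [1]*3
baseline = DummyClassifier(strategy='most_frequent').fit([[0]]*len(y_true), y_true)
y_pred = baseline.predict([[0]]*len(y_true))

print(accuracy_score(y_true, y_pred))              # ~0.947
print(f1_score(y_true, y_pred, zero_division=0))   # 0.0
print(matthews_corrcoef(y_true, y_pred))           # 0.0
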
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.78224468 0.77179456 0.7493031 0.78262949 0.78038216 0.78873324
0.78629851 0.78719425 0.77175832 0.78670549]
mean value: 0.7787043809890747
key: score_time
value: [0.00961399 0.00944686 0.01149893 0.0093987 0.01046252 0.00997686
0.0101912 0.00940299 0.01038051 0.01013303]
mean value: 0.010050559043884277
key: test_mcc
value: [0.64848485 0.64848485 0.56713087 0.56713087 0.56713087 0.64814815
0.64814815 0.64814815 1. 0.55039532]
mean value: 0.6493202081863393
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96551724 0.96551724 0.96551724 0.96551724 0.96551724 0.96491228
0.96491228 0.96491228 1. 0.94736842]
mean value: 0.9669691470054447
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.66666667 0.66666667 0.5 0.5 0.5 0.66666667
0.66666667 0.66666667 1. 0.57142857]
mean value: 0.6404761904761904
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.66666667 0.66666667 1. 1. 1. 0.66666667
0.66666667 0.66666667 1. 0.5 ]
mean value: 0.7833333333333333
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 0.66666667 0.33333333 0.33333333 0.33333333 0.66666667
0.66666667 0.66666667 1. 0.66666667]
mean value: 0.6
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.82424242 0.82424242 0.66666667 0.66666667 0.66666667 0.82407407
0.82407407 0.82407407 1. 0.81481481]
mean value: 0.7935521885521886
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.5 0.5 0.33333333 0.33333333 0.33333333 0.5
0.5 0.5 1. 0.4 ]
mean value: 0.49
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.27
Accuracy on Blind test: 0.58
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
List of models:
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.0282979 0.02876115 0.02958274 0.02728486 0.02812648 0.02823567
0.02803469 0.02771449 0.02794123 0.02837539]
mean value: 0.028235459327697755
key: score_time
value: [0.0126543 0.01246667 0.01491022 0.01432562 0.01413298 0.01427579
0.01446295 0.01423359 0.01441836 0.0146451 ]
mean value: 0.014052557945251464
key: test_mcc
value: [ 0.68755165 0.1762953 -0.05454545 -0.04413674 -0.05454545 0.24282147
-0.05555556 0.1134023 0.17516462 0.2962963 ]
mean value: 0.14827484228206453
key: train_mcc
value: [0.37617287 0.4990914 0.53407501 0.4990914 0.4990914 0.46163583
0.53409532 0.56704983 0.46163583 0.46163583]
mean value: 0.4893574733872949
key: test_accuracy
value: [0.94827586 0.87931034 0.89655172 0.9137931 0.89655172 0.9122807
0.89473684 0.8245614 0.87719298 0.92982456]
mean value: 0.897307924984876
key: train_accuracy
value: [0.95551257 0.96131528 0.96324952 0.96131528 0.96131528 0.95945946
0.96332046 0.96525097 0.95945946 0.95945946]
mean value: 0.9609657737317312
key: test_fscore
value: [0.66666667 0.22222222 0. 0. 0. 0.28571429
0. 0.16666667 0.22222222 0.33333333]
mean value: 0.18968253968253967
key: train_fscore
value: [0.25806452 0.41176471 0.45714286 0.41176471 0.41176471 0.36363636
0.45714286 0.5 0.36363636 0.36363636]
mean value: 0.3998553438970896
key: test_precision
value: [0.5 0.16666667 0. 0. 0. 0.25
0. 0.11111111 0.16666667 0.33333333]
mean value: 0.15277777777777776
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.33333333 0. 0. 0. 0.33333333
0. 0.33333333 0.33333333 0.33333333]
mean value: 0.26666666666666666
key: train_recall
value: [0.14814815 0.25925926 0.2962963 0.25925926 0.25925926 0.22222222
0.2962963 0.33333333 0.22222222 0.22222222]
mean value: 0.2518518518518518
key: test_roc_auc
value: [0.97272727 0.62121212 0.47272727 0.48181818 0.47272727 0.63888889
0.47222222 0.59259259 0.62037037 0.64814815]
mean value: 0.5993434343434343
key: train_roc_auc
value: [0.57407407 0.62962963 0.64814815 0.62962963 0.62962963 0.61111111
0.64814815 0.66666667 0.61111111 0.61111111]
mean value: 0.625925925925926
key: test_jcc
value: [0.5 0.125 0. 0. 0. 0.16666667
0. 0.09090909 0.125 0.2 ]
mean value: 0.12075757575757576
key: train_jcc
value: [0.14814815 0.25925926 0.2962963 0.25925926 0.25925926 0.22222222
0.2962963 0.33333333 0.22222222 0.22222222]
mean value: 0.2518518518518518
MCC on Blind test: -0.05
Accuracy on Blind test: 0.5
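Several of the test folds above have precision 0.0 because the model predicts no minority-class samples in those folds; that is also what triggers the UndefinedMetricWarning about ill-defined precision seen in this log. A small, self-contained illustration (not part of the original script) of that behaviour and of the zero_division parameter the warning points to:

    # When no positives are predicted, precision is undefined; sklearn reports
    # 0.0 and warns unless zero_division is set explicitly.
    from sklearn.metrics import precision_score

    y_true = [0, 0, 1, 1]
    y_pred = [0, 0, 0, 0]   # classifier never predicts the positive class

    print(precision_score(y_true, y_pred))                   # 0.0, with UndefinedMetricWarning
    print(precision_score(y_true, y_pred, zero_division=0))  # 0.0, warning suppressed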
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02868271 0.03869915 0.03853083 0.03858232 0.03899741 0.03781533
0.04338479 0.03836989 0.03853822 0.03898597]
mean value: 0.03805866241455078
key: score_time
value: [0.01904464 0.0187161 0.01863718 0.01865602 0.0190537 0.01886559
0.01899195 0.01890612 0.01882076 0.01871467]
mean value: 0.018840670585632324
key: test_mcc
value: [1. 0.80917359 0.56713087 0.56713087 0.38251843 0.80903983
0.2962963 0.64814815 0.56694671 0.64814815]
mean value: 0.6294532902524936
key: train_mcc
value: [0.73676407 0.78589709 0.8325254 0.8321053 0.76167305 0.76169209
0.76169209 0.76169209 0.76131958 0.80951339]
mean value: 0.7804874132160566
key: test_accuracy
value: [1. 0.98275862 0.96551724 0.96551724 0.94827586 0.98245614
0.92982456 0.96491228 0.96491228 0.96491228]
mean value: 0.9669086509376891
key: train_accuracy
value: [0.97678917 0.98065764 0.98452611 0.98452611 0.9787234 0.97876448
0.97876448 0.97876448 0.97876448 0.98262548]
mean value: 0.9802905834820729
key: test_fscore
value: [1. 0.8 0.5 0.5 0.4 0.8
0.33333333 0.66666667 0.5 0.66666667]
mean value: 0.6166666666666667
key: train_fscore
value: [0.72727273 0.7826087 0.83333333 0.82608696 0.75555556 0.75555556
0.75555556 0.75555556 0.74418605 0.80851064]
mean value: 0.7744220619811697
key: test_precision
value: [1. 1. 1. 1. 0.5 1.
0.33333333 0.66666667 1. 0.66666667]
mean value: 0.8166666666666667
key: train_precision
value: [0.94117647 0.94736842 0.95238095 1. 0.94444444 0.94444444
0.94444444 0.94444444 1. 0.95 ]
mean value: 0.9568703621799597
key: test_recall
value: [1. 0.66666667 0.33333333 0.33333333 0.33333333 0.66666667
0.33333333 0.66666667 0.33333333 0.66666667]
mean value: 0.5333333333333333
key: train_recall
value: [0.59259259 0.66666667 0.74074074 0.7037037 0.62962963 0.62962963
0.62962963 0.62962963 0.59259259 0.7037037 ]
mean value: 0.6518518518518519
key: test_roc_auc
value: [1. 0.83333333 0.66666667 0.66666667 0.65757576 0.83333333
0.64814815 0.82407407 0.66666667 0.82407407]
mean value: 0.762053872053872
key: train_roc_auc
value: [0.79527589 0.83231293 0.86934996 0.85185185 0.81379441 0.81379648
0.81379648 0.81379648 0.7962963 0.85083352]
mean value: 0.8251104306850597
key: test_jcc
value: [1. 0.66666667 0.33333333 0.33333333 0.25 0.66666667
0.2 0.5 0.33333333 0.5 ]
mean value: 0.47833333333333333
key: train_jcc
value: [0.57142857 0.64285714 0.71428571 0.7037037 0.60714286 0.60714286
0.60714286 0.60714286 0.59259259 0.67857143]
mean value: 0.6332010582010582
MCC on Blind test: 0.25
Accuracy on Blind test: 0.57
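The two blind-test lines reported per model are consistent with refitting the pipeline on the full training split and scoring the held-out blind set once, with MCC and accuracy as the scorers. A toy example of those two calls (assumed usage, with made-up labels, not values from this run):

    # Illustrative only: blind-test MCC and accuracy from a fitted model's predictions.
    from sklearn.metrics import matthews_corrcoef, accuracy_score

    y_bts      = [0, 1, 0, 1, 1, 0]   # toy blind-test labels
    y_bts_pred = [0, 1, 1, 1, 0, 0]   # toy predictions from a fitted pipeline

    print('MCC on Blind test:', round(matthews_corrcoef(y_bts, y_bts_pred), 2))
    print('Accuracy on Blind test:', round(accuracy_score(y_bts, y_bts_pred), 2))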
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.21960998 0.27916813 0.2967515 0.17903543 0.30591917 0.16244292
0.22275519 0.29180765 0.29906225 0.32112122]
mean value: 0.2577673435211182
key: score_time
value: [0.01901317 0.01612353 0.0195241 0.01285934 0.0123415 0.01899672
0.02589607 0.02119637 0.01926637 0.02765989]
mean value: 0.019287705421447754
key: test_mcc
value: [0.80917359 0.80917359 0.56713087 0. 0.38251843 0.80903983
0.2962963 0.64814815 0.56694671 0.64814815]
mean value: 0.5536575623422023
key: train_mcc
value: [0.68418345 0.65669104 0.8325254 0.71071615 0.76167305 0.76169209
0.76169209 0.76169209 0.76131958 0.65717642]
mean value: 0.7349361348491141
key: test_accuracy
value: [0.98275862 0.98275862 0.96551724 0.94827586 0.94827586 0.98245614
0.92982456 0.96491228 0.96491228 0.96491228]
mean value: 0.9634603750756201
key: train_accuracy
value: [0.9729207 0.97098646 0.98452611 0.97485493 0.9787234 0.97876448
0.97876448 0.97876448 0.97876448 0.97104247]
mean value: 0.9768111991516247
key: test_fscore
value: [0.8 0.8 0.5 0. 0.4 0.8
0.33333333 0.66666667 0.5 0.66666667]
mean value: 0.5466666666666666
key: train_fscore
value: [0.65       0.61538462 0.83333333 0.68292683 0.75555556 0.75555556
 0.75555556 0.75555556 0.74418605 0.63414634]
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_rt.py:114: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_rt.py:117: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
mean value: 0.7182199388183507
key: test_precision
value: [1. 1. 1. 0. 0.5 1.
0.33333333 0.66666667 1. 0.66666667]
mean value: 0.7166666666666667
key: train_precision
value: [1. 1. 0.95238095 1. 0.94444444 0.94444444
0.94444444 0.94444444 1. 0.92857143]
mean value: 0.9658730158730159
key: test_recall
value: [0.66666667 0.66666667 0.33333333 0. 0.33333333 0.66666667
0.33333333 0.66666667 0.33333333 0.66666667]
mean value: 0.4666666666666666
key: train_recall
value: [0.48148148 0.44444444 0.74074074 0.51851852 0.62962963 0.62962963
0.62962963 0.62962963 0.59259259 0.48148148]
mean value: 0.5777777777777777
key: test_roc_auc
value: [0.83333333 0.83333333 0.66666667 0.5 0.65757576 0.83333333
0.64814815 0.82407407 0.66666667 0.82407407]
mean value: 0.7287205387205387
key: train_roc_auc
value: [0.74074074 0.72222222 0.86934996 0.75925926 0.81379441 0.81379648
0.81379648 0.81379648 0.7962963 0.73972241]
mean value: 0.7882774752806758
key: test_jcc
value: [0.66666667 0.66666667 0.33333333 0. 0.25 0.66666667
0.2 0.5 0.33333333 0.5 ]
mean value: 0.4116666666666666
key: train_jcc
value: [0.48148148 0.44444444 0.71428571 0.51851852 0.60714286 0.60714286
0.60714286 0.60714286 0.59259259 0.46428571]
mean value: 0.5644179894179894
MCC on Blind test: 0.21
Accuracy on Blind test: 0.55
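The SettingWithCopyWarning emitted from rpob_rt.py above comes from calling sort_values(..., inplace=True) on what pandas considers a slice of another DataFrame. The usual remedy, sketched here on a made-up results table (the baseline_CT/baseline_BT names and sort columns mirror the warning text; the data do not), is to take an explicit copy before mutating:

    # Typical fix for the SettingWithCopyWarning seen above: copy the slice first.
    import pandas as pd

    results = pd.DataFrame({'model': ['LR', 'RF', 'QDA'],
                            'test_mcc': [0.95, 0.88, 0.15],
                            'bts_mcc': [0.30, 0.25, -0.05]})

    baseline_CT = results[['model', 'test_mcc']].copy()
    baseline_CT.sort_values(by=['test_mcc'], ascending=False, inplace=True)

    baseline_BT = results[['model', 'bts_mcc']].copy()
    baseline_BT.sort_values(by=['bts_mcc'], ascending=False, inplace=True)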
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.04487562 0.04138589 0.04369426 0.04303861 0.04680061 0.04244494
0.03946328 0.04193759 0.04206204 0.03964663]
mean value: 0.04253494739532471
key: score_time
value: [0.01250577 0.01236224 0.01433182 0.01480269 0.02338958 0.01450467
0.01239443 0.01240659 0.01476336 0.0124073 ]
mean value: 0.014386844635009766
key: test_mcc
value: [0.88989899 0.92724828 0.98181818 0.94511785 0.96396098 0.96393713
0.94635821 0.98181211 0.96396098 0.92905936]
mean value: 0.9493172082658792
key: train_mcc
value: [0.97573744 0.96750835 0.96550335 0.96142511 0.96543927 0.96343147
0.97158737 0.96750942 0.96550464 0.97158737]
mean value: 0.9675233803087637
key: test_accuracy
value: [0.94495413 0.96330275 0.99082569 0.97247706 0.98165138 0.98165138
0.97247706 0.99082569 0.98165138 0.96330275]
mean value: 0.9743119266055046
key: train_accuracy
value: [0.98776758 0.98369011 0.98267074 0.98063201 0.98267074 0.98165138
0.98572885 0.98369011 0.98267074 0.98572885]
mean value: 0.9836901121304791
key: test_fscore
value: [0.94444444 0.96363636 0.99082569 0.97247706 0.98181818 0.98214286
0.97345133 0.99099099 0.98148148 0.96491228]
mean value: 0.974618067994328
key: train_fscore
value: [0.98790323 0.98383838 0.98284561 0.98082745 0.98281092 0.98178138
0.98582996 0.98380567 0.98281092 0.98582996]
mean value: 0.9838283470967917
key: test_precision
value: [0.94444444 0.94642857 0.98181818 0.96363636 0.96428571 0.96491228
0.94827586 0.98214286 1. 0.93220339]
mean value: 0.962814766535736
key: train_precision
value: [0.97804391 0.9759519 0.974 0.972 0.97590361 0.97389558
0.97791165 0.97590361 0.9739479 0.97791165]
mean value: 0.9755469816192518
key: test_recall
value: [0.94444444 0.98148148 1. 0.98148148 1. 1.
1. 1. 0.96363636 1. ]
mean value: 0.9871043771043772
key: train_recall
value: [0.99796334 0.99185336 0.99185336 0.9898167 0.9898167 0.98979592
0.99387755 0.99183673 0.99183673 0.99387755]
mean value: 0.992252795211771
key: test_roc_auc
value: [0.94494949 0.96346801 0.99090909 0.97255892 0.98181818 0.98148148
0.97222222 0.99074074 0.98181818 0.96296296]
mean value: 0.9742929292929293
key: train_roc_auc
value: [0.98775718 0.98368178 0.98266137 0.98062264 0.98266345 0.98165967
0.98573715 0.98369841 0.98268008 0.98573715]
mean value: 0.9836898873602394
key: test_jcc
value: [0.89473684 0.92982456 0.98181818 0.94642857 0.96428571 0.96491228
0.94827586 0.98214286 0.96363636 0.93220339]
mean value: 0.9508264624421688
key: train_jcc
value: [0.97609562 0.96819085 0.96626984 0.96237624 0.96620278 0.96421471
0.97205589 0.96812749 0.96620278 0.97205589]
mean value: 0.9681792096111226
MCC on Blind test: 0.3
Accuracy on Blind test: 0.6
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
List of models:
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
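The pipeline printed above min-max scales the 169 numerical columns and one-hot encodes the 7 categorical ones before handing the data to the classifier. A minimal sketch of that layout, with the numerical column list shortened for illustration:

    from sklearn.compose import ColumnTransformer
    from sklearn.linear_model import LogisticRegressionCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

    # Illustrative subset of the 169 numerical feature names shown in the log.
    num_cols = ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change']
    cat_cols = ['ss_class', 'aa_prop_change', 'electrostatics_change',
                'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']

    prep = ColumnTransformer(remainder='passthrough',
                             transformers=[('num', MinMaxScaler(), num_cols),
                                           ('cat', OneHotEncoder(), cat_cols)])
    pipe = Pipeline(steps=[('prep', prep),
                           ('model', LogisticRegressionCV(random_state=42))])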
key: fit_time
value: [0.99837112 1.1198895 1.16606784 1.38026643 1.1718266 1.01276398
1.08470702 1.11438894 1.01340914 1.17820048]
mean value: 1.1239891052246094
key: score_time
value: [0.01469374 0.01463389 0.02088499 0.01533866 0.01591563 0.01322365
0.01592922 0.01325226 0.02350879 0.01339555]
mean value: 0.016077637672424316
key: test_mcc
value: [0.90841751 0.92724828 0.98181818 0.96329966 0.96396098 0.96393713
0.92905936 0.96329966 0.94511785 0.92905936]
mean value: 0.947521797154391
key: train_mcc
value: [0.9857799 0.9837635 0.97767312 0.99796333 0.9857799 0.95334985
0.9817516 0.94927144 0.9534452 0.9817516 ]
mean value: 0.9750529452374251
key: test_accuracy
value: [0.95412844 0.96330275 0.99082569 0.98165138 0.98165138 0.98165138
0.96330275 0.98165138 0.97247706 0.96330275]
mean value: 0.9733944954128441
key: train_accuracy
value: [0.99286442 0.99184506 0.98878695 0.99898063 0.99286442 0.97655454
0.99082569 0.9745158 0.97655454 0.99082569]
mean value: 0.9874617737003059
key: test_fscore
value: [0.95412844 0.96363636 0.99082569 0.98148148 0.98181818 0.98214286
0.96491228 0.98181818 0.97247706 0.96491228]
mean value: 0.9738152819961126
key: train_fscore
value: [0.9929078 0.99190283 0.98887765 0.99898271 0.9929078 0.97679112
0.99088146 0.97477296 0.97683787 0.99088146]
mean value: 0.9875743656721899
key: test_precision
value: [0.94545455 0.94642857 0.98181818 0.98148148 0.96428571 0.96491228
0.93220339 0.98181818 0.98148148 0.93220339]
mean value: 0.9612087218130929
key: train_precision
value: [0.98790323 0.98591549 0.98192771 0.99796748 0.98790323 0.96606786
0.98390342 0.96407186 0.96421471 0.98390342]
mean value: 0.9803778408423602
key: test_recall
value: [0.96296296 0.98148148 1. 0.98148148 1. 1.
1. 0.98181818 0.96363636 1. ]
mean value: 0.9871380471380471
key: train_recall
value: [0.99796334 0.99796334 0.99592668 1. 0.99796334 0.9877551
0.99795918 0.98571429 0.98979592 0.99795918]
mean value: 0.9949000374080386
key: test_roc_auc
value: [0.95420875 0.96346801 0.99090909 0.98164983 0.98181818 0.98148148
0.96296296 0.98164983 0.97255892 0.96296296]
mean value: 0.9733670033670033
key: train_roc_auc
value: [0.99285922 0.99183881 0.98877967 0.99897959 0.99285922 0.97656594
0.99083295 0.9745272 0.97656802 0.99083295]
mean value: 0.9874643584521385
key: test_jcc
value: [0.9122807 0.92982456 0.98181818 0.96363636 0.96428571 0.96491228
0.93220339 0.96428571 0.94642857 0.93220339]
mean value: 0.9491878868975211
key: train_jcc
value: [0.98591549 0.98393574 0.978 0.99796748 0.98591549 0.95463511
0.98192771 0.9507874 0.95472441 0.98192771]
mean value: 0.9755736549753808
MCC on Blind test: 0.32
Accuracy on Blind test: 0.61
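The key/value/mean blocks above (fit_time, score_time, and the per-fold test_* and train_* metrics) have the shape of sklearn's cross_validate output with multiple scorers and return_train_score=True. A minimal sketch that reproduces the reporting format; the synthetic data, scorer names, and 10-fold stratified CV are assumptions, not the script's exact code:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegressionCV
    from sklearn.metrics import make_scorer, matthews_corrcoef
    from sklearn.model_selection import StratifiedKFold, cross_validate

    # Imbalanced toy data standing in for the training split.
    X, y = make_classification(n_samples=575, n_features=20, weights=[0.95], random_state=42)

    scoring = {'mcc': make_scorer(matthews_corrcoef), 'accuracy': 'accuracy',
               'fscore': 'f1', 'precision': 'precision', 'recall': 'recall',
               'roc_auc': 'roc_auc', 'jcc': 'jaccard'}

    cv_out = cross_validate(LogisticRegressionCV(max_iter=3000, random_state=42), X, y,
                            cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=42),
                            scoring=scoring, return_train_score=True)
    for key, value in cv_out.items():   # keys: fit_time, score_time, test_mcc, train_mcc, ...
        print('key:', key)
        print('value:', value)
        print('mean value:', np.mean(value))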
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [same list of models as above]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01865888 0.01219416 0.0124011 0.01228523 0.01231241 0.01269364
0.01386428 0.0126965 0.01259089 0.01382422]
mean value: 0.013352131843566895
key: score_time
value: [0.07297492 0.00939155 0.00941801 0.00939441 0.00960684 0.01031232
0.00982857 0.00956607 0.00982213 0.00963259]
mean value: 0.015994739532470704
key: test_mcc
value: [0.85382288 0.85560229 0.84200168 0.85382288 0.80953606 0.87280881
0.83496131 0.8582788 0.98181818 0.8582788 ]
mean value: 0.8620931700334511
key: train_mcc
value: [0.8661361 0.86728222 0.86560142 0.87136867 0.87160834 0.86946083
0.87788349 0.87094841 0.8476271 0.86997051]
mean value: 0.8677887071504141
key: test_accuracy
value: [0.9266055 0.9266055 0.91743119 0.9266055 0.89908257 0.93577982
0.91743119 0.9266055 0.99082569 0.9266055 ]
mean value: 0.9293577981651376
key: train_accuracy
value: [0.93170234 0.93272171 0.93170234 0.93476045 0.93476045 0.93374108
0.93781855 0.93476045 0.92150866 0.93374108]
mean value: 0.9327217125382263
key: test_fscore
value: [0.92727273 0.92857143 0.92173913 0.92727273 0.90598291 0.9380531
0.91891892 0.93103448 0.99082569 0.93103448]
mean value: 0.9320705589389259
key: train_fscore
value: [0.93437806 0.93491124 0.93411996 0.93688363 0.93700787 0.93583416
0.93990148 0.93650794 0.92531523 0.93608653]
mean value: 0.9350946094457768
key: test_precision
value: [0.91071429 0.89655172 0.86885246 0.91071429 0.84126984 0.9137931
0.91071429 0.8852459 1. 0.8852459 ]
mean value: 0.9023101788293987
key: train_precision
value: [0.9 0.90630975 0.90304183 0.9082218 0.90666667 0.90630975
0.90857143 0.91119691 0.88170055 0.90322581]
mean value: 0.9035244492701532
key: test_recall
value: [0.94444444 0.96296296 0.98148148 0.94444444 0.98148148 0.96363636
0.92727273 0.98181818 0.98181818 0.98181818]
mean value: 0.9651178451178452
key: train_recall
value: [0.97148676 0.96537678 0.96741344 0.96741344 0.9694501 0.96734694
0.97346939 0.96326531 0.97346939 0.97142857]
mean value: 0.9690120121368303
key: test_roc_auc
value: [0.92676768 0.92693603 0.91801347 0.92676768 0.89983165 0.93552189
0.91734007 0.92609428 0.99090909 0.92609428]
mean value: 0.9294276094276094
key: train_roc_auc
value: [0.93166175 0.93268839 0.9316659 0.93472713 0.93472505 0.9337753
0.93785486 0.93478948 0.92156158 0.93377946]
mean value: 0.9327228895631572
key: test_jcc
value: [0.86440678 0.86666667 0.85483871 0.86440678 0.828125 0.88333333
0.85 0.87096774 0.98181818 0.87096774]
mean value: 0.8735530934688602
key: train_jcc
value: [0.87683824 0.87777778 0.87638376 0.8812616 0.88148148 0.87940631
0.8866171 0.88059701 0.86101083 0.87985213]
mean value: 0.8781226233231253
MCC on Blind test: 0.29
Accuracy on Blind test: 0.61
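The two 'Blind test' lines printed for each model are single-number summaries on the held-out blind test set. A minimal sketch of how such numbers can be produced, using synthetic data and a plain split as stand-ins for the script's training and blind-test objects:

    from sklearn.datasets import make_classification
    from sklearn.metrics import accuracy_score, matthews_corrcoef
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB

    X, y = make_classification(n_samples=1132, n_features=30, random_state=42)
    X_train, X_bts, y_train, y_bts = train_test_split(X, y, test_size=0.49, random_state=42)

    y_bts_pred = GaussianNB().fit(X_train, y_train).predict(X_bts)
    print('MCC on Blind test:', round(matthews_corrcoef(y_bts, y_bts_pred), 2))
    print('Accuracy on Blind test:', round(accuracy_score(y_bts, y_bts_pred), 2))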
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [same list of models as above]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01229978 0.01690793 0.0169034 0.01696324 0.01693296 0.01681113
0.01681638 0.01694918 0.01694226 0.01689982]
mean value: 0.01644260883331299
key: score_time
value: [0.01223588 0.01235938 0.0123477 0.01233745 0.01240754 0.01235318
0.01235676 0.01237297 0.01232409 0.01234961]
mean value: 0.012344455718994141
key: test_mcc
value: [0.74351235 0.85541029 0.80136396 0.81882759 0.78478168 0.88989899
0.85319865 0.81649832 0.83501684 0.76161616]
mean value: 0.8160124823784499
key: train_mcc
value: [0.82670934 0.81447691 0.82263186 0.81249922 0.82076536 0.80838391
0.81246155 0.81448891 0.81246155 0.82059712]
mean value: 0.816547573939551
key: test_accuracy
value: [0.87155963 0.9266055 0.89908257 0.90825688 0.88990826 0.94495413
0.9266055 0.90825688 0.91743119 0.88073394]
mean value: 0.9073394495412844
key: train_accuracy
value: [0.91335372 0.90723751 0.91131498 0.90621814 0.91029562 0.90417941
0.90621814 0.90723751 0.90621814 0.91029562]
mean value: 0.908256880733945
key: test_fscore
value: [0.86792453 0.92307692 0.89320388 0.91071429 0.89473684 0.94545455
0.92727273 0.90909091 0.91743119 0.88073394]
mean value: 0.9069639782126365
key: train_fscore
value: [0.91335372 0.90723751 0.91131498 0.9057377 0.90946502 0.90368852
0.9057377 0.90685773 0.9057377 0.91002045]
mean value: 0.9079151055700868
key: test_precision
value: [0.88461538 0.96 0.93877551 0.87931034 0.85 0.94545455
0.92727273 0.90909091 0.92592593 0.88888889]
mean value: 0.9109334236280049
key: train_precision
value: [0.91428571 0.90816327 0.9122449 0.91134021 0.91891892 0.90740741
0.90946502 0.90965092 0.90946502 0.91188525]
mean value: 0.9112826621141457
key: test_recall
value: [0.85185185 0.88888889 0.85185185 0.94444444 0.94444444 0.94545455
0.92727273 0.90909091 0.90909091 0.87272727]
mean value: 0.9045117845117845
key: train_recall
value: [0.91242363 0.90631365 0.91038697 0.90020367 0.90020367 0.9
0.90204082 0.90408163 0.90204082 0.90816327]
mean value: 0.9045858098840351
key: test_roc_auc
value: [0.87138047 0.92626263 0.8986532 0.90858586 0.89040404 0.94494949
0.92659933 0.90824916 0.91750842 0.88080808]
mean value: 0.9073400673400673
key: train_roc_auc
value: [0.91335467 0.90723846 0.91131593 0.90622428 0.91030591 0.90417515
0.90621389 0.9072343 0.90621389 0.91029345]
mean value: 0.9082569932249885
key: test_jcc
value: [0.76666667 0.85714286 0.80701754 0.83606557 0.80952381 0.89655172
0.86440678 0.83333333 0.84745763 0.78688525]
mean value: 0.8305051161116039
key: train_jcc
value: [0.84052533 0.83022388 0.83707865 0.82771536 0.83396226 0.82429907
0.82771536 0.82958801 0.82771536 0.83489681]
mean value: 0.8313720083087689
MCC on Blind test: 0.24
Accuracy on Blind test: 0.6
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [same list of models as above]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.0153439 0.01146626 0.01095605 0.01241088 0.01217341 0.01237679
0.01235223 0.01266718 0.01223803 0.01281548]
mean value: 0.012480020523071289
key: score_time
value: [0.03982353 0.01768589 0.01724744 0.01506829 0.01472926 0.01545072
0.01538634 0.015414 0.01499319 0.01525664]
mean value: 0.018105530738830568
key: test_mcc
value: [0.81882759 0.92659933 0.946411 0.96329966 0.946411 0.91202529
0.92719967 0.91202529 0.98181211 0.98181211]
mean value: 0.931642304301634
key: train_mcc
value: [0.95763309 0.95973466 0.95145652 0.95951143 0.95355358 0.95553835
0.96550464 0.96550464 0.94927144 0.94763304]
mean value: 0.9565341373626831
key: test_accuracy
value: [0.90825688 0.96330275 0.97247706 0.98165138 0.97247706 0.95412844
0.96330275 0.95412844 0.99082569 0.99082569]
mean value: 0.9651376146788991
key: train_accuracy
value: [0.97859327 0.97961264 0.97553517 0.97961264 0.97655454 0.9775739
0.98267074 0.98267074 0.9745158 0.97349643]
mean value: 0.9780835881753314
key: test_fscore
value: [0.91071429 0.96296296 0.97297297 0.98148148 0.97297297 0.95652174
0.96428571 0.95652174 0.99099099 0.99099099]
mean value: 0.9660415850633242
key: train_fscore
value: [0.97893681 0.97995992 0.97590361 0.97987928 0.97693079 0.9778672
0.98281092 0.98281092 0.97477296 0.9739479 ]
mean value: 0.9783820308622914
key: test_precision
value: [0.87931034 0.96296296 0.94736842 0.98148148 0.94736842 0.91666667
0.94736842 0.91666667 0.98214286 0.98214286]
mean value: 0.9463479100048973
key: train_precision
value: [0.96442688 0.96449704 0.96237624 0.96819085 0.96245059 0.96428571
0.9739479 0.9739479 0.96407186 0.95669291]
mean value: 0.9654887879812519
key: test_recall
value: [0.94444444 0.96296296 1. 0.98148148 1. 1.
0.98181818 1. 1. 1. ]
mean value: 0.9870707070707071
key: train_recall
value: [0.99389002 0.99592668 0.9898167 0.99185336 0.99185336 0.99183673
0.99183673 0.99183673 0.98571429 0.99183673]
mean value: 0.9916401346689389
key: test_roc_auc
value: [0.90858586 0.96329966 0.97272727 0.98164983 0.97272727 0.9537037
0.96313131 0.9537037 0.99074074 0.99074074]
mean value: 0.9651010101010101
key: train_roc_auc
value: [0.97857766 0.97959599 0.9755206 0.97960015 0.97653893 0.97758843
0.98268008 0.98268008 0.9745272 0.97351511]
mean value: 0.97808242237832
key: test_jcc
value: [0.83606557 0.92857143 0.94736842 0.96363636 0.94736842 0.91666667
0.93103448 0.91666667 0.98214286 0.98214286]
mean value: 0.9351663738461216
key: train_jcc
value: [0.95874263 0.96070727 0.95294118 0.96055227 0.95490196 0.95669291
0.96620278 0.96620278 0.9507874 0.94921875]
mean value: 0.9576949938828678
MCC on Blind test: 0.26
Accuracy on Blind test: 0.59
Model_name: SVM
Model func: SVC(random_state=42)
List of models: [same list of models as above]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.03312922 0.02710271 0.02765346 0.02737236 0.02725911 0.02756953
0.0275867 0.02737093 0.02711725 0.02727389]
mean value: 0.027943515777587892
key: score_time
value: [0.01420808 0.01313591 0.01342154 0.01315141 0.01308537 0.01318836
0.01315355 0.01331091 0.0128572 0.01323676]
mean value: 0.013274908065795898
key: test_mcc
value: [0.90841751 0.94509941 1. 0.96329966 0.96396098 0.98181211
0.92905936 0.98181211 0.98181818 0.94635821]
mean value: 0.9601637548953743
key: train_mcc
value: [0.97968567 0.97560783 0.96946898 0.96946898 0.97556738 0.97351478
0.97351478 0.97149021 0.96946962 0.97560844]
mean value: 0.9733396664150108
key: test_accuracy
value: [0.95412844 0.97247706 1. 0.98165138 0.98165138 0.99082569
0.96330275 0.99082569 0.99082569 0.97247706]
mean value: 0.9798165137614679
key: train_accuracy
value: [0.98980632 0.98776758 0.98470948 0.98470948 0.98776758 0.98674822
0.98674822 0.98572885 0.98470948 0.98776758]
mean value: 0.9866462793068298
key: test_fscore
value: [0.95412844 0.97196262 1. 0.98148148 0.98181818 0.99099099
0.96491228 0.99099099 0.99082569 0.97345133]
mean value: 0.9800561998679824
key: train_fscore
value: [0.98987854 0.98785425 0.98480243 0.98480243 0.98782961 0.98677518
0.98677518 0.98577236 0.98477157 0.98782961]
mean value: 0.9867091173333614
key: test_precision
value: [0.94545455 0.98113208 1. 0.98148148 0.96428571 0.98214286
0.93220339 0.98214286 1. 0.94827586]
mean value: 0.9717118782878628
key: train_precision
value: [0.98390342 0.98189135 0.97983871 0.97983871 0.98383838 0.98377282
0.98377282 0.98178138 0.97979798 0.98185484]
mean value: 0.9820290405776002
key: test_recall
value: [0.96296296 0.96296296 1. 0.98148148 1. 1.
1. 1. 0.98181818 1. ]
mean value: 0.9889225589225589
key: train_recall
value: [0.99592668 0.99389002 0.9898167 0.9898167 0.99185336 0.98979592
0.98979592 0.98979592 0.98979592 0.99387755]
mean value: 0.9914364686811588
key: test_roc_auc
value: [0.95420875 0.97239057 1. 0.98164983 0.98181818 0.99074074
0.96296296 0.99074074 0.99090909 0.97222222]
mean value: 0.9797643097643097
key: train_roc_auc
value: [0.98980007 0.98776134 0.98470427 0.98470427 0.98776341 0.98675132
0.98675132 0.98573299 0.98471466 0.98777381]
mean value: 0.9866457458747246
key: test_jcc
value: [0.9122807 0.94545455 1. 0.96363636 0.96428571 0.98214286
0.93220339 0.98214286 0.98181818 0.94827586]
mean value: 0.9612240473134379
key: train_jcc
value: [0.97995992 0.976 0.97005988 0.97005988 0.9759519 0.97389558
0.97389558 0.97194389 0.97 0.9759519 ]
mean value: 0.9737718540368138
MCC on Blind test: 0.28
Accuracy on Blind test: 0.59
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: [same list of models as above]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [2.23760557 2.39744544 2.93186569 2.52583194 2.58781528 2.25252485
2.65761113 2.15109515 1.78711081 2.11473489]
mean value: 2.3643640756607054
key: score_time
value: [0.01260877 0.01268816 0.01267433 0.01268578 0.01266146 0.01271725
0.01503229 0.0170579 0.01278281 0.01270413]
mean value: 0.013361287117004395
key: test_mcc
value: [0.94511785 0.92724828 0.98181211 0.96329966 0.92724828 0.98181211
0.91202529 0.98181211 0.96396098 0.96393713]
mean value: 0.954827381792807
key: train_mcc
value: [1. 1. 1. 1. 1. 0.99796334
1. 0.99796334 0.99592252 0.99796334]
mean value: 0.9989812544162268
key: test_accuracy
value: [0.97247706 0.96330275 0.99082569 0.98165138 0.96330275 0.99082569
0.95412844 0.99082569 0.98165138 0.98165138]
mean value: 0.9770642201834863
key: train_accuracy
value: [1. 1. 1. 1. 1. 0.99898063
1. 0.99898063 0.99796126 0.99898063]
mean value: 0.9994903160040775
key: test_fscore
value: [0.97247706 0.96363636 0.99065421 0.98148148 0.96363636 0.99099099
0.95652174 0.99099099 0.98148148 0.98214286]
mean value: 0.9774013538318624
key: train_fscore
value: [1. 1. 1. 1. 1. 0.99898063
1. 0.99898063 0.99795918 0.99898063]
mean value: 0.9994901079697934
key: test_precision
value: [0.96363636 0.94642857 1. 0.98148148 0.94642857 0.98214286
0.91666667 0.98214286 1. 0.96491228]
mean value: 0.9683839649629123
key: train_precision
value: [1. 1. 1. 1. 1. 0.99796334
1. 0.99796334 0.99795918 0.99796334]
mean value: 0.9991849204040069
key: test_recall
value: [0.98148148 0.98148148 0.98148148 0.98148148 0.98148148 1.
1. 1. 0.96363636 1. ]
mean value: 0.9871043771043772
key: train_recall
value: [1. 1. 1. 1. 1. 1.
1. 1. 0.99795918 1. ]
mean value: 0.9997959183673469
key: test_roc_auc
value: [0.97255892 0.96346801 0.99074074 0.98164983 0.96346801 0.99074074
0.9537037 0.99074074 0.98181818 0.98148148]
mean value: 0.977037037037037
key: train_roc_auc
value: [1. 1. 1. 1. 1. 0.99898167
1. 0.99898167 0.99796126 0.99898167]
mean value: 0.9994906272081133
key: test_jcc
value: [0.94642857 0.92982456 0.98148148 0.96363636 0.92982456 0.98214286
0.91666667 0.98214286 0.96363636 0.96491228]
mean value: 0.9560696564643933
key: train_jcc
value: [1. 1. 1. 1. 1. 0.99796334
1. 0.99796334 0.99592668 0.99796334]
mean value: 0.9989816700610998
MCC on Blind test: 0.28
Accuracy on Blind test: 0.59
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [same list of models as above]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.07054901 0.06181669 0.06172919 0.05932713 0.06351018 0.06178021
0.0615108 0.0635767 0.0611248 0.06372452]
mean value: 0.06286492347717285
key: score_time
value: [0.00950432 0.00932312 0.00905943 0.00901723 0.00903702 0.00915956
0.00907493 0.00908566 0.0090878 0.00900388]
mean value: 0.009135293960571288
key: test_mcc
value: [0.94509941 1. 0.96393713 0.98181211 0.9291517 0.94511785
0.88989899 0.96393713 0.96393713 0.92905936]
mean value: 0.9511950806790457
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.97247706 1. 0.98165138 0.99082569 0.96330275 0.97247706
0.94495413 0.98165138 0.98165138 0.96330275]
mean value: 0.9752293577981651
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.97196262 1. 0.98113208 0.99065421 0.96428571 0.97247706
0.94545455 0.98214286 0.98214286 0.96491228]
mean value: 0.9755164216849517
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.98113208 1. 1. 1. 0.93103448 0.98148148
0.94545455 0.96491228 0.96491228 0.93220339]
mean value: 0.9701130536400363
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96296296 1. 0.96296296 0.98148148 1. 0.96363636
0.94545455 1. 1. 1. ]
mean value: 0.9816498316498317
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97239057 1. 0.98148148 0.99074074 0.96363636 0.97255892
0.94494949 0.98148148 0.98148148 0.96296296]
mean value: 0.9751683501683501
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.94545455 1. 0.96296296 0.98148148 0.93103448 0.94642857
0.89655172 0.96491228 0.96491228 0.93220339]
mean value: 0.952594171945813
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.2
Accuracy on Blind test: 0.56
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [same list of models as above]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.13620734 0.13592768 0.14145327 0.13965559 0.13692927 0.13765979
0.13613105 0.13678074 0.13775492 0.13754439]
mean value: 0.13760440349578856
key: score_time
value: [0.01817727 0.01823282 0.01932836 0.01829553 0.01836991 0.01836514
0.01826692 0.01823115 0.01880527 0.01826 ]
mean value: 0.018433237075805665
key: test_mcc
value: [0.90841751 0.94509941 1. 0.98181211 0.98181818 1.
0.94509941 0.96329966 1. 0.96393713]
mean value: 0.9689483423563136
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.95412844 0.97247706 1. 0.99082569 0.99082569 1.
0.97247706 0.98165138 1. 0.98165138]
mean value: 0.9844036697247707
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.95412844 0.97196262 1. 0.99065421 0.99082569 1.
0.97297297 0.98181818 1. 0.98214286]
mean value: 0.9844504962804286
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.94545455 0.98113208 1. 1. 0.98181818 1.
0.96428571 0.98181818 1. 0.96491228]
mean value: 0.9819420979550075
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96296296 0.96296296 1. 0.98148148 1. 1.
0.98181818 0.98181818 1. 1. ]
mean value: 0.987104377104377
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.95420875 0.97239057 1. 0.99074074 0.99090909 1.
0.97239057 0.98164983 1. 0.98148148]
mean value: 0.9843771043771044
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.9122807 0.94545455 1. 0.98148148 0.98181818 1.
0.94736842 0.96428571 1. 0.96491228]
mean value: 0.9697601326548695
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.23
Accuracy on Blind test: 0.56
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [same list of models as above]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01190448 0.01207089 0.01192713 0.0119226 0.0119257 0.01208067
0.01233387 0.01218677 0.01201677 0.01214457]
mean value: 0.01205134391784668
key: score_time
value: [0.00913978 0.00910997 0.00899887 0.00900722 0.00899005 0.00908947
0.00914955 0.00911641 0.00894523 0.00904179]
mean value: 0.009058833122253418
key: test_mcc
value: [0.87171717 0.90838671 0.94511785 0.92724828 0.90841751 0.87167401
0.90838671 0.90841751 0.87171717 0.92719967]
mean value: 0.9048282595925088
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.93577982 0.95412844 0.97247706 0.96330275 0.95412844 0.93577982
0.95412844 0.95412844 0.93577982 0.96330275]
mean value: 0.9522935779816514
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.93577982 0.95327103 0.97247706 0.96363636 0.95412844 0.93693694
0.95495495 0.95412844 0.93577982 0.96428571]
mean value: 0.9525378575833003
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.92727273 0.96226415 0.96363636 0.94642857 0.94545455 0.92857143
0.94642857 0.96296296 0.94444444 0.94736842]
mean value: 0.9474832187195643
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.94444444 0.94444444 0.98148148 0.98148148 0.96296296 0.94545455
0.96363636 0.94545455 0.92727273 0.98181818]
mean value: 0.9578451178451178
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.93585859 0.9540404 0.97255892 0.96346801 0.95420875 0.93569024
0.9540404 0.95420875 0.93585859 0.96313131]
mean value: 0.9523063973063972
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.87931034 0.91071429 0.94642857 0.92982456 0.9122807 0.88135593
0.9137931 0.9122807 0.87931034 0.93103448]
mean value: 0.9096333030120597
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.28
Accuracy on Blind test: 0.6
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [same list of models as above]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [3.192981 3.21978426 3.26361322 3.27795982 3.19749761 3.2129035
3.14740181 3.20146179 3.23462439 3.21413517]
mean value: 3.2162362575531005
key: score_time
value: [0.09524465 0.09526968 0.09523535 0.09465885 0.09512901 0.09531832
0.09523821 0.09471774 0.09536004 0.09545851]
mean value: 0.09516303539276123
key: test_mcc
value: [0.96329966 1. 0.98181211 0.98181211 0.98181818 1.
0.94511785 0.98181211 1. 0.96393713]
mean value: 0.9799609160184436
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98165138 1. 0.99082569 0.99082569 0.99082569 1.
0.97247706 0.99082569 1. 0.98165138]
mean value: 0.989908256880734
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98148148 1. 0.99065421 0.99065421 0.99082569 1.
0.97247706 0.99099099 1. 0.98214286]
mean value: 0.989922649312386
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.98148148 1. 1. 1. 0.98181818 1.
0.98148148 0.98214286 1. 0.96491228]
mean value: 0.9891836282625757
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.98148148 1. 0.98148148 0.98148148 1. 1.
0.96363636 1. 1. 1. ]
mean value: 0.9908080808080808
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98164983 1. 0.99074074 0.99074074 0.99090909 1.
0.97255892 0.99074074 1. 0.98148148]
mean value: 0.9898821548821548
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96363636 1. 0.98148148 0.98148148 0.98181818 1.
0.94642857 0.98214286 1. 0.96491228]
mean value: 0.9801901217690692
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.25
Accuracy on Blind test: 0.57
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: [same list of models as above]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...05', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
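The FutureWarning above comes from max_features='auto' in the Random Forest2 estimator; as the message says, 'sqrt' is the equivalent, non-deprecated value for classifiers. A minimal sketch of the substitution, keeping the other hyperparameters from the logged estimator:

    from sklearn.ensemble import RandomForestClassifier

    # 'sqrt' reproduces the old 'auto' behaviour for classifiers without the warning.
    rf2 = RandomForestClassifier(max_features='sqrt', min_samples_leaf=5,
                                 n_estimators=1000, n_jobs=10,
                                 oob_score=True, random_state=42)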
key: fit_time
value: [1.15664482 1.25536323 1.23685193 1.19847178 1.22918534 1.21921778
1.28991628 1.25609398 1.18945456 1.21153784]
mean value: 1.2242737531661987
key: score_time
value: [0.28573251 0.21750188 0.26001978 0.21304607 0.22797418 0.21436405
0.26951408 0.28902078 0.24877381 0.24985695]
mean value: 0.2475804090499878
key: test_mcc
value: [0.94511785 1. 0.98181211 0.98181211 0.92724828 0.98181211
0.90838671 0.98181211 0.96393713 0.96393713]
mean value: 0.9635875555795308
key: train_mcc
value: [0.98980835 0.98573085 0.98573085 0.98777573 0.98776757 0.98573091
0.98777583 0.98980835 0.98573091 0.98776757]
mean value: 0.9873626926348411
key: test_accuracy
value: [0.97247706 1. 0.99082569 0.99082569 0.96330275 0.99082569
0.95412844 0.99082569 0.98165138 0.98165138]
mean value: 0.981651376146789
key: train_accuracy
value: [0.99490316 0.99286442 0.99286442 0.99388379 0.99388379 0.99286442
0.99388379 0.99490316 0.99286442 0.99388379]
mean value: 0.9936799184505607
key: test_fscore
value: [0.97247706 1. 0.99065421 0.99065421 0.96363636 0.99099099
0.95495495 0.99099099 0.98214286 0.98214286]
mean value: 0.9818644490294152
key: train_fscore
value: [0.99491353 0.99287894 0.99287894 0.99390244 0.99389002 0.99286442
0.99389002 0.99489275 0.99286442 0.99387755]
mean value: 0.993685304063256
key: test_precision
value: [0.96363636 1. 1. 1. 0.94642857 0.98214286
0.94642857 0.98214286 0.96491228 0.96491228]
mean value: 0.975060378218273
key: train_precision
value: [0.99390244 0.99186992 0.99186992 0.99188641 0.99389002 0.99185336
0.99186992 0.99591002 0.99185336 0.99387755]
mean value: 0.992878291767276
key: test_recall
value: [0.98148148 1. 0.98148148 0.98148148 0.98148148 1.
0.96363636 1. 1. 1. ]
mean value: 0.9889562289562289
key: train_recall
value: [0.99592668 0.99389002 0.99389002 0.99592668 0.99389002 0.99387755
0.99591837 0.99387755 0.99387755 0.99387755]
mean value: 0.9944951993017166
key: test_roc_auc
value: [0.97255892 1. 0.99074074 0.99074074 0.96346801 0.99074074
0.9540404 0.99074074 0.98148148 0.98148148]
mean value: 0.9815993265993266
key: train_roc_auc
value: [0.99490212 0.99286338 0.99286338 0.99388171 0.99388379 0.99286546
0.99388586 0.99490212 0.99286546 0.99388379]
mean value: 0.9936797040608504
key: test_jcc
value: [0.94642857 1. 0.98148148 0.98148148 0.92982456 0.98214286
0.9137931 0.98214286 0.96491228 0.96491228]
mean value: 0.9647119474932542
key: train_jcc
value: [0.98987854 0.98585859 0.98585859 0.98787879 0.98785425 0.98582996
0.98785425 0.9898374 0.98582996 0.98782961]
mean value: 0.9874509936137159
MCC on Blind test: 0.3
Accuracy on Blind test: 0.59
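Note: the "MCC on Blind test" / "Accuracy on Blind test" lines report performance on the held-out blind test set rather than on the 10 CV folds above. A minimal, self-contained sketch (toy data and hypothetical names, not this project's own code) of how such blind-test figures can be produced with scikit-learn:

# Hypothetical sketch: train on one (imbalanced) split, then score a held-out
# "blind" split the model never saw. Toy data stands in for the real features.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=20, weights=[0.95, 0.05],
                           random_state=42)
X_tr, X_bts, y_tr, y_bts = train_test_split(X, y, test_size=0.5, random_state=42)

clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)
y_pred = clf.predict(X_bts)
print('MCC on Blind test:', round(matthews_corrcoef(y_bts, y_pred), 2))
print('Accuracy on Blind test:', round(accuracy_score(y_bts, y_pred), 2))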
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02857256 0.01666284 0.01697707 0.01707339 0.01707959 0.01700878
0.01709127 0.01708722 0.01719165 0.01719737]
mean value: 0.018194174766540526
key: score_time
value: [0.01237464 0.01241755 0.01254058 0.01241088 0.01241207 0.01243305
0.01244402 0.01243639 0.01238298 0.01241779]
mean value: 0.012426996231079101
key: test_mcc
value: [0.74351235 0.85541029 0.80136396 0.81882759 0.78478168 0.88989899
0.85319865 0.81649832 0.83501684 0.76161616]
mean value: 0.8160124823784499
key: train_mcc
value: [0.82670934 0.81447691 0.82263186 0.81249922 0.82076536 0.80838391
0.81246155 0.81448891 0.81246155 0.82059712]
mean value: 0.816547573939551
key: test_accuracy
value: [0.87155963 0.9266055 0.89908257 0.90825688 0.88990826 0.94495413
0.9266055 0.90825688 0.91743119 0.88073394]
mean value: 0.9073394495412844
key: train_accuracy
value: [0.91335372 0.90723751 0.91131498 0.90621814 0.91029562 0.90417941
0.90621814 0.90723751 0.90621814 0.91029562]
mean value: 0.908256880733945
key: test_fscore
value: [0.86792453 0.92307692 0.89320388 0.91071429 0.89473684 0.94545455
0.92727273 0.90909091 0.91743119 0.88073394]
mean value: 0.9069639782126365
key: train_fscore
value: [0.91335372 0.90723751 0.91131498 0.9057377 0.90946502 0.90368852
0.9057377 0.90685773 0.9057377 0.91002045]
mean value: 0.9079151055700868
key: test_precision
value: [0.88461538 0.96 0.93877551 0.87931034 0.85 0.94545455
0.92727273 0.90909091 0.92592593 0.88888889]
mean value: 0.9109334236280049
key: train_precision
value: [0.91428571 0.90816327 0.9122449 0.91134021 0.91891892 0.90740741
0.90946502 0.90965092 0.90946502 0.91188525]
mean value: 0.9112826621141457
key: test_recall
value: [0.85185185 0.88888889 0.85185185 0.94444444 0.94444444 0.94545455
0.92727273 0.90909091 0.90909091 0.87272727]
mean value: 0.9045117845117845
key: train_recall
value: [0.91242363 0.90631365 0.91038697 0.90020367 0.90020367 0.9
0.90204082 0.90408163 0.90204082 0.90816327]
mean value: 0.9045858098840351
key: test_roc_auc
value: [0.87138047 0.92626263 0.8986532 0.90858586 0.89040404 0.94494949
0.92659933 0.90824916 0.91750842 0.88080808]
mean value: 0.9073400673400673
key: train_roc_auc
value: [0.91335467 0.90723846 0.91131593 0.90622428 0.91030591 0.90417515
0.90621389 0.9072343 0.90621389 0.91029345]
mean value: 0.9082569932249885
key: test_jcc
value: [0.76666667 0.85714286 0.80701754 0.83606557 0.80952381 0.89655172
0.86440678 0.83333333 0.84745763 0.78688525]
mean value: 0.8305051161116039
key: train_jcc
value: [0.84052533 0.83022388 0.83707865 0.82771536 0.83396226 0.82429907
0.82771536 0.82958801 0.82771536 0.83489681]
mean value: 0.8313720083087689
MCC on Blind test: 0.24
Accuracy on Blind test: 0.6
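Note: the repeating key/value blocks (fit_time, score_time, test_*/train_* arrays of ten entries plus a mean) have the shape of scikit-learn's cross_validate output with return_train_score=True and multiple scorers. A minimal sketch under that assumption, with toy data and illustrative scorer names:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import BernoulliNB

# 10-fold CV with two scorers yields fit_time, score_time, test_mcc, train_mcc,
# test_accuracy and train_accuracy arrays of length 10, mirroring the log format.
X, y = make_classification(n_samples=500, n_features=30, random_state=42)
scoring = {'mcc': make_scorer(matthews_corrcoef), 'accuracy': 'accuracy'}
cv_out = cross_validate(BernoulliNB(), X, y, cv=10, scoring=scoring,
                        return_train_score=True)
for key, value in cv_out.items():
    print('key:', key)
    print('value:', value)
    print('mean value:', np.mean(value))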
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.13928676 0.13060784 0.21524143 0.14963627 0.15037847 0.1270678
0.13197303 0.13237286 0.12784719 0.12808561]
mean value: 0.1432497262954712
key: score_time
value: [0.01138091 0.01127362 0.01427794 0.01234388 0.01148152 0.01167202
0.01126146 0.01150727 0.01140141 0.011204 ]
mean value: 0.011780405044555664
key: test_mcc
value: [0.90841751 1. 0.96393713 0.98181211 0.96396098 0.96329966
0.94511785 0.96329966 0.98181211 0.96393713]
mean value: 0.9635594149827394
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.95412844 1. 0.98165138 0.99082569 0.98165138 0.98165138
0.97247706 0.98165138 0.99082569 0.98165138]
mean value: 0.981651376146789
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.95412844 1. 0.98113208 0.99065421 0.98181818 0.98181818
0.97247706 0.98181818 0.99099099 0.98214286]
mean value: 0.9816980179254724
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.94545455 1. 1. 1. 0.96428571 0.98181818
0.98148148 0.98181818 0.98214286 0.96491228]
mean value: 0.9801913242702717
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96296296 1. 0.96296296 0.98148148 1. 0.98181818
0.96363636 0.98181818 1. 1. ]
mean value: 0.9834680134680135
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.95420875 1. 0.98148148 0.99074074 0.98181818 0.98164983
0.97255892 0.98164983 0.99074074 0.98148148]
mean value: 0.9816329966329966
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.9122807 1. 0.96296296 0.98148148 0.96428571 0.96428571
0.94642857 0.96428571 0.98214286 0.96491228]
mean value: 0.9643065998329157
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.27
Accuracy on Blind test: 0.59
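Note: the *_jcc entries are consistent with the Jaccard index of the positive class, which relates to the F-score as J = F1 / (2 - F1) (e.g. 0.95412844 / (2 - 0.95412844) ≈ 0.9123, matching the first test_jcc entry above). A small sketch of that identity using scikit-learn:

from sklearn.metrics import f1_score, jaccard_score

# Toy labels: TP=3, FP=1, FN=1, so F1 = 6/8 = 0.75 and Jaccard = 3/5 = 0.6.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
f1 = f1_score(y_true, y_pred)
jcc = jaccard_score(y_true, y_pred)
print(round(jcc, 4), round(f1 / (2 - f1), 4))  # both print 0.6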
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.05295801 0.08014059 0.09963298 0.083951 0.06942058 0.08701682
0.0714407 0.08109713 0.0660603 0.09231591]
mean value: 0.0784034013748169
key: score_time
value: [0.01235604 0.01253247 0.03159952 0.02604461 0.01255369 0.02007866
0.01244569 0.01251531 0.0123868 0.02083731]
mean value: 0.017335009574890137
key: test_mcc
value: [0.946411 0.89544301 0.946411 0.98181818 0.98181818 0.96393713
0.89223482 0.846254 0.96393713 0.91202529]
mean value: 0.9330289738014781
key: train_mcc
value: [0.9837635 0.97773807 0.9857799 0.97974261 0.9837635 0.9857802
0.97981668 0.9817516 0.9837639 0.9817516 ]
mean value: 0.9823651560326023
key: test_accuracy
value: [0.97247706 0.94495413 0.97247706 0.99082569 0.99082569 0.98165138
0.94495413 0.91743119 0.98165138 0.95412844]
mean value: 0.9651376146788991
key: train_accuracy
value: [0.99184506 0.98878695 0.99286442 0.98980632 0.99184506 0.99286442
0.98980632 0.99082569 0.99184506 0.99082569]
mean value: 0.9911314984709481
key: test_fscore
value: [0.97297297 0.94736842 0.97297297 0.99082569 0.99082569 0.98214286
0.94736842 0.92436975 0.98214286 0.95652174]
mean value: 0.9667511365513307
key: train_fscore
value: [0.99190283 0.9889001 0.9929078 0.98989899 0.99190283 0.9928934
0.98989899 0.99088146 0.99188641 0.99088146]
mean value: 0.9911954278825454
key: test_precision
value: [0.94736842 0.9 0.94736842 0.98181818 0.98181818 0.96491228
0.91525424 0.859375 0.96491228 0.91666667]
mean value: 0.9379493671099938
key: train_precision
value: [0.98591549 0.98 0.98790323 0.98196393 0.98591549 0.98787879
0.98 0.98390342 0.9858871 0.98390342]
mean value: 0.9843270865276915
key: test_recall
value: [1. 1. 1. 1. 1. 1.
0.98181818 1. 1. 1. ]
mean value: 0.9981818181818182
key: train_recall
value: [0.99796334 0.99796334 0.99796334 0.99796334 0.99796334 0.99795918
1. 0.99795918 0.99795918 0.99795918]
mean value: 0.9981653435304876
key: test_roc_auc
value: [0.97272727 0.94545455 0.97272727 0.99090909 0.99090909 0.98148148
0.94461279 0.91666667 0.98148148 0.9537037 ]
mean value: 0.9650673400673401
key: train_roc_auc
value: [0.99183881 0.98877759 0.99285922 0.989798 0.99183881 0.99286961
0.9898167 0.99083295 0.99185128 0.99083295]
mean value: 0.9911315931667983
key: test_jcc
value: [0.94736842 0.9 0.94736842 0.98181818 0.98181818 0.96491228
0.9 0.859375 0.96491228 0.91666667]
mean value: 0.9364239433811802
key: train_jcc
value: [0.98393574 0.97804391 0.98591549 0.98 0.98393574 0.9858871
0.98 0.98192771 0.98390342 0.98192771]
mean value: 0.9825476830061249
MCC on Blind test: 0.21
Accuracy on Blind test: 0.57
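Note: the MCC values throughout this log follow the standard confusion-matrix definition, MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A small self-contained check (toy labels) that sklearn's matthews_corrcoef agrees with the formula:

import numpy as np
from sklearn.metrics import confusion_matrix, matthews_corrcoef

y_true = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0, 1, 1]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()   # TN=3, FP=1, FN=1, TP=5
mcc_manual = (tp * tn - fp * fn) / np.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
print(round(mcc_manual, 6), round(matthews_corrcoef(y_true, y_pred), 6))  # both 0.583333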
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.02079773 0.01609492 0.01598954 0.01628113 0.0189333 0.01634192
0.01601601 0.01622939 0.01595616 0.01598763]
mean value: 0.016862773895263673
key: score_time
value: [0.01259875 0.01211143 0.01218486 0.0121243 0.01218343 0.01213431
0.01215887 0.01219344 0.01213384 0.01207805]
mean value: 0.012190127372741699
key: test_mcc
value: [0.76248469 0.78039748 0.83496131 0.76161616 0.71100747 0.79824861
0.74368478 0.78039748 0.79824861 0.78039748]
mean value: 0.7751444102497523
key: train_mcc
value: [0.77984039 0.76966498 0.76962372 0.77777963 0.79002606 0.77574619
0.77370215 0.76962372 0.77373792 0.77371595]
mean value: 0.7753460692713874
key: test_accuracy
value: [0.88073394 0.88990826 0.91743119 0.88073394 0.85321101 0.89908257
0.87155963 0.88990826 0.89908257 0.88990826]
mean value: 0.8871559633027524
key: train_accuracy
value: [0.88990826 0.88481142 0.88481142 0.88888889 0.8950051 0.88786952
0.88685015 0.88481142 0.88685015 0.88685015]
mean value: 0.8876656472986748
key: test_fscore
value: [0.87619048 0.89090909 0.91588785 0.88073394 0.85964912 0.9009009
0.87037037 0.88888889 0.9009009 0.88888889]
mean value: 0.8873320435277953
key: train_fscore
value: [0.89046653 0.88433982 0.88504578 0.88888889 0.8947906 0.88798371
0.88685015 0.8845761 0.88615385 0.88708037]
mean value: 0.8876175787042375
key: test_precision
value: [0.90196078 0.875 0.9245283 0.87272727 0.81666667 0.89285714
0.88679245 0.90566038 0.89285714 0.90566038]
mean value: 0.8874710518855913
key: train_precision
value: [0.88686869 0.88888889 0.88414634 0.88979592 0.89754098 0.88617886
0.88594705 0.88548057 0.89072165 0.88438134]
mean value: 0.8879950288650756
key: test_recall
value: [0.85185185 0.90740741 0.90740741 0.88888889 0.90740741 0.90909091
0.85454545 0.87272727 0.90909091 0.87272727]
mean value: 0.8881144781144781
key: train_recall
value: [0.89409369 0.87983707 0.88594705 0.88798371 0.89205703 0.88979592
0.8877551 0.88367347 0.88163265 0.88979592]
mean value: 0.8872571594829378
key: test_roc_auc
value: [0.88047138 0.89006734 0.91734007 0.88080808 0.8537037 0.8989899
0.87171717 0.89006734 0.8989899 0.89006734]
mean value: 0.8872222222222222
key: train_roc_auc
value: [0.88990399 0.88481649 0.88481026 0.88888981 0.89500811 0.88787148
0.88685107 0.88481026 0.88684484 0.88685315]
mean value: 0.8876659462155534
key: test_jcc
value: [0.77966102 0.80327869 0.84482759 0.78688525 0.75384615 0.81967213
0.7704918 0.8 0.81967213 0.8 ]
mean value: 0.7978334757002203
key: train_jcc
value: [0.80255941 0.79266055 0.79379562 0.8 0.80961183 0.7985348
0.7967033 0.79304029 0.79558011 0.79707495]
mean value: 0.7979560868903866
MCC on Blind test: 0.37
Accuracy on Blind test: 0.66
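Note: MultinomialNB only accepts non-negative features, so the MinMaxScaler in the 'prep' step above (which maps the numeric columns to [0, 1]) is what lets it run on features that can take negative values. A minimal illustration, assuming nothing beyond scikit-learn defaults:

import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import MinMaxScaler

X = np.array([[-1.2, 0.4], [0.3, -2.0], [2.5, 1.1], [0.0, 0.7]])
y = np.array([0, 1, 0, 1])
X_scaled = MinMaxScaler().fit_transform(X)   # every value now lies in [0, 1]
MultinomialNB().fit(X_scaled, y)             # fits without error
# MultinomialNB().fit(X, y) would raise a ValueError about negative values.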
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.03137422 0.03929567 0.02904129 0.03385782 0.03152227 0.0290134
0.02575612 0.02675271 0.035393 0.02952981]
mean value: 0.03115363121032715
key: score_time
value: [0.01219153 0.01214862 0.01207137 0.01212645 0.01213336 0.01214814
0.01212502 0.01226616 0.0121429 0.01237202]
mean value: 0.012172555923461914
key: test_mcc
value: [0.90841751 0.92724828 0.96393713 0.96393713 0.946411 0.96393713
0.92905936 0.98181211 1. 0.96393713]
mean value: 0.9548696778106656
key: train_mcc
value: [0.99390234 0.9778193 0.9837639 0.9406024 0.99187796 0.98166983
0.98170255 0.98980839 0.95806676 0.9837635 ]
mean value: 0.978297693597624
key: test_accuracy
value: [0.95412844 0.96330275 0.98165138 0.98165138 0.97247706 0.98165138
0.96330275 0.99082569 1. 0.98165138]
mean value: 0.9770642201834863
key: train_accuracy
value: [0.9969419 0.98878695 0.99184506 0.96941896 0.99592253 0.99082569
0.99082569 0.99490316 0.97859327 0.99184506]
mean value: 0.9889908256880734
key: test_fscore
value: [0.95412844 0.96363636 0.98113208 0.98113208 0.97297297 0.98214286
0.96491228 0.99099099 1. 0.98214286]
mean value: 0.9773190913898165
key: train_fscore
value: [0.99695431 0.98892246 0.99180328 0.96848739 0.9959432 0.99084435
0.99086294 0.99490316 0.97902098 0.99178645]
mean value: 0.9889528535316982
key: test_precision
value: [0.94545455 0.94642857 1. 1. 0.94736842 0.96491228
0.93220339 0.98214286 1. 0.96491228]
mean value: 0.9683422346312622
key: train_precision
value: [0.99392713 0.97808765 0.99793814 1. 0.99191919 0.98782961
0.98585859 0.99389002 0.95890411 0.99793388]
mean value: 0.9886288325873761
key: test_recall
value: [0.96296296 0.98148148 0.96296296 0.96296296 1. 1.
1. 1. 1. 1. ]
mean value: 0.987037037037037
key: train_recall
value: [1. 1. 0.98574338 0.9389002 1. 0.99387755
0.99591837 0.99591837 1. 0.98571429]
mean value: 0.9896072155949956
key: test_roc_auc
value: [0.95420875 0.96346801 0.98148148 0.98148148 0.97272727 0.98148148
0.96296296 0.99074074 1. 0.98148148]
mean value: 0.977003367003367
key: train_roc_auc
value: [0.99693878 0.98877551 0.99185128 0.9694501 0.99591837 0.9908288
0.99083087 0.99490419 0.97861507 0.99183881]
mean value: 0.9889951785194729
key: test_jcc
value: [0.9122807 0.92982456 0.96296296 0.96296296 0.94736842 0.96491228
0.93220339 0.98214286 1. 0.96491228]
mean value: 0.9559570418513327
key: train_jcc
value: [0.99392713 0.97808765 0.98373984 0.9389002 0.99191919 0.98185484
0.98189135 0.98985801 0.95890411 0.98370672]
mean value: 0.9782789037427249
MCC on Blind test: 0.23
Accuracy on Blind test: 0.57
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.02381825 0.02869153 0.02777767 0.02622557 0.02247834 0.02444983
0.0302763 0.02613544 0.02430272 0.02424407]
mean value: 0.025839972496032714
key: score_time
value: [0.01220822 0.01211882 0.01219845 0.0121603 0.01212859 0.01205921
0.01216602 0.01215434 0.01211047 0.01222229]
mean value: 0.012152671813964844
key: test_mcc
value: [0.7935982 0.92719967 0.8789643 0.98181818 0.90838671 0.96393713
0.92905936 0.96329966 1. 0.96393713]
mean value: 0.9310200334445711
key: train_mcc
value: [0.81490143 0.9677754 0.94445227 0.9836901 0.94386773 0.96789632
0.99185333 0.98372267 0.9367601 0.98175107]
mean value: 0.951667042747277
key: test_accuracy
value: [0.88990826 0.96330275 0.93577982 0.99082569 0.95412844 0.98165138
0.96330275 0.98165138 1. 0.98165138]
mean value: 0.9642201834862385
key: train_accuracy
value: [0.89908257 0.98369011 0.9714577 0.99184506 0.9714577 0.98369011
0.99592253 0.99184506 0.96738022 0.99082569]
mean value: 0.9747196738022426
key: test_fscore
value: [0.89830508 0.96226415 0.93913043 0.99082569 0.95327103 0.98214286
0.96491228 0.98181818 1. 0.98214286]
mean value: 0.9654812563388195
key: train_fscore
value: [0.90841813 0.98347107 0.97227723 0.99185336 0.97083333 0.98393574
0.99592668 0.99180328 0.96837945 0.99075026]
mean value: 0.9757648532767356
key: test_precision
value: [0.828125 0.98076923 0.8852459 0.98181818 0.96226415 0.96491228
0.93220339 0.98181818 1. 0.96491228]
mean value: 0.9482068598222352
key: train_precision
value: [0.83220339 0.99790356 0.9460501 0.99185336 0.99360341 0.96837945
0.99390244 0.99588477 0.93869732 0.99792961]
mean value: 0.9656407406073759
key: test_recall
value: [0.98148148 0.94444444 1. 1. 0.94444444 1.
1. 0.98181818 1. 1. ]
mean value: 0.9852188552188552
key: train_recall
value: [1. 0.9694501 1. 0.99185336 0.9490835 1.
0.99795918 0.9877551 1. 0.98367347]
mean value: 0.9879774720478822
key: test_roc_auc
value: [0.89074074 0.96313131 0.93636364 0.99090909 0.9540404 0.98148148
0.96296296 0.98164983 1. 0.98148148]
mean value: 0.9642760942760943
key: train_roc_auc
value: [0.89897959 0.98370464 0.97142857 0.99184505 0.97148053 0.98370672
0.9959246 0.99184089 0.96741344 0.9908184 ]
mean value: 0.9747142441497983
key: test_jcc
value: [0.81538462 0.92727273 0.8852459 0.98181818 0.91071429 0.96491228
0.93220339 0.96428571 1. 0.96491228]
mean value: 0.9346749377348886
key: train_jcc
value: [0.83220339 0.96747967 0.9460501 0.98383838 0.94331984 0.96837945
0.99188641 0.98373984 0.93869732 0.98167006]
mean value: 0.9537264455743892
MCC on Blind test: 0.33
Accuracy on Blind test: 0.62
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.51863694 0.50481296 0.51417375 0.50742078 0.510185 0.50463581
0.50489211 0.50328565 0.51146102 0.50386643]
mean value: 0.5083370447158814
key: score_time
value: [0.01617646 0.01697731 0.01649475 0.01617336 0.01631141 0.0161674
0.01612544 0.01631689 0.01627827 0.01620293]
mean value: 0.01632242202758789
key: test_mcc
value: [0.89053558 0.98181211 0.98181211 0.98181211 0.98181818 0.98181211
0.94509941 0.98181211 1. 0.98181211]
mean value: 0.9708325861574671
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94495413 0.99082569 0.99082569 0.99082569 0.99082569 0.99082569
0.97247706 0.99082569 1. 0.99082569]
mean value: 0.9853211009174312
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94545455 0.99065421 0.99065421 0.99065421 0.99082569 0.99099099
0.97297297 0.99099099 1. 0.99099099]
mean value: 0.9854188796296316
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.92857143 1. 1. 1. 0.98181818 0.98214286
0.96428571 0.98214286 1. 0.98214286]
mean value: 0.9821103896103895
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96296296 0.98148148 0.98148148 0.98148148 1. 1.
0.98181818 1. 1. 1. ]
mean value: 0.988922558922559
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.94511785 0.99074074 0.99074074 0.99074074 0.99090909 0.99074074
0.97239057 0.99074074 1. 0.99074074]
mean value: 0.9852861952861953
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.89655172 0.98148148 0.98148148 0.98148148 0.98181818 0.98214286
0.94736842 0.98214286 1. 0.98214286]
mean value: 0.971661134288176
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.3
Accuracy on Blind test: 0.6
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.21921444 0.22345185 0.21574569 0.2206254 0.22202802 0.13987017
0.22048545 0.23433447 0.23488665 0.21739197]
mean value: 0.21480340957641603
key: score_time
value: [0.03116179 0.02156115 0.04347515 0.02509618 0.03190017 0.02411628
0.04481173 0.02482653 0.038275 0.03011703]
mean value: 0.03153409957885742
key: test_mcc
value: [0.92659933 0.98181211 0.94509941 0.98181211 0.96396098 0.94511785
0.94511785 0.94511785 0.98181211 0.98181211]
mean value: 0.9598261714386186
key: train_mcc
value: [0.9918781 0.99390241 0.99388586 0.99592252 0.99390241 0.99185326
0.99592252 0.99187796 0.99796333 0.99388584]
mean value: 0.9940994227573303
key: test_accuracy
value: [0.96330275 0.99082569 0.97247706 0.99082569 0.98165138 0.97247706
0.97247706 0.97247706 0.99082569 0.99082569]
mean value: 0.9798165137614679
key: train_accuracy
value: [0.99592253 0.9969419 0.9969419 0.99796126 0.9969419 0.99592253
0.99796126 0.99592253 0.99898063 0.9969419 ]
mean value: 0.9970438328236494
key: test_fscore
value: [0.96296296 0.99065421 0.97196262 0.99065421 0.98181818 0.97247706
0.97247706 0.97247706 0.99099099 0.99099099]
mean value: 0.9797465347461061
key: train_fscore
value: [0.99591002 0.99693565 0.9969419 0.99796334 0.99693565 0.99591002
0.99795918 0.99590164 0.99897855 0.99693565]
mean value: 0.9970371595467664
key: test_precision
value: [0.96296296 1. 0.98113208 1. 0.96428571 0.98148148
0.98148148 0.98148148 0.98214286 0.98214286]
mean value: 0.9817110911450534
key: train_precision
value: [1. 1. 0.99795918 0.99796334 1. 0.99795082
0.99795918 1. 1. 0.99795501]
mean value: 0.9989787537366218
key: test_recall
value: [0.96296296 0.98148148 0.96296296 0.98148148 1. 0.96363636
0.96363636 0.96363636 1. 1. ]
mean value: 0.977979797979798
key: train_recall
value: [0.99185336 0.99389002 0.99592668 0.99796334 0.99389002 0.99387755
0.99795918 0.99183673 0.99795918 0.99591837]
mean value: 0.9951074441996758
key: test_roc_auc
value: [0.96329966 0.99074074 0.97239057 0.99074074 0.98181818 0.97255892
0.97255892 0.97255892 0.99074074 0.99074074]
mean value: 0.9798148148148148
key: train_roc_auc
value: [0.99592668 0.99694501 0.99694293 0.99796126 0.99694501 0.99592045
0.99796126 0.99591837 0.99897959 0.99694085]
mean value: 0.9970441414855148
key: test_jcc
value: [0.92857143 0.98148148 0.94545455 0.98148148 0.96428571 0.94642857
0.94642857 0.94642857 0.98214286 0.98214286]
mean value: 0.960484607984608
key: train_jcc
value: [0.99185336 0.99389002 0.99390244 0.99593496 0.99389002 0.99185336
0.99592668 0.99183673 0.99795918 0.99389002]
mean value: 0.9940936779063123
MCC on Blind test: 0.21
Accuracy on Blind test: 0.56
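Note: the UserWarning/RuntimeWarning pair logged before this pipeline comes from requesting oob_score=True while BaggingClassifier keeps its default of 10 estimators: with so few bootstrap samples, some training rows appear in every bag and therefore get no out-of-bag prediction. A minimal sketch (toy data, not this script's settings) showing that more estimators make that vanishingly unlikely:

import warnings
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=42)
for n in (10, 200):
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')
        BaggingClassifier(n_estimators=n, oob_score=True, random_state=42).fit(X, y)
        oob_msgs = [w for w in caught if 'OOB' in str(w.message)]
        print(f'n_estimators={n}: {len(oob_msgs)} OOB warning(s)')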
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.45224738 0.63778901 0.5324204 0.5638485 0.78409576 0.54516888
0.51332784 0.58729959 0.45964193 0.45682168]
mean value: 0.5532660961151123
key: score_time
value: [0.02797675 0.04070687 0.04118657 0.0417459 0.0412612 0.02333283
0.02900791 0.04153872 0.02351713 0.03968668]
mean value: 0.03499605655670166
key: test_mcc
value: [0.87171717 0.90841751 0.98181818 0.96329966 0.96396098 0.96393713
0.92719967 0.92719967 0.96329966 0.96393713]
mean value: 0.9434786761144899
key: train_mcc
value: [0.99593079 1. 0.99593079 0.99593079 0.99796333 0.99593082
0.99593082 0.99796334 0.99593082 0.99593082]
mean value: 0.9967442309031642
key: test_accuracy
value: [0.93577982 0.95412844 0.99082569 0.98165138 0.98165138 0.98165138
0.96330275 0.96330275 0.98165138 0.98165138]
mean value: 0.9715596330275229
key: train_accuracy
value: [0.99796126 1. 0.99796126 0.99796126 0.99898063 0.99796126
0.99796126 0.99898063 0.99796126 0.99796126]
mean value: 0.9983690112130479
key: test_fscore
value: [0.93577982 0.95412844 0.99082569 0.98148148 0.98181818 0.98214286
0.96428571 0.96428571 0.98181818 0.98214286]
mean value: 0.9718708932929117
key: train_fscore
value: [0.99796748 1. 0.99796748 0.99796748 0.99898271 0.99796334
0.99796334 0.99898063 0.99796334 0.99796334]
mean value: 0.9983719137523378
key: test_precision
value: [0.92727273 0.94545455 0.98181818 0.98148148 0.96428571 0.96491228
0.94736842 0.94736842 0.98181818 0.96491228]
mean value: 0.9606692235639605
key: train_precision
value: [0.9959432 1. 0.9959432 0.9959432 0.99796748 0.99593496
0.99593496 0.99796334 0.99593496 0.99593496]
mean value: 0.9967500271799833
key: test_recall
value: [0.94444444 0.96296296 1. 0.98148148 1. 1.
0.98181818 0.98181818 0.98181818 1. ]
mean value: 0.9834343434343434
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.93585859 0.95420875 0.99090909 0.98164983 0.98181818 0.98148148
0.96313131 0.96313131 0.98164983 0.98148148]
mean value: 0.9715319865319865
key: train_roc_auc
value: [0.99795918 1. 0.99795918 0.99795918 0.99897959 0.99796334
0.99796334 0.99898167 0.99796334 0.99796334]
mean value: 0.9983692173407042
key: test_jcc
value: [0.87931034 0.9122807 0.98181818 0.96363636 0.96428571 0.96491228
0.93103448 0.93103448 0.96428571 0.96491228]
mean value: 0.9457510547528696
key: train_jcc
value: [0.9959432 1. 0.9959432 0.9959432 0.99796748 0.99593496
0.99593496 0.99796334 0.99593496 0.99593496]
mean value: 0.9967500271799833
MCC on Blind test: 0.28
Accuracy on Blind test: 0.59
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
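The pipeline printed above MinMax-scales the 169 numerical columns, one-hot encodes the 7 categorical columns, passes any remaining columns through unchanged, and then fits the classifier. A minimal reconstruction of that assembly is sketched below; the column lists are abbreviated to the names visible in this log, and the variable names are assumptions rather than the original code.

# Illustrative reconstruction of the printed pipeline (column lists abbreviated).
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

numerical_cols = ['ligand_distance', 'ligand_affinity_change',
                  'duet_stability_change']              # ...169 columns in total
categorical_cols = ['ss_class', 'aa_prop_change', 'electrostatics_change',
                    'polarity_change', 'water_change',
                    'drtype_mode_labels', 'active_site']

preprocessor = ColumnTransformer(
    transformers=[
        ('num', MinMaxScaler(), numerical_cols),        # scale numerical features to [0, 1]
        ('cat', OneHotEncoder(), categorical_cols),     # encode the 7 categorical features
    ],
    remainder='passthrough',                            # keep any remaining columns as-is
)

model_pipeline = Pipeline(steps=[
    ('prep', preprocessor),
    ('model', GradientBoostingClassifier(random_state=42)),
])
# model_pipeline.fit(X_train, y_train) would then train exactly this setup.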
key: fit_time
value: [2.25848699 2.24054694 2.27098012 2.30772567 2.28769517 2.29379869
2.27469635 2.2514286 2.25546622 2.24115229]
mean value: 2.268197703361511
key: score_time
value: [0.00967145 0.00950313 0.00958705 0.01027203 0.00979424 0.01037478
0.00971627 0.00944853 0.00973773 0.00967526]
mean value: 0.009778046607971191
key: test_mcc
value: [0.92659933 1. 0.96393713 0.98181211 0.96396098 0.94511785
0.94511785 0.94511785 0.98181211 0.94635821]
mean value: 0.9599833416841214
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96330275 1. 0.98165138 0.99082569 0.98165138 0.97247706
0.97247706 0.97247706 0.99082569 0.97247706]
mean value: 0.9798165137614679
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96296296 1. 0.98113208 0.99065421 0.98181818 0.97247706
0.97247706 0.97247706 0.99099099 0.97345133]
mean value: 0.979844093694549
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96296296 1. 1. 1. 0.96428571 0.98148148
0.98148148 0.98148148 0.98214286 0.94827586]
mean value: 0.9802111840904945
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96296296 1. 0.96296296 0.98148148 1. 0.96363636
0.96363636 0.96363636 1. 1. ]
mean value: 0.9798316498316498
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96329966 1. 0.98148148 0.99074074 0.98181818 0.97255892
0.97255892 0.97255892 0.99074074 0.97222222]
mean value: 0.9797979797979798
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.92857143 1. 0.96296296 0.98148148 0.96428571 0.94642857
0.94642857 0.94642857 0.98214286 0.94827586]
mean value: 0.9607006020799124
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.28
Accuracy on Blind test: 0.59
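The two summary lines above correspond to scoring the fitted pipeline on the held-out blind test split. A minimal sketch of that step is given below; the function and argument names (X_bts, y_bts) are placeholders, not identifiers from the original script.

# Sketch of the blind-test summary, assuming a fitted pipeline and a held-out
# blind test split; X_bts / y_bts are placeholder names.
from sklearn.metrics import accuracy_score, matthews_corrcoef

def report_blind_test(fitted_pipeline, X_bts, y_bts):
    y_pred = fitted_pipeline.predict(X_bts)
    print('MCC on Blind test:', round(matthews_corrcoef(y_bts, y_pred), 2))
    print('Accuracy on Blind test:', round(accuracy_score(y_bts, y_pred), 2))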
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.04199314 0.04448581 0.05012202 0.05785346 0.04534864 0.04628587
0.04821253 0.04629254 0.04560542 0.06620932]
mean value: 0.04924087524414063
key: score_time
value: [0.02291536 0.0129447 0.01291752 0.02138734 0.01435804 0.01392055
0.02247977 0.01402569 0.01400256 0.02144146]
mean value: 0.01703929901123047
key: test_mcc
value: [0.98181211 1. 1. 1. 1. 1.
1. 1. 0.98181818 1. ]
mean value: 0.9963630295440247
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.99082569 1. 1. 1. 1. 1.
1. 1. 0.99082569 1. ]
mean value: 0.998165137614679
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.99065421 1. 1. 1. 1. 1.
1. 1. 0.99082569 1. ]
mean value: 0.9981479893680871
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.98148148 1. 1. 1. 1. 1.
1. 1. 0.98181818 1. ]
mean value: 0.9963299663299663
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.99074074 1. 1. 1. 1. 1.
1. 1. 0.99090909 1. ]
mean value: 0.9981649831649831
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.98148148 1. 1. 1. 1. 1.
1. 1. 0.98181818 1. ]
mean value: 0.9963299663299663
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.0
Accuracy on Blind test: 0.51
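The "Variables are collinear" UserWarning repeated above is raised by scikit-learn's QuadraticDiscriminantAnalysis when the per-class covariance estimate is rank-deficient. One common mitigation, shown here only as a hedged sketch (the logged run used the default reg_param=0.0), is to regularise the covariance estimate:

# Hedged sketch: regularising QDA to soften the "Variables are collinear"
# warning seen above. The reg_param value is illustrative, not from this run.
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

qda_regularised = QuadraticDiscriminantAnalysis(reg_param=0.1)
# qda_regularised would replace QuadraticDiscriminantAnalysis() in the
# ('model', ...) step of the pipeline printed above.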
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.035707 0.04087877 0.04267907 0.0428102 0.0426352 0.04230046
0.04302907 0.04247069 0.04260826 0.0430162 ]
mean value: 0.04181349277496338
key: score_time
value: [0.01995635 0.01925468 0.01918221 0.01924872 0.01916957 0.01920724
0.01924849 0.01924205 0.01930523 0.01914072]
mean value: 0.019295525550842286
key: test_mcc
value: [0.946411 0.90967354 0.98181818 0.98181818 0.96396098 1.
0.87869377 0.96393713 1. 0.91202529]
mean value: 0.9538338072232938
key: train_mcc
value: [0.98181631 0.9778193 0.9737407 0.9738378 0.9738378 0.97383919
0.9778203 0.97383919 0.97383919 0.97981668]
mean value: 0.9760206475633355
key: test_accuracy
value: [0.97247706 0.95412844 0.99082569 0.99082569 0.98165138 1.
0.93577982 0.98165138 1. 0.95412844]
mean value: 0.9761467889908257
key: train_accuracy
value: [0.99082569 0.98878695 0.98674822 0.98674822 0.98674822 0.98674822
0.98878695 0.98674822 0.98674822 0.98980632]
mean value: 0.9878695208970438
key: test_fscore
value: [0.97297297 0.95495495 0.99082569 0.99082569 0.98181818 1.
0.94017094 0.98214286 1. 0.95652174]
mean value: 0.9770233022337131
key: train_fscore
value: [0.99091826 0.98892246 0.98690836 0.98693467 0.98693467 0.98690836
0.9889001 0.98690836 0.98690836 0.98989899]
mean value: 0.9880142593158917
key: test_precision
value: [0.94736842 0.92982456 0.98181818 0.98181818 0.96428571 1.
0.88709677 0.96491228 1. 0.91666667]
mean value: 0.9573790781940188
key: train_precision
value: [0.982 0.97808765 0.97609562 0.97420635 0.97420635 0.97415507
0.97804391 0.97415507 0.97415507 0.98 ]
mean value: 0.9765105086268133
key: test_recall
value: [1. 0.98148148 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9981481481481481
key: train_recall
value: [1. 1. 0.99796334 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.99979633401222
key: test_roc_auc
value: [0.97272727 0.9543771 0.99090909 0.99090909 0.98181818 1.
0.93518519 0.98148148 1. 0.9537037 ]
mean value: 0.976111111111111
key: train_roc_auc
value: [0.99081633 0.98877551 0.98673677 0.98673469 0.98673469 0.98676171
0.98879837 0.98676171 0.98676171 0.9898167 ]
mean value: 0.9878698200257701
key: test_jcc
value: [0.94736842 0.9137931 0.98181818 0.98181818 0.96428571 1.
0.88709677 0.96491228 1. 0.91666667]
mean value: 0.9557759323984955
key: train_jcc
value: [0.982 0.97808765 0.97415507 0.97420635 0.97420635 0.97415507
0.97804391 0.97415507 0.97415507 0.98 ]
mean value: 0.9763164538320758
MCC on Blind test: 0.28
Accuracy on Blind test: 0.59
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.26908207 0.36156917 0.38551164 0.42597938 0.36128283 0.35268307
0.35298371 0.35420609 0.44186163 0.36942291]
mean value: 0.36745824813842776
key: score_time
value: [0.02261543 0.01918912 0.02010345 0.01927495 0.01930094 0.02481413
0.01924181 0.01960397 0.02101731 0.01992035]
mean value: 0.02050814628601074
key: test_mcc
value: [0.88989899 0.92724828 0.98181818 0.98181818 0.96396098 1.
0.8582788 0.98181211 0.96396098 0.91202529]
mean value: 0.9460821804669207
key: train_mcc
value: [0.96766902 0.96359022 0.9737407 0.9738378 0.95942381 0.95961736
0.9778203 0.95742826 0.957524 0.96758197]
mean value: 0.9658233448957639
key: test_accuracy
value: [0.94495413 0.96330275 0.99082569 0.99082569 0.98165138 1.
0.9266055 0.99082569 0.98165138 0.95412844]
mean value: 0.9724770642201835
key: train_accuracy
value: [0.98369011 0.98165138 0.98674822 0.98674822 0.97961264 0.97961264
0.98878695 0.97859327 0.97859327 0.98369011]
mean value: 0.9827726809378186
key: test_fscore
value: [0.94444444 0.96363636 0.99082569 0.99082569 0.98181818 1.
0.93103448 0.99099099 0.98148148 0.95652174]
mean value: 0.9731579060407307
key: train_fscore
value: /home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_rt.py:135: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_rt.py:138: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[0.98390342 0.98189135 0.98690836 0.98693467 0.97983871 0.97987928
0.9889001 0.97880928 0.97885196 0.98383838]
mean value: 0.9829755517864163
key: test_precision
value: [0.94444444 0.94642857 0.98181818 0.98181818 0.96428571 1.
0.8852459 0.98214286 1. 0.91666667]
mean value: 0.9602850519243962
key: train_precision
value: [0.972167 0.97017893 0.97609562 0.97420635 0.97005988 0.96626984
0.97804391 0.96806387 0.96620278 0.974 ]
mean value: 0.9715288180430208
key: test_recall
value: [0.94444444 0.98148148 1. 1. 1. 1.
0.98181818 1. 0.96363636 1. ]
mean value: 0.9871380471380471
key: train_recall
value: [0.99592668 0.99389002 0.99796334 1. 0.9898167 0.99387755
1. 0.98979592 0.99183673 0.99387755]
mean value: 0.9946984496446236
key: test_roc_auc
value: [0.94494949 0.96346801 0.99090909 0.99090909 0.98181818 1.
0.92609428 0.99074074 0.98181818 0.9537037 ]
mean value: 0.9724410774410774
key: train_roc_auc
value: [0.98367763 0.98163889 0.98673677 0.98673469 0.97960223 0.97962717
0.98879837 0.97860468 0.97860676 0.98370049]
mean value: 0.9827727669479197
key: test_jcc
value: [0.89473684 0.92982456 0.98181818 0.98181818 0.96428571 1.
0.87096774 0.98214286 0.96363636 0.91666667]
mean value: 0.9485897110812221
key: train_jcc
value: [0.96831683 0.96442688 0.97415507 0.97420635 0.96047431 0.96055227
0.97804391 0.95849802 0.95857988 0.96819085]
mean value: 0.9665444376905993
MCC on Blind test: 0.32
Accuracy on Blind test: 0.61
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.04058647 0.04435062 0.04309654 0.04436064 0.04376888 0.0449791
0.04290581 0.04352117 0.04341054 0.04521823]
mean value: 0.04361979961395264
key: score_time
value: [0.01262784 0.01453471 0.01264024 0.01438498 0.01472068 0.01512718
0.01484346 0.01487494 0.0149014 0.01476884]
mean value: 0.014342427253723145
key: test_mcc
value: [0.946411 0.96396098 0.98181818 0.96396098 0.96396098 0.94635821
0.91202529 0.98181211 0.96393713 0.92905936]
mean value: 0.9553304233697641
key: train_mcc
value: [0.96987162 0.96987162 0.96592058 0.96592058 0.96987162 0.97185442
0.97383919 0.97383919 0.96395333 0.97582781]
mean value: 0.9700769986001025
key: test_accuracy
value: [0.97247706 0.98165138 0.99082569 0.98165138 0.98165138 0.97247706
0.95412844 0.99082569 0.98165138 0.96330275]
mean value: 0.9770642201834863
key: train_accuracy
value: [0.98470948 0.98470948 0.98267074 0.98267074 0.98470948 0.98572885
0.98674822 0.98674822 0.98165138 0.98776758]
mean value: 0.9848114169215086
key: test_fscore
value: [0.97297297 0.98181818 0.99082569 0.98181818 0.98181818 0.97345133
0.95652174 0.99099099 0.98214286 0.96491228]
mean value: 0.9777272401900579
key: train_fscore
value: [0.98495486 0.98495486 0.98298298 0.98298298 0.98495486 0.98591549
0.98690836 0.98690836 0.98196393 0.98790323]
mean value: 0.9850429923386353
key: test_precision
value: [0.94736842 0.96428571 0.98181818 0.96428571 0.96428571 0.94827586
0.91666667 0.98214286 0.96491228 0.93220339]
mean value: 0.9566244802138708
key: train_precision
value: [0.97035573 0.97035573 0.96653543 0.96653543 0.97035573 0.97222222
0.97415507 0.97415507 0.96456693 0.97609562]
mean value: 0.9705332967868592
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97272727 0.98181818 0.99090909 0.98181818 0.98181818 0.97222222
0.9537037 0.99074074 0.98148148 0.96296296]
mean value: 0.977020202020202
key: train_roc_auc
value: [0.98469388 0.98469388 0.98265306 0.98265306 0.98469388 0.98574338
0.98676171 0.98676171 0.98167006 0.98778004]
mean value: 0.9848104659379027
key: test_jcc
value: [0.94736842 0.96428571 0.98181818 0.96428571 0.96428571 0.94827586
0.91666667 0.98214286 0.96491228 0.93220339]
mean value: 0.9566244802138708
key: train_jcc
value: [0.97035573 0.97035573 0.96653543 0.96653543 0.97035573 0.97222222
0.97415507 0.97415507 0.96456693 0.97609562]
mean value: 0.9705332967868592
MCC on Blind test: 0.33
Accuracy on Blind test: 0.61
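The lbfgs ConvergenceWarning blocks that follow (during the Logistic RegressionCV folds below, and earlier in this log) already state the remedy: raise max_iter or rescale the data. A minimal sketch of the first option is given below; the max_iter value is illustrative only, and the logged runs used the scikit-learn defaults shown in the model list.

# Sketch of the fix suggested by the ConvergenceWarning text itself: give lbfgs
# more iterations. The value 5000 is illustrative, not taken from this run.
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV

logreg = LogisticRegression(random_state=42, max_iter=5000)
logreg_cv = LogisticRegressionCV(random_state=42, max_iter=5000)
# Either estimator would slot into the ('model', ...) step of the pipelines
# printed in this log, in place of the default-max_iter versions.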
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
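The lbfgs ConvergenceWarning above recurs throughout the logistic regression fits in this run. Following the advice in the message, a minimal sketch of the usual remedies (the values below are illustrative, not the settings used here):

from sklearn.linear_model import LogisticRegressionCV

# Illustrative only: raise the iteration cap and/or switch to a solver
# that converges more readily on the scaled features.
clf = LogisticRegressionCV(max_iter=3000, solver='saga', random_state=42)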
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
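For reference, a preprocessing-plus-model pipeline of the shape printed above can be assembled roughly as sketched below; the column lists are abbreviated placeholders, whereas the actual run scales the 169 numerical columns and one-hot encodes the 7 categorical columns shown in the printout.

from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.linear_model import LogisticRegressionCV

# Placeholder column names standing in for the full feature lists above.
numerical_cols = ['ligand_distance', 'ligand_affinity_change']
categorical_cols = ['ss_class', 'active_site']

prep = ColumnTransformer(
    transformers=[('num', MinMaxScaler(), numerical_cols),
                  ('cat', OneHotEncoder(), categorical_cols)],
    remainder='passthrough')

pipe = Pipeline(steps=[('prep', prep),
                       ('model', LogisticRegressionCV(random_state=42))])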
key: fit_time
value: [1.17332339 1.07144642 1.20221496 1.05029082 1.10251975 1.00360799
1.09597087 1.00588417 1.12927198 1.03671074]
mean value: 1.0871241092681885
key: score_time
value: [0.01915336 0.0194943 0.01916051 0.03661346 0.01905918 0.0230329
0.02343559 0.01981854 0.02063799 0.01904988]
mean value: 0.02194557189941406
key: test_mcc
value: [0.96396098 0.96396098 0.98181818 0.98181818 0.9291517 0.94635821
0.92905936 0.98181211 0.98181211 0.92905936]
mean value: 0.9588811186918972
key: train_mcc
value: [1. 0.99796333 1. 1. 1. 0.9778203
0.99796334 1. 1. 0.99796334]
mean value: 0.9971710313107653
key: test_accuracy
value: [0.98165138 0.98165138 0.99082569 0.99082569 0.96330275 0.97247706
0.96330275 0.99082569 0.99082569 0.96330275]
mean value: 0.9788990825688073
key: train_accuracy
value: [1. 0.99898063 1. 1. 1. 0.98878695
0.99898063 1. 1. 0.99898063]
mean value: 0.9985728848114169
key: test_fscore
value: [0.98181818 0.98181818 0.99082569 0.99082569 0.96428571 0.97345133
0.96491228 0.99099099 0.99099099 0.96491228]
mean value: 0.9794831324887986
key: train_fscore
value: [1. 0.99898271 1. 1. 1. 0.9889001
0.99898063 1. 1. 0.99898063]
mean value: 0.9985844070926517
key: test_precision
value: [0.96428571 0.96428571 0.98181818 0.98181818 0.93103448 0.94827586
0.93220339 0.98214286 0.98214286 0.93220339]
mean value: 0.9600210630982109
key: train_precision
value: [1. 0.99796748 1. 1. 1. 0.97804391
0.99796334 1. 1. 0.99796334]
mean value: 0.9971938072094845
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98181818 0.98181818 0.99090909 0.99090909 0.96363636 0.97222222
0.96296296 0.99074074 0.99074074 0.96296296]
mean value: 0.9788720538720539
key: train_roc_auc
value: [1. 0.99897959 1. 1. 1. 0.98879837
0.99898167 1. 1. 0.99898167]
mean value: 0.9985741302631032
key: test_jcc
value: [0.96428571 0.96428571 0.98181818 0.98181818 0.93103448 0.94827586
0.93220339 0.98214286 0.98214286 0.93220339]
mean value: 0.9600210630982109
key: train_jcc
value: [1. 0.99796748 1. 1. 1. 0.97804391
0.99796334 1. 1. 0.99796334]
mean value: 0.9971938072094845
MCC on Blind test: 0.21
Accuracy on Blind test: 0.56
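The key/value/mean blocks above have the shape of the dictionary returned by sklearn's cross_validate with return_train_score=True and custom scorers. A sketch of how such output can be produced; the scorer mapping, cv=10 and the names pipe, X_train and y_train are assumptions, not taken from the script:

from sklearn.model_selection import cross_validate
from sklearn.metrics import (make_scorer, matthews_corrcoef, accuracy_score,
                             f1_score, precision_score, recall_score,
                             roc_auc_score, jaccard_score)

# Scorer names mirror the keys logged above (mcc, accuracy, fscore, ...).
scoring = {'mcc':       make_scorer(matthews_corrcoef),
           'accuracy':  make_scorer(accuracy_score),
           'fscore':    make_scorer(f1_score),
           'precision': make_scorer(precision_score),
           'recall':    make_scorer(recall_score),
           'roc_auc':   make_scorer(roc_auc_score),
           'jcc':       make_scorer(jaccard_score)}

scores = cross_validate(pipe, X_train, y_train, cv=10,
                        scoring=scoring, return_train_score=True)
for k, v in scores.items():   # fit_time, score_time, test_mcc, train_mcc, ...
    print('key:', k)
    print('value:', v)
    print('mean value:', v.mean())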
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.03226662 0.01357055 0.01272416 0.01376796 0.01404357 0.01343679
0.01265812 0.01321292 0.01250315 0.01331568]
mean value: 0.015149950981140137
key: score_time
value: [0.01026297 0.01015615 0.01043391 0.00975466 0.01017189 0.00939393
0.009799 0.00966072 0.0104816 0.00953364]
mean value: 0.009964847564697265
key: test_mcc
value: [0.76807644 0.8585559 0.77238262 0.75632883 0.75156361 0.69717653
0.82131618 0.846254 0.87280881 0.75090861]
mean value: 0.7895371517463353
key: train_mcc
value: [0.79150846 0.78327281 0.79289836 0.79289836 0.79885223 0.79385491
0.80032507 0.79025852 0.78014362 0.79156576]
mean value: 0.7915578108083023
key: test_accuracy
value: [0.88073394 0.9266055 0.88073394 0.87155963 0.87155963 0.8440367
0.90825688 0.91743119 0.93577982 0.87155963]
mean value: 0.8908256880733945
key: train_accuracy
value: [0.89194699 0.88786952 0.89296636 0.89296636 0.89602446 0.89296636
0.89704383 0.89194699 0.88583078 0.89194699]
mean value: 0.8921508664627931
key: test_fscore
value: [0.88695652 0.92982456 0.88888889 0.88135593 0.87931034 0.85714286
0.9137931 0.92436975 0.9380531 0.88135593]
mean value: 0.8981050987101319
key: train_fscore
value: [0.89904762 0.8952381 0.89971347 0.89971347 0.90248566 0.89990467
0.90297791 0.89827255 0.89353612 0.89885496]
mean value: 0.898974452130224
key: test_precision
value: [0.83606557 0.88333333 0.82539683 0.8125 0.82258065 0.796875
0.86885246 0.859375 0.9137931 0.82539683]
mean value: 0.8444168765523435
key: train_precision
value: [0.84436494 0.84078712 0.8471223 0.8471223 0.85045045 0.84436494
0.85299456 0.84782609 0.83629893 0.84408602]
mean value: 0.8455417645600413
key: test_recall
value: [0.94444444 0.98148148 0.96296296 0.96296296 0.94444444 0.92727273
0.96363636 1. 0.96363636 0.94545455]
mean value: 0.9596296296296296
key: train_recall
value: [0.96130346 0.95723014 0.9592668 0.9592668 0.96130346 0.96326531
0.95918367 0.95510204 0.95918367 0.96122449]
mean value: 0.959632985577123
key: test_roc_auc
value: [0.88131313 0.92710438 0.88148148 0.87239057 0.87222222 0.84326599
0.90774411 0.91666667 0.93552189 0.87087542]
mean value: 0.8908585858585859
key: train_roc_auc
value: [0.89187622 0.88779874 0.89289871 0.89289871 0.89595785 0.89303795
0.89710711 0.89201131 0.88590548 0.89201754]
mean value: 0.892150962217881
key: test_jcc
value: [0.796875 0.86885246 0.8 0.78787879 0.78461538 0.75
0.84126984 0.859375 0.88333333 0.78787879]
mean value: 0.8160078593992528
key: train_jcc
value: [0.816609 0.81034483 0.81770833 0.81770833 0.82229965 0.81802426
0.82311734 0.81533101 0.80756014 0.81629116]
mean value: 0.8164994052884171
MCC on Blind test: 0.46
Accuracy on Blind test: 0.71
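Each model block ends with these two blind-test lines, i.e. the fitted pipeline scored once on the held-out test split. A minimal sketch, assuming pipe is the pipeline printed above and X_bts/y_bts hold the blind-test split:

from sklearn.metrics import matthews_corrcoef, accuracy_score

# Assumed names: X_train/y_train = training split, X_bts/y_bts = blind test split.
pipe.fit(X_train, y_train)
y_bts_pred = pipe.predict(X_bts)
print('MCC on Blind test:', round(matthews_corrcoef(y_bts, y_bts_pred), 2))
print('Accuracy on Blind test:', round(accuracy_score(y_bts, y_bts_pred), 2))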
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01432371 0.01744103 0.01744962 0.01802087 0.01732731 0.01733041
0.0174067 0.01744223 0.01742673 0.01739192]
mean value: 0.017156052589416503
key: score_time
value: [0.01219678 0.01250386 0.01259708 0.01271725 0.01278138 0.01281834
0.01276517 0.01271963 0.01281548 0.01277828]
mean value: 0.01266932487487793
key: test_mcc
value: [0.63446308 0.78024981 0.72570999 0.72570999 0.65151515 0.81882759
0.7280447 0.72598622 0.74351235 0.76486923]
mean value: 0.7298888092654613
key: train_mcc
value: [0.73813656 0.72410437 0.73869429 0.73014127 0.75785401 0.71345039
0.73419371 0.73010884 0.75399333 0.72585556]
mean value: 0.7346532327609935
key: test_accuracy
value: [0.81651376 0.88990826 0.86238532 0.86238532 0.82568807 0.90825688
0.86238532 0.86238532 0.87155963 0.88073394]
mean value: 0.8642201834862385
key: train_accuracy
value: [0.86850153 0.86136595 0.86850153 0.86442406 0.87869521 0.85626911
0.86646279 0.86442406 0.87665647 0.86238532]
mean value: 0.8667686034658512
key: test_fscore
value: [0.80769231 0.88679245 0.85714286 0.85714286 0.82568807 0.90566038
0.85714286 0.85981308 0.875 0.87619048]
mean value: 0.8608265343006679
key: train_fscore
value: [0.86492147 0.85714286 0.86406744 0.86044071 0.87668394 0.85235602
0.86225026 0.86014721 0.8738269 0.85834208]
mean value: 0.8630178891837997
key: test_precision
value: [0.84 0.90384615 0.88235294 0.88235294 0.81818182 0.94117647
0.9 0.88461538 0.85964912 0.92 ]
mean value: 0.883217483239155
key: train_precision
value: [0.89008621 0.88503254 0.89519651 0.88744589 0.89240506 0.87526882
0.88937093 0.88720174 0.89339019 0.88336933]
mean value: 0.8878767209813069
key: test_recall
value: [0.77777778 0.87037037 0.83333333 0.83333333 0.83333333 0.87272727
0.81818182 0.83636364 0.89090909 0.83636364]
mean value: 0.8402693602693603
key: train_recall
value: [0.84114053 0.83095723 0.83503055 0.83503055 0.86150713 0.83061224
0.83673469 0.83469388 0.85510204 0.83469388]
mean value: 0.8395502722473919
key: test_roc_auc
value: [0.81616162 0.88973064 0.86212121 0.86212121 0.82575758 0.90858586
0.86279461 0.86262626 0.87138047 0.88114478]
mean value: 0.8642424242424243
key: train_roc_auc
value: [0.86852945 0.86139698 0.86853568 0.86445405 0.87871275 0.85624299
0.86643252 0.86439378 0.87663452 0.86235712]
mean value: 0.8667689845795752
key: test_jcc
value: [0.67741935 0.79661017 0.75 0.75 0.703125 0.82758621
0.75 0.75409836 0.77777778 0.77966102]
mean value: 0.7566277886609455
key: train_jcc
value: [0.76199262 0.75 0.7606679 0.75506446 0.7804428 0.74270073
0.75785582 0.75461255 0.77592593 0.75183824]
mean value: 0.7591101044424549
MCC on Blind test: 0.36
Accuracy on Blind test: 0.66
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.01675367 0.01172948 0.01112866 0.01174927 0.01270866 0.01112056
0.01259708 0.01317763 0.01103449 0.01173782]
mean value: 0.012373733520507812
key: score_time
value: [0.03388858 0.01459813 0.0150125 0.01497555 0.0157063 0.01583314
0.01537919 0.0203886 0.01956248 0.01527023]
mean value: 0.0180614709854126
key: test_mcc
value: [0.91216737 0.946411 0.946411 0.98181818 1. 0.94635821
0.91202529 0.91202529 1. 0.96393713]
mean value: 0.9521153465030314
key: train_mcc
value: [0.96789422 0.96592058 0.96592058 0.96395068 0.96198449 0.96198744
0.96395333 0.97185442 0.96002526 0.96002526]
mean value: 0.9643516270143173
key: test_accuracy
value: [0.95412844 0.97247706 0.97247706 0.99082569 1. 0.97247706
0.95412844 0.95412844 1. 0.98165138]
mean value: 0.9752293577981652
key: train_accuracy
value: [0.98369011 0.98267074 0.98267074 0.98165138 0.98063201 0.98063201
0.98165138 0.98572885 0.97961264 0.97961264]
mean value: 0.981855249745158
key: test_fscore
value: [0.95575221 0.97297297 0.97297297 0.99082569 1. 0.97345133
0.95652174 0.95652174 1. 0.98214286]
mean value: 0.9761161509246076
key: train_fscore
value: [0.98396794 0.98298298 0.98298298 0.982 0.98101898 0.98098098
0.98196393 0.98591549 0.98 0.98 ]
mean value: 0.9821813284651129
key: test_precision
value: [0.91525424 0.94736842 0.94736842 0.98181818 1. 0.94827586
0.91666667 0.91666667 1. 0.96491228]
mean value: 0.9538330737315633
key: train_precision
value: [0.96844181 0.96653543 0.96653543 0.96463654 0.9627451 0.96267191
0.96456693 0.97222222 0.96078431 0.96078431]
mean value: 0.9649924005520801
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.95454545 0.97272727 0.97272727 0.99090909 1. 0.97222222
0.9537037 0.9537037 1. 0.98148148]
mean value: 0.9752020202020202
key: train_roc_auc
value: [0.98367347 0.98265306 0.98265306 0.98163265 0.98061224 0.98065173
0.98167006 0.98574338 0.9796334 0.9796334 ]
mean value: 0.9818556465356
key: test_jcc
value: [0.91525424 0.94736842 0.94736842 0.98181818 1. 0.94827586
0.91666667 0.91666667 1. 0.96491228]
mean value: 0.9538330737315633
key: train_jcc
value: [0.96844181 0.96653543 0.96653543 0.96463654 0.9627451 0.96267191
0.96456693 0.97222222 0.96078431 0.96078431]
mean value: 0.9649924005520801
MCC on Blind test: 0.19
Accuracy on Blind test: 0.56
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.0302546 0.02813029 0.02809978 0.0278976 0.02761269 0.02798128
0.03074837 0.02966928 0.03014135 0.02853847]
mean value: 0.028907370567321778
key: score_time
value: [0.01462579 0.01377559 0.01365614 0.01365614 0.01376009 0.0138247
0.01475239 0.01382637 0.01384187 0.01394892]
mean value: 0.013966798782348633
key: test_mcc
value: [0.96396098 0.98181818 1. 1. 0.96396098 1.
0.94635821 0.98181211 1. 0.94635821]
mean value: 0.9784268692563587
key: train_mcc
value: [0.98382069 0.98382069 0.98181631 0.98181631 0.98382069 0.98181698
0.98582943 0.98382123 0.97981668 0.98582943]
mean value: 0.983220845209887
key: test_accuracy
value: [0.98165138 0.99082569 1. 1. 0.98165138 1.
0.97247706 0.99082569 1. 0.97247706]
mean value: 0.9889908256880734
key: train_accuracy
value: [0.99184506 0.99184506 0.99082569 0.99082569 0.99184506 0.99082569
0.99286442 0.99184506 0.98980632 0.99286442]
mean value: 0.991539245667686
key: test_fscore
value: [0.98181818 0.99082569 1. 1. 0.98181818 1.
0.97345133 0.99099099 1. 0.97345133]
mean value: 0.9892355697568005
key: train_fscore
value: [0.99191919 0.99191919 0.99091826 0.99091826 0.99191919 0.9908999
0.9929078 0.99190283 0.98989899 0.9929078 ]
mean value: 0.9916111430148137
key: test_precision
value: [0.96428571 0.98181818 1. 1. 0.96428571 1.
0.94827586 0.98214286 1. 0.94827586]
mean value: 0.9789084191670399
key: train_precision
value: [0.98396794 0.98396794 0.982 0.982 0.98396794 0.98196393
0.98591549 0.98393574 0.98 0.98591549]
mean value: 0.9833634464358323
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98181818 0.99090909 1. 1. 0.98181818 1.
0.97222222 0.99074074 1. 0.97222222]
mean value: 0.988973063973064
key: train_roc_auc
value: [0.99183673 0.99183673 0.99081633 0.99081633 0.99183673 0.99083503
0.99287169 0.99185336 0.9898167 0.99287169]
mean value: 0.9915391329647949
key: test_jcc
value: [0.96428571 0.98181818 1. 1. 0.96428571 1.
0.94827586 0.98214286 1. 0.94827586]
mean value: 0.9789084191670399
key: train_jcc
value: [0.98396794 0.98396794 0.982 0.982 0.98396794 0.98196393
0.98591549 0.98393574 0.98 0.98591549]
mean value: 0.9833634464358323
MCC on Blind test: 0.28
Accuracy on Blind test: 0.59
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.71428227 2.08614016 2.62061143 2.41919136 2.2262578 2.1079421
2.22521067 2.41262627 2.21698761 2.23710108]
mean value: 2.2266350746154786
key: score_time
value: [0.01254702 0.01257324 0.01411819 0.01264691 0.01291251 0.01294541
0.01289654 0.01295042 0.01260424 0.01268959]
mean value: 0.012888407707214356
key: test_mcc
value: [0.96396098 0.96396098 0.98181818 1. 0.96396098 0.98181211
0.92905936 0.98181211 1. 0.92905936]
mean value: 0.9695444073814772
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98165138 0.98165138 0.99082569 1. 0.98165138 0.99082569
0.96330275 0.99082569 1. 0.96330275]
mean value: 0.9844036697247707
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98181818 0.98181818 0.99082569 1. 0.98181818 0.99099099
0.96491228 0.99099099 1. 0.96491228]
mean value: 0.9848086776913431
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96428571 0.96428571 0.98181818 1. 0.96428571 0.98214286
0.93220339 0.98214286 1. 0.93220339]
mean value: 0.9703367818622056
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98181818 0.98181818 0.99090909 1. 0.98181818 0.99074074
0.96296296 0.99074074 1. 0.96296296]
mean value: 0.9843771043771044
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96428571 0.96428571 0.98181818 1. 0.96428571 0.98214286
0.93220339 0.98214286 1. 0.93220339]
mean value: 0.9703367818622056
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.29
Accuracy on Blind test: 0.59
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.04392576 0.0322082 0.03475714 0.03228712 0.0325017 0.033427
0.03042507 0.03155088 0.03268433 0.03452039]
mean value: 0.03382875919342041
key: score_time
value: [0.0112896 0.00917888 0.00912571 0.00906205 0.0092423 0.00923324
0.00922656 0.00922465 0.00923491 0.00925946]
mean value: 0.009407734870910645
key: test_mcc
value: [0.9291517 0.946411 0.98181818 1. 0.98181818 1.
0.94635821 0.98181211 0.98181211 0.94635821]
mean value: 0.969553972019105
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96330275 0.97247706 0.99082569 1. 0.99082569 1.
0.97247706 0.99082569 0.99082569 0.97247706]
mean value: 0.9844036697247707
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96428571 0.97297297 0.99082569 1. 0.99082569 1.
0.97345133 0.99099099 0.99099099 0.97345133]
mean value: 0.9847794700254715
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.93103448 0.94736842 0.98181818 1. 0.98181818 1.
0.94827586 0.98214286 0.98214286 0.94827586]
mean value: 0.9702876705871261
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96363636 0.97272727 0.99090909 1. 0.99090909 1.
0.97222222 0.99074074 0.99074074 0.97222222]
mean value: 0.9844107744107744
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.93103448 0.94736842 0.98181818 1. 0.98181818 1.
0.94827586 0.98214286 0.98214286 0.94827586]
mean value: 0.9702876705871261
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.18
Accuracy on Blind test: 0.55
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.1274991 0.12812495 0.12774634 0.12863183 0.12888646 0.13110614
0.12780571 0.12781787 0.12792253 0.12815571]
mean value: 0.12836966514587403
key: score_time
value: [0.01829576 0.01841164 0.01846838 0.0185349 0.01836324 0.01849031
0.01845002 0.0185163 0.01841092 0.0183692 ]
mean value: 0.01843106746673584
key: test_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.16
Accuracy on Blind test: 0.54
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01226139 0.01228285 0.01226473 0.01214695 0.01218414 0.01226926
0.01206851 0.01219273 0.01233721 0.01204967]
mean value: 0.012205743789672851
key: score_time
value: [0.00921822 0.00932813 0.00931048 0.00925422 0.00949788 0.00925541
0.0092206 0.00980568 0.00936985 0.0093658 ]
mean value: 0.009362626075744628
key: test_mcc
value: [0.946411 0.98181818 0.98181818 0.98181818 0.946411 0.96393713
0.98181211 0.96393713 0.96393713 0.96393713]
mean value: 0.9675837174332553
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.97247706 0.99082569 0.99082569 0.99082569 0.97247706 0.98165138
0.99082569 0.98165138 0.98165138 0.98165138]
mean value: 0.9834862385321101
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.97297297 0.99082569 0.99082569 0.99082569 0.97297297 0.98214286
0.99099099 0.98214286 0.98214286 0.98214286]
mean value: 0.9837985429728549
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.94736842 0.98181818 0.98181818 0.98181818 0.94736842 0.96491228
0.98214286 0.96491228 0.96491228 0.96491228]
mean value: 0.9681983367509683
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97272727 0.99090909 0.99090909 0.99090909 0.97272727 0.98148148
0.99074074 0.98148148 0.98148148 0.98148148]
mean value: 0.9834848484848485
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.94736842 0.98181818 0.98181818 0.98181818 0.94736842 0.96491228
0.98214286 0.96491228 0.96491228 0.96491228]
mean value: 0.9681983367509683
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.23
Accuracy on Blind test: 0.57
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.84613895 1.84882665 1.8708818 1.86475921 1.84473586 1.86491394
1.85439205 1.88788891 1.88814402 1.87820005]
mean value: 1.8648881435394287
key: score_time
value: [0.10223675 0.09952426 0.09806609 0.09618688 0.09979129 0.1016283
0.09665799 0.10203648 0.09662509 0.10175681]
mean value: 0.09945099353790283
key: test_mcc
value: [0.96396098 0.98181818 1. 1. 1. 1.
0.98181211 0.98181211 1. 0.98181211]
mean value: 0.9891215506967859
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98165138 0.99082569 1. 1. 1. 1.
0.99082569 0.99082569 1. 0.99082569]
mean value: 0.9944954128440368
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98181818 0.99082569 1. 1. 1. 1.
0.99099099 0.99099099 1. 0.99099099]
mean value: 0.9945616842864549
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96428571 0.98181818 1. 1. 1. 1.
0.98214286 0.98214286 1. 0.98214286]
mean value: 0.9892532467532468
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98181818 0.99090909 1. 1. 1. 1.
0.99074074 0.99074074 1. 0.99074074]
mean value: 0.9944949494949495
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96428571 0.98181818 1. 1. 1. 1.
0.98214286 0.98214286 1. 0.98214286]
mean value: 0.9892532467532468
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.21
Accuracy on Blind test: 0.55
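Note: the "key / value / mean value" blocks above are the shape of output that scikit-learn's cross_validate() produces for a preprocessing-plus-model pipeline scored with several metrics over 10 folds. The sketch below is a minimal, self-contained illustration of that pattern; it is an assumption about the mechanism rather than the script's own code, and the toy data, column names and scorer labels are stand-ins, not the real rpoB feature matrix.

# Hedged sketch: 10-fold CV of a MinMaxScaler + OneHotEncoder + RandomForest
# pipeline, reporting per-fold arrays and their means like the log block above.
# Toy data only; the real features and splits come from the rpoB dataset.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate, StratifiedKFold
from sklearn.metrics import make_scorer, matthews_corrcoef

rng = np.random.default_rng(42)
X = pd.DataFrame({'ligand_distance': rng.normal(size=200),
                  'deepddg': rng.normal(size=200),
                  'ss_class': rng.choice(['helix', 'sheet', 'loop'], size=200)})
y = rng.integers(0, 2, size=200)

prep = ColumnTransformer(remainder='passthrough',
                         transformers=[('num', MinMaxScaler(), ['ligand_distance', 'deepddg']),
                                       ('cat', OneHotEncoder(), ['ss_class'])])
pipe = Pipeline([('prep', prep),
                 ('model', RandomForestClassifier(n_estimators=100, random_state=42))])

scoring = {'mcc': make_scorer(matthews_corrcoef), 'accuracy': 'accuracy',
           'fscore': 'f1', 'precision': 'precision', 'recall': 'recall',
           'roc_auc': 'roc_auc', 'jcc': 'jaccard'}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring, return_train_score=True)

for key, value in scores.items():      # prints key / value / mean value, as in the log
    print('key:', key)
    print('value:', value)
    print('mean value:', value.mean())
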
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...05', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [1.09517479 1.02668071 1.03418374 1.04638362 1.05752635 1.01023602
1.05089188 1.00196314 1.07685375 1.11304617]
mean value: 1.0512940168380738
key: score_time
value: [0.28117323 0.28493404 0.23831916 0.273947 0.27481461 0.26104164
0.21751809 0.22988033 0.25270414 0.24373722]
mean value: 0.2558069467544556
key: test_mcc
value: [0.946411 0.96396098 1. 1. 0.96396098 1.
0.94635821 0.96393713 0.98181211 0.96393713]
mean value: 0.9730377554095189
key: train_mcc
value: [0.99187796 0.98784133 0.98784133 0.98985763 0.98784133 0.98582943
0.98582943 0.98784163 0.98985784 0.98985784]
mean value: 0.9884475773437704
key: test_accuracy
value: [0.97247706 0.98165138 1. 1. 0.98165138 1.
0.97247706 0.98165138 0.99082569 0.98165138]
mean value: 0.9862385321100917
key: train_accuracy
value: [0.99592253 0.99388379 0.99388379 0.99490316 0.99388379 0.99286442
0.99286442 0.99388379 0.99490316 0.99490316]
mean value: 0.9941896024464831
key: test_fscore
value: [0.97297297 0.98181818 1. 1. 0.98181818 1.
0.97345133 0.98214286 0.99099099 0.98214286]
mean value: 0.986533736931967
key: train_fscore
value: [0.9959432 0.99392713 0.99392713 0.99493414 0.99392713 0.9929078
0.9929078 0.99391481 0.99492386 0.99492386]
mean value: 0.9942236851131838
key: test_precision
value: [0.94736842 0.96428571 1. 1. 0.96428571 1.
0.94827586 0.96491228 0.98214286 0.96491228]
mean value: 0.9736183130239392
key: train_precision
value: [0.99191919 0.98792757 0.98792757 0.98991935 0.98792757 0.98591549
0.98591549 0.98790323 0.98989899 0.98989899]
mean value: 0.9885153434454889
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97272727 0.98181818 1. 1. 0.98181818 1.
0.97222222 0.98148148 0.99074074 0.98148148]
mean value: 0.9862289562289562
key: train_roc_auc
value: [0.99591837 0.99387755 0.99387755 0.99489796 0.99387755 0.99287169
0.99287169 0.99389002 0.99490835 0.99490835]
mean value: 0.994189908142483
key: test_jcc
value: [0.94736842 0.96428571 1. 1. 0.96428571 1.
0.94827586 0.96491228 0.98214286 0.96491228]
mean value: 0.9736183130239392
key: train_jcc
value: [0.99191919 0.98792757 0.98792757 0.98991935 0.98792757 0.98591549
0.98591549 0.98790323 0.98989899 0.98989899]
mean value: 0.9885153434454889
MCC on Blind test: 0.29
Accuracy on Blind test: 0.59
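The FutureWarning logged before this run comes from max_features='auto', which scikit-learn 1.1 deprecated; for classifiers 'auto' already resolves to 'sqrt', so an equivalent, warning-free definition of the Random Forest2 model would look like the sketch below (a suggestion, not a change that was made in the script).

from sklearn.ensemble import RandomForestClassifier

# 'sqrt' is what 'auto' already meant for classifiers, so behaviour is unchanged;
# only the deprecated spelling (and hence the FutureWarning) goes away.
rf2 = RandomForestClassifier(max_features='sqrt', min_samples_leaf=5,
                             n_estimators=1000, n_jobs=10, oob_score=True,
                             random_state=42)
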
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02011395 0.01671338 0.01653075 0.02227449 0.01953197 0.01918554
0.01853752 0.01888394 0.01923013 0.01840615]
mean value: 0.01894078254699707
key: score_time
value: [0.01299596 0.01254225 0.01255369 0.01404405 0.01341295 0.01391649
0.01417804 0.01381135 0.01417184 0.0138309 ]
mean value: 0.013545751571655273
key: test_mcc
value: [0.63446308 0.78024981 0.72570999 0.72570999 0.65151515 0.81882759
0.7280447 0.72598622 0.74351235 0.76486923]
mean value: 0.7298888092654613
key: train_mcc
value: [0.73813656 0.72410437 0.73869429 0.73014127 0.75785401 0.71345039
0.73419371 0.73010884 0.75399333 0.72585556]
mean value: 0.7346532327609935
key: test_accuracy
value: [0.81651376 0.88990826 0.86238532 0.86238532 0.82568807 0.90825688
0.86238532 0.86238532 0.87155963 0.88073394]
mean value: 0.8642201834862385
key: train_accuracy
value: [0.86850153 0.86136595 0.86850153 0.86442406 0.87869521 0.85626911
0.86646279 0.86442406 0.87665647 0.86238532]
mean value: 0.8667686034658512
key: test_fscore
value: [0.80769231 0.88679245 0.85714286 0.85714286 0.82568807 0.90566038
0.85714286 0.85981308 0.875 0.87619048]
mean value: 0.8608265343006679
key: train_fscore
value: [0.86492147 0.85714286 0.86406744 0.86044071 0.87668394 0.85235602
0.86225026 0.86014721 0.8738269 0.85834208]
mean value: 0.8630178891837997
key: test_precision
value: [0.84 0.90384615 0.88235294 0.88235294 0.81818182 0.94117647
0.9 0.88461538 0.85964912 0.92 ]
mean value: 0.883217483239155
key: train_precision
value: [0.89008621 0.88503254 0.89519651 0.88744589 0.89240506 0.87526882
0.88937093 0.88720174 0.89339019 0.88336933]
mean value: 0.8878767209813069
key: test_recall
value: [0.77777778 0.87037037 0.83333333 0.83333333 0.83333333 0.87272727
0.81818182 0.83636364 0.89090909 0.83636364]
mean value: 0.8402693602693603
key: train_recall
value: [0.84114053 0.83095723 0.83503055 0.83503055 0.86150713 0.83061224
0.83673469 0.83469388 0.85510204 0.83469388]
mean value: 0.8395502722473919
key: test_roc_auc
value: [0.81616162 0.88973064 0.86212121 0.86212121 0.82575758 0.90858586
0.86279461 0.86262626 0.87138047 0.88114478]
mean value: 0.8642424242424243
key: train_roc_auc
value: [0.86852945 0.86139698 0.86853568 0.86445405 0.87871275 0.85624299
0.86643252 0.86439378 0.87663452 0.86235712]
mean value: 0.8667689845795752
key: test_jcc
value: [0.67741935 0.79661017 0.75 0.75 0.703125 0.82758621
0.75 0.75409836 0.77777778 0.77966102]
mean value: 0.7566277886609455
key: train_jcc
value: [0.76199262 0.75 0.7606679 0.75506446 0.7804428 0.74270073
0.75785582 0.75461255 0.77592593 0.75183824]
mean value: 0.7591101044424549
MCC on Blind test: 0.36
Accuracy on Blind test: 0.66
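As a reminder of what the metric keys above measure, the sketch below computes each of them once from a single hypothetical set of fold predictions. It uses hard labels throughout, so the roc_auc line is a simplification: the CV scorer in the pipeline uses predicted probabilities or decision scores.

from sklearn.metrics import (matthews_corrcoef, accuracy_score, f1_score,
                             precision_score, recall_score, roc_auc_score,
                             jaccard_score)

# Hypothetical labels/predictions for one fold, purely for illustration.
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]

print('mcc      :', matthews_corrcoef(y_true, y_pred))
print('accuracy :', accuracy_score(y_true, y_pred))
print('fscore   :', f1_score(y_true, y_pred))          # harmonic mean of precision and recall
print('precision:', precision_score(y_true, y_pred))
print('recall   :', recall_score(y_true, y_pred))
print('roc_auc  :', roc_auc_score(y_true, y_pred))     # on hard labels here; see note above
print('jcc      :', jaccard_score(y_true, y_pred))     # intersection-over-union of the positive class
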
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.11796188 0.10961556 0.10818744 0.10456371 0.1058588 0.1078229
0.25975823 0.10478783 0.10844398 0.13115287]
mean value: 0.1258153200149536
key: score_time
value: [0.01118398 0.011338 0.01129174 0.01120877 0.01139569 0.01124907
0.01117444 0.0112884 0.01124215 0.01141405]
mean value: 0.011278629302978516
key: test_mcc
value: [0.91216737 1. 1. 1. 0.96396098 0.98181211
0.96393713 0.98181211 0.98181211 0.96393713]
mean value: 0.9749438950975768
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.95412844 1. 1. 1. 0.98165138 0.99082569
0.98165138 0.99082569 0.99082569 0.98165138]
mean value: 0.9871559633027523
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.95575221 1. 1. 1. 0.98181818 0.99099099
0.98214286 0.99099099 0.99099099 0.98214286]
mean value: 0.987482908146625
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.91525424 1. 1. 1. 0.96428571 0.98214286
0.96491228 0.98214286 0.98214286 0.96491228]
mean value: 0.975579308440593
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.95454545 1. 1. 1. 0.98181818 0.99074074
0.98148148 0.99074074 0.99074074 0.98148148]
mean value: 0.9871548821548821
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.91525424 1. 1. 1. 0.96428571 0.98214286
0.96491228 0.98214286 0.98214286 0.96491228]
mean value: 0.975579308440593
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.3
Accuracy on Blind test: 0.6
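For context, the "MCC on Blind test" and "Accuracy on Blind test" lines reflect a single evaluation on the held-out blind test split after fitting on the training split. The sketch below is a hedged, self-contained illustration of that step; the toy data and the simple classifier stand in for the real rpoB splits and the pipeline above.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import matthews_corrcoef, accuracy_score

# Toy data: one informative feature plus noise, split into train and blind test.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=300) > 0).astype(int)
X_train, X_bts, y_train, y_bts = train_test_split(X, y, test_size=0.5, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
y_pred = clf.predict(X_bts)
print('MCC on Blind test:', round(matthews_corrcoef(y_bts, y_pred), 2))
print('Accuracy on Blind test:', round(accuracy_score(y_bts, y_pred), 2))
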
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.07150984 0.06511164 0.06523538 0.06628895 0.09167719 0.06410146
0.07649589 0.09414434 0.06980395 0.06749415]
mean value: 0.07318627834320068
key: score_time
value: [0.02362418 0.01245999 0.01245952 0.01243114 0.01258183 0.01239157
0.01956177 0.02616715 0.01251602 0.02478385]
mean value: 0.016897702217102052
key: test_mcc
value: [0.89544301 0.91216737 0.89544301 1. 0.946411 0.98181211
0.87869377 0.89524142 0.98181211 0.89524142]
mean value: 0.9282265219647683
key: train_mcc
value: [0.97582662 0.96987162 0.9738378 0.96789422 0.96987162 0.96789632
0.96987347 0.97185442 0.96592295 0.97383919]
mean value: 0.9706688251163492
key: test_accuracy
value: [0.94495413 0.95412844 0.94495413 1. 0.97247706 0.99082569
0.93577982 0.94495413 0.99082569 0.94495413]
mean value: 0.9623853211009175
key: train_accuracy
value: [0.98776758 0.98470948 0.98674822 0.98369011 0.98470948 0.98369011
0.98470948 0.98572885 0.98267074 0.98674822]
mean value: 0.9851172273190621
key: test_fscore
value: [0.94736842 0.95575221 0.94736842 1. 0.97297297 0.99099099
0.94017094 0.94827586 0.99099099 0.94827586]
mean value: 0.964216667375847
key: train_fscore
value: [0.98792757 0.98495486 0.98693467 0.98396794 0.98495486 0.98393574
0.98492462 0.98591549 0.98294885 0.98690836]
mean value: 0.9853372967912892
key: test_precision
value: [0.9 0.91525424 0.9 1. 0.94736842 0.98214286
0.88709677 0.90163934 0.98214286 0.90163934]
mean value: 0.931728383534462
key: train_precision
value: [0.97614314 0.97035573 0.97420635 0.96844181 0.97035573 0.96837945
0.97029703 0.97222222 0.96646943 0.97415507]
mean value: 0.9711025963561587
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.94545455 0.95454545 0.94545455 1. 0.97272727 0.99074074
0.93518519 0.94444444 0.99074074 0.94444444]
mean value: 0.9623737373737373
key: train_roc_auc
value: [0.9877551 0.98469388 0.98673469 0.98367347 0.98469388 0.98370672
0.98472505 0.98574338 0.98268839 0.98676171]
mean value: 0.9851176274990648
key: test_jcc
value: [0.9 0.91525424 0.9 1. 0.94736842 0.98214286
0.88709677 0.90163934 0.98214286 0.90163934]
mean value: 0.931728383534462
key: train_jcc
value: [0.97614314 0.97035573 0.97420635 0.96844181 0.97035573 0.96837945
0.97029703 0.97222222 0.96646943 0.97415507]
mean value: 0.9711025963561587
MCC on Blind test: 0.28
Accuracy on Blind test: 0.61
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01576996 0.0410161 0.04309177 0.01604724 0.01587033 0.01609635
0.0160675 0.01610398 0.01614022 0.01605773]
mean value: 0.021226119995117188
key: score_time
value: [0.01217413 0.01242757 0.01232624 0.01218367 0.01216888 0.01491857
0.01216054 0.01213479 0.01214075 0.01212192]
mean value: 0.012475705146789551
key: test_mcc
value: [0.68811051 0.67128761 0.72482321 0.65151515 0.56345904 0.67025938
0.74368478 0.71100747 0.67172877 0.72598622]
mean value: 0.6821862154657743
key: train_mcc
value: [0.68441792 0.69257749 0.67825101 0.68233052 0.69490253 0.68014841
0.67406705 0.68422743 0.68630482 0.68439564]
mean value: 0.6841622811613333
key: test_accuracy
value: [0.8440367 0.83486239 0.86238532 0.82568807 0.77981651 0.83486239
0.87155963 0.85321101 0.83486239 0.86238532]
mean value: 0.8403669724770643
key: train_accuracy
value: [0.84199796 0.84607543 0.83893986 0.84097859 0.8470948 0.83995923
0.83690112 0.84199796 0.84301733 0.84199796]
mean value: 0.8418960244648318
key: test_fscore
value: [0.8411215 0.82692308 0.85981308 0.82568807 0.78947368 0.83333333
0.87037037 0.84615385 0.83018868 0.85981308]
mean value: 0.8382878727182334
key: train_fscore
value: [0.83937824 0.84352332 0.83643892 0.83850932 0.84375 0.83764219
0.83436853 0.83971044 0.84057971 0.83904465]
mean value: 0.8392945323885889
key: test_precision
value: [0.8490566 0.86 0.86792453 0.81818182 0.75 0.8490566
0.88679245 0.89795918 0.8627451 0.88461538]
mean value: 0.8526331673189134
key: train_precision
value: [0.85443038 0.85864979 0.85052632 0.85263158 0.86353945 0.8490566
0.84663866 0.85115304 0.85294118 0.85412262]
mean value: 0.8533689606245336
key: test_recall
value: [0.83333333 0.7962963 0.85185185 0.83333333 0.83333333 0.81818182
0.85454545 0.8 0.8 0.83636364]
mean value: 0.8257239057239057
key: train_recall
value: [0.82484725 0.82892057 0.82281059 0.82484725 0.82484725 0.82653061
0.82244898 0.82857143 0.82857143 0.8244898 ]
mean value: 0.8256885157321584
key: test_roc_auc
value: [0.84393939 0.83451178 0.86228956 0.82575758 0.78030303 0.83501684
0.87171717 0.8537037 0.83518519 0.86262626]
mean value: 0.8405050505050505
key: train_roc_auc
value: [0.84201546 0.84609294 0.83895632 0.84099505 0.8471175 0.83994555
0.8368864 0.84198429 0.84300262 0.84198013]
mean value: 0.8418976266677751
key: test_jcc
value: [0.72580645 0.70491803 0.75409836 0.703125 0.65217391 0.71428571
0.7704918 0.73333333 0.70967742 0.75409836]
mean value: 0.7222008389007317
key: train_jcc
value: [0.72321429 0.72939068 0.71886121 0.72192513 0.72972973 0.72064057
0.71580817 0.72370766 0.725 0.72271914]
mean value: 0.7230996586219895
MCC on Blind test: 0.45
Accuracy on Blind test: 0.71
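One detail worth noting about the Multinomial run above: MultinomialNB only accepts non-negative features, and the pipeline satisfies this because MinMaxScaler maps every numeric column into [0, 1] while OneHotEncoder emits 0/1 indicators. A tiny illustration with made-up values:

import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.naive_bayes import MultinomialNB

X = np.array([[-3.2, 10.0], [0.5, 2.0], [4.1, -7.5], [2.2, 0.0]])
y = np.array([0, 0, 1, 1])

X_scaled = MinMaxScaler().fit_transform(X)   # all values now lie in [0, 1]
MultinomialNB().fit(X_scaled, y)             # fitting on raw X would raise ValueError (negative values)
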
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.0369277 0.04088545 0.03215814 0.03854275 0.03352094 0.03347206
0.0334053 0.02965021 0.04916883 0.03165388]
mean value: 0.03593852519989014
key: score_time
value: [0.01192832 0.01221871 0.01212454 0.01213908 0.01212454 0.0121572
0.0121696 0.01220918 0.01216412 0.01215053]
mean value: 0.012138581275939942
key: test_mcc
value: [0.96396098 0.96396098 0.98181818 0.98181818 0.946411 0.98181211
0.92905936 0.98181211 1. 0.92905936]
mean value: 0.9659712270812341
key: train_mcc
value: [0.99796333 0.99796333 0.98784133 0.98784133 0.97582662 0.98985784
0.98985784 0.9918781 0.99593082 0.9918781 ]
mean value: 0.9906838647005597
key: test_accuracy
value: [0.98165138 0.98165138 0.99082569 0.99082569 0.97247706 0.99082569
0.96330275 0.99082569 1. 0.96330275]
mean value: 0.9825688073394496
key: train_accuracy
value: [0.99898063 0.99898063 0.99388379 0.99388379 0.98776758 0.99490316
0.99490316 0.99592253 0.99796126 0.99592253]
mean value: 0.9953109072375127
key: test_fscore
value: [0.98181818 0.98181818 0.99082569 0.99082569 0.97297297 0.99099099
0.96491228 0.99099099 1. 0.96491228]
mean value: 0.9830067256141616
key: train_fscore
value: [0.99898271 0.99898271 0.99392713 0.99392713 0.98792757 0.99492386
0.99492386 0.99593496 0.99796334 0.99593496]
mean value: 0.9953428202965996
key: test_precision
value: [0.96428571 0.96428571 0.98181818 0.98181818 0.94736842 0.98214286
0.93220339 0.98214286 1. 0.93220339]
mean value: 0.9668268707207155
key: train_precision
value: [0.99796748 0.99796748 0.98792757 0.98792757 0.97614314 0.98989899
0.98989899 0.99190283 0.99593496 0.99190283]
mean value: 0.9907471838451151
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98181818 0.98181818 0.99090909 0.99090909 0.97272727 0.99074074
0.96296296 0.99074074 1. 0.96296296]
mean value: 0.9825589225589225
key: train_roc_auc
value: [0.99897959 0.99897959 0.99387755 0.99387755 0.9877551 0.99490835
0.99490835 0.99592668 0.99796334 0.99592668]
mean value: 0.9953102788977097
key: test_jcc
value: [0.96428571 0.96428571 0.98181818 0.98181818 0.94736842 0.98214286
0.93220339 0.98214286 1. 0.93220339]
mean value: 0.9668268707207155
key: train_jcc
value: [0.99796748 0.99796748 0.98792757 0.98792757 0.97614314 0.98989899
0.98989899 0.99190283 0.99593496 0.99190283]
mean value: 0.9907471838451151
MCC on Blind test: 0.26
Accuracy on Blind test: 0.58
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.02573824 0.02769852 0.02286506 0.0265317 0.02289176 0.02876425
0.02368808 0.02303743 0.02485371 0.02317619]
mean value: 0.024924492835998534
key: score_time
value: [0.0121398 0.01220655 0.01216602 0.01215792 0.01220179 0.01250315
0.01214266 0.01208639 0.01215744 0.01210952]
mean value: 0.01218712329864502
key: test_mcc
value: [0.946411 0.96396098 0.98181818 0.98181818 0.946411 0.98181211
0.91202529 0.98181211 1. 0.92905936]
mean value: 0.9625128216673053
key: train_mcc
value: [0.99593079 0.99593079 0.96789422 0.98382069 0.98784133 0.99796334
0.97981668 0.98582943 0.99593082 0.96592295]
mean value: 0.9856881042566935
key: test_accuracy
value: [0.97247706 0.98165138 0.99082569 0.99082569 0.97247706 0.99082569
0.95412844 0.99082569 1. 0.96330275]
mean value: 0.9807339449541285
key: train_accuracy
value: [0.99796126 0.99796126 0.98369011 0.99184506 0.99388379 0.99898063
0.98980632 0.99286442 0.99796126 0.98267074]
mean value: 0.9927624872579001
key: test_fscore
value: [0.97297297 0.98181818 0.99082569 0.99082569 0.97297297 0.99099099
0.95652174 0.99099099 1. 0.96491228]
mean value: 0.9812831505725088
key: train_fscore
value: [0.99796748 0.99796748 0.98396794 0.99191919 0.99392713 0.99898063
0.98989899 0.9929078 0.99796334 0.98294885]
mean value: 0.9928448822634005
key: test_precision
value: [0.94736842 0.96428571 0.98181818 0.98181818 0.94736842 0.98214286
0.91666667 0.98214286 1. 0.93220339]
mean value: 0.963581469081023
key: train_precision
value: [0.9959432 0.9959432 0.96844181 0.98396794 0.98792757 0.99796334
0.98 0.98591549 0.99593496 0.96646943]
mean value: 0.9858506946033496
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97272727 0.98181818 0.99090909 0.99090909 0.97272727 0.99074074
0.9537037 0.99074074 1. 0.96296296]
mean value: 0.9807239057239057
key: train_roc_auc
value: [0.99795918 0.99795918 0.98367347 0.99183673 0.99387755 0.99898167
0.9898167 0.99287169 0.99796334 0.98268839]
mean value: 0.9927627914709672
key: test_jcc
value: [0.94736842 0.96428571 0.98181818 0.98181818 0.94736842 0.98214286
0.91666667 0.98214286 1. 0.93220339]
mean value: 0.963581469081023
key: train_jcc
value: [0.9959432 0.9959432 0.96844181 0.98396794 0.98792757 0.99796334
0.98 0.98591549 0.99593496 0.96646943]
mean value: 0.9858506946033496
MCC on Blind test: 0.29
Accuracy on Blind test: 0.6
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.28126764 0.26553988 0.26455736 0.26455355 0.27714682 0.26926732
0.26987267 0.26936221 0.27386642 0.26629829]
mean value: 0.2701732158660889
key: score_time
value: [0.01591969 0.01593041 0.01601982 0.01616859 0.01612663 0.01737332
0.01592994 0.01732421 0.01614046 0.01601481]
mean value: 0.01629478931427002
key: test_mcc
value: [0.9291517 0.96396098 1. 1. 0.96396098 1.
0.96393713 0.98181211 1. 0.98181211]
mean value: 0.9784635026184957
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96330275 0.98165138 1. 1. 0.98165138 1.
0.98165138 0.99082569 1. 0.99082569]
mean value: 0.9889908256880734
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96428571 0.98181818 1. 1. 0.98181818 1.
0.98214286 0.99099099 1. 0.99099099]
mean value: 0.9892046917046917
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.93103448 0.96428571 1. 1. 0.96428571 1.
0.96491228 0.98214286 1. 0.98214286]
mean value: 0.9788803906317518
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96363636 0.98181818 1. 1. 0.98181818 1.
0.98148148 0.99074074 1. 0.99074074]
mean value: 0.989023569023569
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.93103448 0.96428571 1. 1. 0.96428571 1.
0.96491228 0.98214286 1. 0.98214286]
mean value: 0.9788803906317518
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.27
Accuracy on Blind test: 0.59
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.17862988 0.20598102 0.20765114 0.20498371 0.1792326 0.2004025
0.20543003 0.20129752 0.20281935 0.20100617]
mean value: 0.1987433910369873
key: score_time
value: [0.02799296 0.04068851 0.04115677 0.04074216 0.04036856 0.03043032
0.03932166 0.04003978 0.03971243 0.0401566 ]
mean value: 0.038060975074768064
key: test_mcc
value: [0.91216737 0.96396098 1. 1. 0.98181818 1.
0.94635821 0.98181211 0.96393713 0.94635821]
mean value: 0.9696412205023645
key: train_mcc
value: [1. 0.99796333 0.99593079 0.99593079 0.99390234 0.99593082
0.99390241 0.99593082 0.99796334 0.9918781 ]
mean value: 0.9959332732076022
key: test_accuracy
value: [0.95412844 0.98165138 1. 1. 0.99082569 1.
0.97247706 0.99082569 0.98165138 0.97247706]
mean value: 0.9844036697247707
key: train_accuracy
value: [1. 0.99898063 0.99796126 0.99796126 0.9969419 0.99796126
0.9969419 0.99796126 0.99898063 0.99592253]
mean value: 0.9979612640163099
key: test_fscore
value: [0.95575221 0.98181818 1. 1. 0.99082569 1.
0.97345133 0.99099099 0.98214286 0.97345133]
mean value: 0.9848432585282061
key: train_fscore
value: [1. 0.99898271 0.99796748 0.99796748 0.99695431 0.99796334
0.99694812 0.99796334 0.99898063 0.99593496]
mean value: 0.9979662369680692
key: test_precision
value: [0.91525424 0.96428571 1. 1. 0.98181818 1.
0.94827586 0.98214286 0.96491228 0.94827586]
mean value: 0.9704964995374574
key: train_precision
value: [1. 0.99796748 0.9959432 0.9959432 0.99392713 0.99593496
0.99391481 0.99593496 0.99796334 0.99190283]
mean value: 0.9959431915048893
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.95454545 0.98181818 1. 1. 0.99090909 1.
0.97222222 0.99074074 0.98148148 0.97222222]
mean value: 0.9843939393939394
key: train_roc_auc
value: [1. 0.99897959 0.99795918 0.99795918 0.99693878 0.99796334
0.99694501 0.99796334 0.99898167 0.99592668]
mean value: 0.9979616775427075
key: test_jcc
value: [0.91525424 0.96428571 1. 1. 0.98181818 1.
0.94827586 0.98214286 0.96491228 0.94827586]
mean value: 0.9704964995374574
key: train_jcc
value: [1. 0.99796748 0.9959432 0.9959432 0.99392713 0.99593496
0.99391481 0.99593496 0.99796334 0.99190283]
mean value: 0.9959431915048893
MCC on Blind test: 0.25
Accuracy on Blind test: 0.57
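The UserWarning logged before this Bagging run ("Some inputs do not have OOB scores") appears because BaggingClassifier defaults to only 10 base estimators, so some training samples end up in every bootstrap and never receive an out-of-bag prediction. A hedged sketch of the usual remedy is to raise n_estimators; the value below is an illustration, not a setting taken from the script.

from sklearn.ensemble import BaggingClassifier

# More estimators -> virtually every sample is left out of at least one
# bootstrap, so the OOB estimate (oob_score_) becomes well defined.
bc = BaggingClassifier(n_estimators=100, oob_score=True,
                       n_jobs=10, random_state=42)
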
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.47767258 0.4457314 0.63524914 0.67361498 0.61306643 0.56355143
0.53762197 0.56128907 0.43451262 0.60698318]
mean value: 0.5549292802810669
key: score_time
value: [0.02317858 0.026896 0.04104447 0.04110169 0.04912281 0.04197502
0.04222465 0.0552702 0.03481197 0.041291 ]
mean value: 0.0396916389465332
key: test_mcc
value: [0.946411 0.96396098 0.98181818 1. 0.98181818 1.
0.92905936 0.98181211 1. 0.98181211]
mean value: 0.9766691930577851
key: train_mcc
value: [0.99796333 0.99593079 0.99390234 0.99390234 0.99390234 0.99390241
0.99390241 0.99390241 0.99390241 0.99390241]
mean value: 0.9945113200343811
key: test_accuracy
value: [0.97247706 0.98165138 0.99082569 1. 0.99082569 1.
0.96330275 0.99082569 1. 0.99082569]
mean value: 0.9880733944954129
key: train_accuracy
value: [0.99898063 0.99796126 0.9969419 0.9969419 0.9969419 0.9969419
0.9969419 0.9969419 0.9969419 0.9969419 ]
mean value: 0.9972477064220183
key: test_fscore
value: [0.97297297 0.98181818 0.99082569 1. 0.99082569 1.
0.96491228 0.99099099 1. 0.99099099]
mean value: 0.988333679362168
key: train_fscore
value: [0.99898271 0.99796748 0.99695431 0.99695431 0.99695431 0.99694812
0.99694812 0.99694812 0.99694812 0.99694812]
mean value: 0.9972553719869787
key: test_precision
value: [0.94736842 0.96428571 0.98181818 1. 0.98181818 1.
0.93220339 0.98214286 1. 0.98214286]
mean value: 0.9771779603090932
key: train_precision
value: [0.99796748 0.9959432 0.99392713 0.99392713 0.99392713 0.99391481
0.99391481 0.99391481 0.99391481 0.99391481]
mean value: 0.9945266097572326
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97272727 0.98181818 0.99090909 1. 0.99090909 1.
0.96296296 0.99074074 1. 0.99074074]
mean value: 0.9880808080808081
key: train_roc_auc
value: [0.99897959 0.99795918 0.99693878 0.99693878 0.99693878 0.99694501
0.99694501 0.99694501 0.99694501 0.99694501]
mean value: 0.9972480152957314
key: test_jcc
value: [0.94736842 0.96428571 0.98181818 1. 0.98181818 1.
0.93220339 0.98214286 1. 0.98214286]
mean value: 0.9771779603090932
key: train_jcc
value: [0.99796748 0.9959432 0.99392713 0.99392713 0.99392713 0.99391481
0.99391481 0.99391481 0.99391481 0.99391481]
mean value: 0.9945266097572326
MCC on Blind test: 0.16
Accuracy on Blind test: 0.54
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [1.11533022 1.10044742 1.10230374 1.10466123 1.10110593 1.12116337
1.1251564 1.11318278 1.11536574 1.13156414]
mean value: 1.11302809715271
key: score_time
value: [0.00962567 0.00945687 0.00967073 0.00957179 0.00962257 0.00992775
0.010005 0.0098331 0.01052403 0.01036096]
mean value: 0.009859848022460937
key: test_mcc
value: [0.9291517 0.98181818 1. 0.98181818 0.946411 0.98181211
0.94635821 0.96393713 0.98181211 0.94635821]
mean value: 0.9659476849273602
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96330275 0.99082569 1. 0.99082569 0.97247706 0.99082569
0.97247706 0.98165138 0.99082569 0.97247706]
mean value: 0.9825688073394496
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96428571 0.99082569 1. 0.99082569 0.97297297 0.99099099
0.97345133 0.98214286 0.99099099 0.97345133]
mean value: 0.9829937557397572
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.93103448 0.98181818 1. 0.98181818 0.94736842 0.98214286
0.94827586 0.96491228 0.98214286 0.94827586]
mean value: 0.9667788986573016
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96363636 0.99090909 1. 0.99090909 0.97272727 0.99074074
0.97222222 0.98148148 0.99074074 0.97222222]
mean value: 0.9825589225589225
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.93103448 0.98181818 1. 0.98181818 0.94736842 0.98214286
0.94827586 0.96491228 0.98214286 0.94827586]
mean value: 0.9667788986573016
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.28
Accuracy on Blind test: 0.59
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.04244852 0.04812479 0.04577637 0.0463593 0.04909849 0.04620719
0.04703236 0.05215859 0.04779005 0.04598045]
mean value: 0.04709761142730713
key: score_time
value: [0.01303792 0.02137995 0.01299429 0.01325035 0.02061152 0.01301885
0.01324797 0.01297951 0.02275825 0.01454926]
mean value: 0.015782785415649415
key: test_mcc
value: [1. 1. 1. 1. 1. 1.
0.34851405 1. 1. 1. ]
mean value: 0.9348514050091998
key: train_mcc
value: [1. 1. 1. 1. 1. 0.96198449
0.35025059 1. 1. 1. ]
mean value: 0.9312235085917535
key: test_accuracy
value: [1. 1. 1. 1. 1. 1.
0.60550459 1. 1. 1. ]
mean value: 0.9605504587155963
key: train_accuracy
value: [1. 1. 1. 1. 1. 0.98063201
0.60958206 1. 1. 1. ]
mean value: 0.9590214067278288
key: test_fscore
value: [1. 1. 1. 1. 1. 1.
0.35820896 1. 1. 1. ]
mean value: 0.935820895522388
key: train_fscore
value: [1. 1. 1. 1. 1. 0.98022893
0.35845896 1. 1. 1. ]
mean value: 0.9338687889673829
key: test_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1.
0.21818182 1. 1. 1. ]
mean value: 0.9218181818181819
key: train_recall
value: [1. 1. 1. 1. 1. 0.96122449
0.21836735 1. 1. 1. ]
mean value: 0.9179591836734694
key: test_roc_auc
value: [1. 1. 1. 1. 1. 1.
0.60909091 1. 1. 1. ]
mean value: 0.9609090909090909
key: train_roc_auc
value: [1. 1. 1. 1. 1. 0.98061224
0.60918367 1. 1. 1. ]
mean value: 0.9589795918367346
key: test_jcc
value: [1. 1. 1. 1. 1. 1.
0.21818182 1. 1. 1. ]
mean value: 0.9218181818181819
key: train_jcc
value: [1. 1. 1. 1. 1. 0.96122449
0.21836735 1. 1. 1. ]
mean value: 0.9179591836734694
MCC on Blind test: 0.0
Accuracy on Blind test: 0.51
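
Note: the repeated "Variables are collinear" UserWarnings interleaved with the fold output above are raised by the discriminant-analysis models while fitting the feature matrix (many of the AAindex columns are presumably near-collinear). One common mitigation, shown here on toy data as an assumption rather than as part of the logged script, is to add shrinkage through QDA's reg_param so the per-class scalings stay away from zero (the rank warning itself may still be emitted):

import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
X = np.hstack([X, X[:, :2]])        # duplicated columns -> rank-deficient, collinear design
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)

# reg_param shrinks each class covariance estimate, keeping the discriminant
# well conditioned even though the design matrix is collinear
qda = QuadraticDiscriminantAnalysis(reg_param=0.1)
qda.fit(X, y)
print(round(qda.score(X, y), 3))
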
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02674341 0.03784299 0.04292107 0.01843095 0.02614713 0.0395968
0.01892185 0.01898193 0.01882291 0.02836633]
mean value: 0.027677536010742188
key: score_time
value: [0.01909041 0.01905203 0.01903725 0.01596856 0.01496267 0.0134511
0.01707458 0.01234651 0.0223639 0.02151322]
mean value: 0.017486023902893066
key: test_mcc
value: [0.946411 0.9291517 1. 0.98181818 0.946411 0.98181211
0.91202529 0.94635821 1. 0.91202529]
mean value: 0.9556012783200151
key: train_mcc
value: [0.96789422 0.96592058 0.96789422 0.96592058 0.96789422 0.96592295
0.96987347 0.96987347 0.96198744 0.97185442]
mean value: 0.9675035590265852
key: test_accuracy
value: [0.97247706 0.96330275 1. 0.99082569 0.97247706 0.99082569
0.95412844 0.97247706 1. 0.95412844]
mean value: 0.9770642201834863
key: train_accuracy
value: [0.98369011 0.98267074 0.98369011 0.98267074 0.98369011 0.98267074
0.98470948 0.98470948 0.98063201 0.98572885]
mean value: 0.9834862385321101
key: test_fscore
value: [0.97297297 0.96428571 1. 0.99082569 0.97297297 0.99099099
0.95652174 0.97345133 1. 0.95652174]
mean value: 0.9778543144990544
key: train_fscore
value: [0.98396794 0.98298298 0.98396794 0.98298298 0.98396794 0.98294885
0.98492462 0.98492462 0.98098098 0.98591549]
mean value: 0.9837564340290699
key: test_precision
value: [0.94736842 0.93103448 1. 0.98181818 0.94736842 0.98214286
0.91666667 0.94827586 1. 0.91666667]
mean value: 0.9571341559227221
key: train_precision
value: [0.96844181 0.96653543 0.96844181 0.96653543 0.96844181 0.96646943
0.97029703 0.97029703 0.96267191 0.97222222]
mean value: 0.9680353925262213
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97272727 0.96363636 1. 0.99090909 0.97272727 0.99074074
0.9537037 0.97222222 1. 0.9537037 ]
mean value: 0.977037037037037
key: train_roc_auc
value: [0.98367347 0.98265306 0.98367347 0.98265306 0.98367347 0.98268839
0.98472505 0.98472505 0.98065173 0.98574338]
mean value: 0.9834860135500229
key: test_jcc
value: [0.94736842 0.93103448 1. 0.98181818 0.94736842 0.98214286
0.91666667 0.94827586 1. 0.91666667]
mean value: 0.9571341559227221
key: train_jcc
value: [0.96844181 0.96653543 0.96844181 0.96653543 0.96844181 0.96646943
0.97029703 0.97029703 0.96267191 0.97222222]
mean value: 0.9680353925262213
MCC on Blind test: 0.3
Accuracy on Blind test: 0.61
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_rt.py:155: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_rt.py:158: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.26458216 0.32423019 0.28895402 0.36766052 0.35149288 0.44386435
0.33424354 0.35568094 0.34150338 0.36406851]
mean value: 0.34362804889678955
key: score_time
value: [0.01919508 0.01234579 0.0192287 0.02447915 0.01916623 0.01922035
0.0209403 0.01921153 0.02182055 0.01927543]
mean value: 0.019488310813903807
key: test_mcc
value: [0.946411 0.9291517 1. 0.98181818 0.96396098 0.94635821
0.91202529 0.94635821 0.98181211 0.91202529]
mean value: 0.9519920982298953
key: train_mcc
value: [0.960022 0.96592058 0.956108 0.95806317 0.960022 0.96198744
0.96198744 0.96987347 0.95611192 0.96592295]
mean value: 0.961601897888028
key: test_accuracy
value: [0.97247706 0.96330275 1. 0.99082569 0.98165138 0.97247706
0.95412844 0.97247706 0.99082569 0.95412844]
mean value: 0.9752293577981652
key: train_accuracy
value: [0.97961264 0.98267074 0.9775739 0.97859327 0.97961264 0.98063201
0.98063201 0.98470948 0.9775739 0.98267074]
mean value: 0.980428134556575
key: test_fscore
value: [0.97297297 0.96428571 1. 0.99082569 0.98181818 0.97345133
0.95652174 0.97345133 0.99099099 0.95652174]
mean value: 0.9760839681269381
key: train_fscore
value: [0.98003992 0.98298298 0.97808765 0.97906281 0.98003992 0.98098098
0.98098098 0.98492462 0.97804391 0.98294885]
mean value: 0.9808092628062846
key: test_precision
value: [0.94736842 0.93103448 1. 0.98181818 0.96428571 0.94827586
0.91666667 0.94827586 0.98214286 0.91666667]
mean value: 0.953653471452927
key: train_precision
value: [0.96086106 0.96653543 0.95711501 0.95898438 0.96086106 0.96267191
0.96267191 0.97029703 0.95703125 0.96646943]
mean value: 0.9623498450426142
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97272727 0.96363636 1. 0.99090909 0.98181818 0.97222222
0.9537037 0.97222222 0.99074074 0.9537037 ]
mean value: 0.9751683501683501
key: train_roc_auc
value: [0.97959184 0.98265306 0.97755102 0.97857143 0.97959184 0.98065173
0.98065173 0.98472505 0.97759674 0.98268839]
mean value: 0.9804272829294651
key: test_jcc
value: [0.94736842 0.93103448 1. 0.98181818 0.96428571 0.94827586
0.91666667 0.94827586 0.98214286 0.91666667]
mean value: 0.953653471452927
key: train_jcc
value: [0.96086106 0.96653543 0.95711501 0.95898438 0.96086106 0.96267191
0.96267191 0.97029703 0.95703125 0.96646943]
mean value: 0.9623498450426142
MCC on Blind test: 0.3
Accuracy on Blind test: 0.61
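
Note: the two SettingWithCopyWarning messages printed just before the Ridge ClassifierCV pipeline come from rpob_rt.py sorting DataFrame slices (ros_CT and ros_BT) in place. A small sketch of the usual remedy, using a hypothetical stand-in frame rather than the script's actual objects: take an explicit copy of the slice (or assign the sorted result) so the in-place sort no longer targets a view.

import pandas as pd

# hypothetical stand-in for the per-model score frames sliced inside rpob_rt.py
scores = pd.DataFrame({'model': ['Logistic Regression', 'Ridge Classifier'],
                       'test_mcc': [0.97, 0.96],
                       'bts_mcc': [0.25, 0.30]})

ros_CT = scores[['model', 'test_mcc']].copy()   # explicit copy, not a view of 'scores'
ros_CT.sort_values(by=['test_mcc'], ascending=False, inplace=True)

ros_BT = scores[['model', 'bts_mcc']].copy()
ros_BT.sort_values(by=['bts_mcc'], ascending=False, inplace=True)
print(ros_CT)
print(ros_BT)
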
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.01922441 0.01973104 0.0211575 0.02013302 0.02118015 0.01990414
0.01957083 0.02115703 0.01995182 0.02084684]
mean value: 0.020285677909851075
key: score_time
value: [0.01134562 0.01130247 0.01128507 0.01133657 0.01124454 0.0112586
0.01131725 0.01126814 0.01121068 0.01123452]
mean value: 0.011280345916748046
key: test_mcc
value: [0.70710678 0.4472136 1. 0.70710678 0.70710678 1.
0.70710678 1. 0.33333333 0.33333333]
mean value: 0.6942307386912815
key: train_mcc
value: [0.96362411 1. 0.96362411 0.96362411 1. 0.96362411
0.96362411 0.96362411 0.96362411 1. ]
mean value: 0.9745368781616021
key: test_accuracy
value: [0.83333333 0.66666667 1. 0.83333333 0.83333333 1.
0.83333333 1. 0.66666667 0.66666667]
mean value: 0.8333333333333334
key: train_accuracy
value: [0.98148148 1. 0.98148148 0.98148148 1. 0.98148148
0.98148148 0.98148148 0.98148148 1. ]
mean value: 0.987037037037037
key: test_fscore
value: [0.8 0.75 1. 0.8 0.8 1.
0.85714286 1. 0.66666667 0.66666667]
mean value: 0.834047619047619
key: train_fscore
value: [0.98181818 1. 0.98181818 0.98181818 1. 0.98181818
0.98181818 0.98181818 0.98181818 1. ]
mean value: 0.9872727272727273
key: test_precision
value: [1. 0.6 1. 1. 1. 1.
0.75 1. 0.66666667 0.66666667]
mean value: 0.8683333333333333
key: train_precision
value: [0.96428571 1. 0.96428571 0.96428571 1. 0.96428571
0.96428571 0.96428571 0.96428571 1. ]
mean value: 0.975
key: test_recall
value: [0.66666667 1. 1. 0.66666667 0.66666667 1.
1. 1. 0.66666667 0.66666667]
mean value: 0.8333333333333333
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.83333333 0.66666667 1. 0.83333333 0.83333333 1.
0.83333333 1. 0.66666667 0.66666667]
mean value: 0.8333333333333334
key: train_roc_auc
value: [0.98148148 1. 0.98148148 0.98148148 1. 0.98148148
0.98148148 0.98148148 0.98148148 1. ]
mean value: 0.987037037037037
key: test_jcc
value: [0.66666667 0.6 1. 0.66666667 0.66666667 1.
0.75 1. 0.5 0.5 ]
mean value: 0.735
key: train_jcc
value: [0.96428571 1. 0.96428571 0.96428571 1. 0.96428571
0.96428571 0.96428571 0.96428571 1. ]
mean value: 0.975
Traceback (most recent call last):
File "/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_rt.py", line 165, in <module>
mm_skf_scoresD4 = MultModelsCl(input_df = X_rus
File "/home/tanu/git/LSHTM_analysis/scripts/ml/MultModelsCl.py", line 240, in MultModelsCl
bts_predict = model_pipeline.predict(blind_test_input_df)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/pipeline.py", line 457, in predict
Xt = transform.transform(Xt)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/compose/_column_transformer.py", line 746, in transform
Xs = self._fit_transform(
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/compose/_column_transformer.py", line 604, in _fit_transform
return Parallel(n_jobs=self.n_jobs)(
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 1046, in __call__
while self.dispatch_one_batch(iterator):
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 861, in dispatch_one_batch
self._dispatch(tasks)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 779, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
result = ImmediateResult(func)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 572, in __init__
self.results = batch()
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 262, in __call__
return [func(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 262, in <listcomp>
return [func(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/utils/fixes.py", line 117, in __call__
return self.function(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/pipeline.py", line 853, in _transform_one
res = transformer.transform(X)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py", line 882, in transform
X_int, X_mask = self._transform(
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py", line 160, in _transform
raise ValueError(msg)
ValueError: Found unknown categories ['Other'] in column 5 during transform
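
Note: the traceback above stops the run. The pipeline's OneHotEncoder was fitted on the random-undersampled training set (X_rus), which presumably no longer contains the 'Other' level of categorical column 5 (drtype_mode_labels in the Index shown above), so transforming the blind-test data raises ValueError. A minimal, self-contained sketch of one way around this, on toy data rather than the rpoB feature set: fit the encoder with handle_unknown='ignore' so unseen levels become all-zero indicator rows instead of errors.

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# toy stand-ins: one numeric and one categorical column
X_train = pd.DataFrame({'ligand_distance': [1.2, 3.4, 5.6, 7.8],
                        'drtype_mode_labels': ['A', 'B', 'A', 'B']})
y_train = [0, 1, 0, 1]
X_blind = pd.DataFrame({'ligand_distance': [2.0],
                        'drtype_mode_labels': ['Other']})   # level unseen in training

prep = ColumnTransformer(
    remainder='passthrough',
    transformers=[('num', MinMaxScaler(), ['ligand_distance']),
                  # handle_unknown='ignore' encodes unseen levels as all zeros
                  # instead of raising ValueError at transform/predict time
                  ('cat', OneHotEncoder(handle_unknown='ignore'),
                   ['drtype_mode_labels'])])

pipe = Pipeline(steps=[('prep', prep),
                       ('model', LogisticRegression(random_state=42))])
pipe.fit(X_train, y_train)
print(pipe.predict(X_blind))   # runs; with the default encoder this would raise

An alternative that stays closer to the logged setup would be to harmonise the category levels before splitting (for example, re-mapping or dropping rare levels such as 'Other') so the training and blind-test sets share the same categories.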