LSHTM_analysis/scripts/ml/log_embb_sl.txt

19739 lines
974 KiB
Text

/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data_sl.py:549: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
from pandas import MultiIndex, Int64Index
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
1.22.4
1.4.1
aaindex_df contains non-numerical data
Total no. of non-numerial columns: 2
Selecting numerical data only
PASS: successfully selected numerical columns only for aaindex_df
Now checking for NA in the remaining aaindex_cols
Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127
Revised df ncols: 123
Checking NA in revised df...
PASS: cols with NA successfully dropped from aaindex_df
Proceeding with combining aa_df with other features_df
PASS: ncols match
Expected ncols: 123
Got: 123
Total no. of columns in clean aa_df: 123
Proceeding to merge, expected nrows in merged_df: 858
PASS: my_features_df and aa_df successfully combined
nrows: 858
ncols: 269
count of NULL values before imputation
or_mychisq 244
log10_or_mychisq 244
dtype: int64
count of NULL values AFTER imputation
mutationinformation 0
or_rawI 0
logorI 0
dtype: int64
PASS: OR values imputed, data ready for ML
Total no. of features for aaindex: 123
No. of numerical features: 168
No. of categorical features: 7
PASS: x_features has no target variable
No. of columns for x_features: 175
-------------------------------------------------------------
Successfully split data according to scaling law: 1/np.sqrt(x_ncols)
Train data size: (414, 175)
Test data size: 0.07559289460184544 (34, 175)
y_train numbers: Counter({0: 326, 1: 88})
y_train ratio: 3.7045454545454546
y_test_numbers: Counter({0: 27, 1: 7})
y_test ratio: 3.857142857142857
-------------------------------------------------------------
Simple Random OverSampling
Counter({1: 326, 0: 326})
(652, 175)
Simple Random UnderSampling
Counter({0: 88, 1: 88})
(176, 175)
Simple Combined Over and UnderSampling
Counter({0: 326, 1: 326})
(652, 175)
SMOTE_NC OverSampling
Counter({1: 326, 0: 326})
(652, 175)
#####################################################################
Running ML analysis: scaling law split
Gene name: embB
Drug name: ethambutol
Output directory: /home/tanu/git/Data/ethambutol/output/ml/tts_sl/
Sanity checks:
ML source data size: (448, 175)
Total input features: (414, 175)
Target feature numbers: Counter({0: 326, 1: 88})
Target features ratio: 3.7045454545454546
#####################################################################
================================================================
Strucutral features (n): 36
These are:
Common stablity features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================
AAindex features (n): 123
================================================================
Evolutionary features (n): 3
These are:
['consurf_score', 'snap2_score', 'provean_score']
================================================================
Genomic features (n): 6
These are:
['maf', 'logorI']
['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================
Categorical features (n): 7
These are:
['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================
Pass: No. of features match
#####################################################################
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.13825727 0.08154607 0.08528066 0.09961534 0.08779621 0.08618045
0.11637926 0.07789564 0.0799129 0.07683444]
mean value: 0.09296982288360596
key: score_time
value: [0.03777742 0.02564454 0.02300692 0.02812481 0.03288937 0.0238626
0.02518201 0.02394128 0.02401376 0.04178333]
mean value: 0.02862260341644287
key: test_mcc
value: [0.54494926 0.6333005 0.6333005 0.85634884 0.52265422 0.77972283
0.6989826 0.6310315 0.57066443 0.66678841]
mean value: 0.6537743098920646
key: train_mcc
value: [0.82700789 0.82580084 0.80837281 0.79999931 0.80849337 0.79973104
0.7997743 0.80849337 0.81082175 0.79307146]
mean value: 0.8081566136778171
key: test_accuracy
value: [0.85714286 0.88095238 0.88095238 0.95238095 0.85365854 0.92682927
0.90243902 0.87804878 0.87804878 0.90243902]
mean value: 0.8912891986062718
key: train_accuracy
value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[0.94354839 0.94354839 0.93817204 0.93548387 0.9383378 0.93565684
0.93565684 0.9383378 0.9383378 0.93297587]
mean value: 0.9380055637233705
key: test_fscore
value: [0.625 0.70588235 0.70588235 0.875 0.57142857 0.82352941
0.75 0.70588235 0.54545455 0.71428571]
mean value: 0.7022345301757067
key: train_fscore
value: [0.86092715 0.85714286 0.84137931 0.83561644 0.84137931 0.83333333
0.82857143 0.84137931 0.84563758 0.82758621]
mean value: 0.8412952931545316
key: test_precision
value: [0.71428571 0.75 0.75 1. 0.8 0.875
0.85714286 0.75 1. 0.83333333]
mean value: 0.8329761904761905
key: train_precision
value: [0.90277778 0.92647059 0.92424242 0.91044776 0.92424242 0.92307692
0.95081967 0.92424242 0.91304348 0.92307692]
mean value: 0.9222440396480238
key: test_recall
value: [0.55555556 0.66666667 0.66666667 0.77777778 0.44444444 0.77777778
0.66666667 0.66666667 0.375 0.625 ]
mean value: 0.6222222222222222
key: train_recall
value: [0.82278481 0.79746835 0.7721519 0.7721519 0.7721519 0.75949367
0.73417722 0.7721519 0.7875 0.75 ]
mean value: 0.774003164556962
key: test_roc_auc
value: [0.74747475 0.8030303 0.8030303 0.88888889 0.70659722 0.87326389
0.81770833 0.80208333 0.6875 0.79734848]
mean value: 0.7926925505050505
key: train_roc_auc
value: [0.89944701 0.89020175 0.87754353 0.87583704 0.87757255 0.87124343
0.86198657 0.87757255 0.88351109 0.86646758]
mean value: 0.8781383100071152
key: test_jcc
value: [0.45454545 0.54545455 0.54545455 0.77777778 0.4 0.7
0.6 0.54545455 0.375 0.55555556]
mean value: 0.5499242424242424
key: train_jcc
value: [0.75581395 0.75 0.72619048 0.71764706 0.72619048 0.71428571
0.70731707 0.72619048 0.73255814 0.70588235]
mean value: 0.7262075720815836
MCC on Blind test: 0.92
Accuracy on Blind test: 0.97
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [1.85975575 1.78022003 1.30109572 1.05650544 1.55232549 0.93596625
0.97297597 0.95046806 0.9523139 1.00821924]
mean value: 1.2369845867156983
key: score_time
value: [0.02304935 0.02481604 0.01518774 0.01267743 0.0152185 0.01517177
0.01260853 0.01517558 0.01544213 0.02388382]
mean value: 0.01732308864593506
key: test_mcc
value: [0.54494926 0.6333005 0.71717172 0.92884073 0.54237994 0.77972283
0.6140038 0.6593092 0.75691259 0.75691259]
mean value: 0.6933503152621142
key: train_mcc
value: [0.96003759 0.89382193 0.95178641 0.96785761 0.96788082 0.98394041
0.90199832 0.97605278 0.8701771 0.96817406]
mean value: 0.9441727034440436
key: test_accuracy
value: [0.85714286 0.88095238 0.9047619 0.97619048 0.85365854 0.92682927
0.87804878 0.87804878 0.92682927 0.92682927]
mean value: 0.9009291521486643
key: train_accuracy
value: [0.98655914 0.96505376 0.98387097 0.98924731 0.98927614 0.99463807
0.96782842 0.9919571 0.95710456 0.98927614]
mean value: 0.9814811611750123
key: test_fscore
value: [0.625 0.70588235 0.77777778 0.94117647 0.625 0.82352941
0.66666667 0.73684211 0.76923077 0.76923077]
mean value: 0.7440336323463259
key: train_fscore
value: [0.96855346 0.91503268 0.96202532 0.97468354 0.97468354 0.98734177
0.92105263 0.98113208 0.8961039 0.975 ]
mean value: 0.955560891922779
key: test_precision
value: [0.71428571 0.75 0.77777778 1. 0.71428571 0.875
0.83333333 0.7 1. 1. ]
mean value: 0.836468253968254
key: train_precision
value: [0.9625 0.94594595 0.96202532 0.97468354 0.97468354 0.98734177
0.95890411 0.975 0.93243243 0.975 ]
mean value: 0.9648516665182609
key: test_recall
value: [0.55555556 0.66666667 0.77777778 0.88888889 0.55555556 0.77777778
0.55555556 0.77777778 0.625 0.625 ]
mean value: 0.6805555555555556
key: train_recall
value: [0.97468354 0.88607595 0.96202532 0.97468354 0.97468354 0.98734177
0.88607595 0.98734177 0.8625 0.975 ]
mean value: 0.9470411392405064
key: test_roc_auc
value: [0.74747475 0.8030303 0.85858586 0.94444444 0.74652778 0.87326389
0.76215278 0.84201389 0.8125 0.8125 ]
mean value: 0.8202493686868687
key: train_roc_auc
value: [0.98222232 0.93621204 0.9758932 0.9839288 0.98394041 0.99197021
0.93793593 0.99026953 0.92271758 0.98408703]
mean value: 0.9689177045834534
key: test_jcc
value: [0.45454545 0.54545455 0.63636364 0.88888889 0.45454545 0.7
0.5 0.58333333 0.625 0.625 ]
mean value: 0.6013131313131312
key: train_jcc
value: [0.93902439 0.84337349 0.92682927 0.95061728 0.95061728 0.975
0.85365854 0.96296296 0.81176471 0.95121951]
mean value: 0.9165067438039527
MCC on Blind test: 0.92
Accuracy on Blind test: 0.97
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01652694 0.01299787 0.01034498 0.01023841 0.01024342 0.01018882
0.01015377 0.0114944 0.0101347 0.01052117]
mean value: 0.011284446716308594
key: score_time
value: [0.01544619 0.00955415 0.0094564 0.00937772 0.00940585 0.00933957
0.00916338 0.00932145 0.00951695 0.00933099]
mean value: 0.009991264343261719
key: test_mcc
value: [0.40344549 0.69727705 0.56883128 0.74471985 0.04600437 0.39091152
0.4768306 0.56541479 0.58026353 0.39267774]
mean value: 0.486637623356521
key: train_mcc
value: [0.69866809 0.65278249 0.66216187 0.652554 0.52389431 0.53613836
0.6098212 0.64733091 0.62649878 0.65101357]
mean value: 0.6260863587107378
key: test_accuracy
value: [0.76190476 0.88095238 0.83333333 0.9047619 0.51219512 0.70731707
0.80487805 0.82926829 0.85365854 0.73170732]
mean value: 0.7819976771196283
key: train_accuracy
value: [0.88709677 0.8655914 0.8655914 0.86290323 0.73726542 0.74798928
0.84450402 0.87935657 0.85254692 0.86595174]
mean value: 0.8408796736717692
key: test_fscore
value: [0.54545455 0.76190476 0.66666667 0.8 0.33333333 0.53846154
0.6 0.66666667 0.66666667 0.52173913]
mean value: 0.6100893309588962
key: train_fscore
value: [0.76404494 0.72826087 0.73404255 0.72727273 0.608 0.61788618
0.69473684 0.72392638 0.70899471 0.72826087]
mean value: 0.7035426073744735
key: test_precision
value: [0.46153846 0.66666667 0.58333333 0.72727273 0.23809524 0.41176471
0.54545455 0.58333333 0.6 0.4 ]
mean value: 0.5217459011576658
key: train_precision
value: [0.68686869 0.63809524 0.63302752 0.62962963 0.44444444 0.45508982
0.59459459 0.70238095 0.6146789 0.64423077]
mean value: 0.6043040557621945
key: test_recall
value: [0.66666667 0.88888889 0.77777778 0.88888889 0.55555556 0.77777778
0.66666667 0.77777778 0.75 0.75 ]
mean value: 0.75
key: train_recall
value: [0.86075949 0.84810127 0.87341772 0.86075949 0.96202532 0.96202532
0.83544304 0.74683544 0.8375 0.8375 ]
mean value: 0.8624367088607595
key: test_roc_auc
value: [0.72727273 0.88383838 0.81313131 0.8989899 0.52777778 0.73263889
0.75520833 0.81076389 0.81439394 0.73863636]
mean value: 0.7702651515151515
key: train_roc_auc
value: [0.87747872 0.85920422 0.86844948 0.86212036 0.81944803 0.82625075
0.84119091 0.83090071 0.84707765 0.85561007]
mean value: 0.8487730896350418
key: test_jcc
value: [0.375 0.61538462 0.5 0.66666667 0.2 0.36842105
0.42857143 0.5 0.5 0.35294118]
mean value: 0.4506984939724878
key: train_jcc
value: [0.61818182 0.57264957 0.57983193 0.57142857 0.43678161 0.44705882
0.53225806 0.56730769 0.54918033 0.57264957]
mean value: 0.5447327985100132
MCC on Blind test: 0.52
Accuracy on Blind test: 0.82
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01086617 0.01176739 0.01034856 0.01103044 0.01118231 0.01112795
0.0103054 0.01032543 0.01044059 0.0103581 ]
mean value: 0.010775232315063476
key: score_time
value: [0.01022363 0.01046634 0.00995755 0.01052999 0.00978065 0.00941157
0.00928855 0.00924921 0.00914741 0.00955892]
mean value: 0.009761381149291991
key: test_mcc
value: [ 0.07784989 0.42358687 0.15151515 0.28426762 -0.12009612 0.6989826
0.48234017 0.54237994 0.00458735 0.25295146]
mean value: 0.27983649480059836
key: train_mcc
value: [0.47627568 0.38128402 0.47252142 0.49088992 0.46564258 0.4066905
0.45377606 0.41364491 0.5140712 0.45094947]
mean value: 0.4525745767981807
key: test_accuracy
value: [0.71428571 0.83333333 0.71428571 0.78571429 0.73170732 0.90243902
0.82926829 0.85365854 0.73170732 0.80487805]
mean value: 0.7901277584204414
key: train_accuracy
value: [0.83870968 0.81182796 0.83870968 0.84139785 0.83646113 0.82037534
0.82841823 0.8230563 0.84718499 0.82841823]
mean value: 0.8314559370405604
key: test_fscore
value: [0.25 0.46153846 0.33333333 0.4 0. 0.75
0.58823529 0.625 0.15384615 0.33333333]
mean value: 0.3895286576168929
key: train_fscore
value: [0.56521739 0.48529412 0.55882353 0.58156028 0.55474453 0.5037037
0.55555556 0.50746269 0.6013986 0.54929577]
mean value: 0.5463056169471472
key: test_precision
value: [0.28571429 0.75 0.33333333 0.5 0. 0.85714286
0.625 0.71428571 0.2 0.5 ]
mean value: 0.47654761904761905
key: train_precision
value: [0.66101695 0.57894737 0.66666667 0.66129032 0.65517241 0.60714286
0.61538462 0.61818182 0.68253968 0.62903226]
mean value: 0.6375374951927499
key: test_recall
value: [0.22222222 0.33333333 0.33333333 0.33333333 0. 0.66666667
0.55555556 0.55555556 0.125 0.25 ]
mean value: 0.3375
key: train_recall
value: [0.49367089 0.41772152 0.48101266 0.51898734 0.48101266 0.43037975
0.50632911 0.43037975 0.5375 0.4875 ]
mean value: 0.4784493670886076
key: test_roc_auc
value: [0.53535354 0.65151515 0.57575758 0.62121212 0.46875 0.81770833
0.73090278 0.74652778 0.50189394 0.59469697]
mean value: 0.6244318181818181
key: train_roc_auc
value: [0.71270575 0.66790513 0.70808312 0.72365749 0.70649272 0.67777491
0.71064755 0.67947559 0.73462031 0.70450085]
mean value: 0.7025863422009405
key: test_jcc
value: [0.14285714 0.3 0.2 0.25 0. 0.6
0.41666667 0.45454545 0.08333333 0.2 ]
mean value: 0.2647402597402597
key: train_jcc
value: [0.39393939 0.32038835 0.3877551 0.41 0.38383838 0.33663366
0.38461538 0.34 0.43 0.37864078]
mean value: 0.3765811054013908
MCC on Blind test: 0.53
Accuracy on Blind test: 0.85
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00983429 0.01149964 0.00986004 0.01124573 0.01079226 0.0097723
0.00985765 0.01113725 0.00995946 0.01075101]
mean value: 0.010470962524414063
key: score_time
value: [0.05714202 0.0184226 0.01779532 0.01783371 0.01803923 0.01747918
0.01673388 0.01775336 0.01698089 0.01827788]
mean value: 0.02164580821990967
key: test_mcc
value: [-0.11677484 0.29904999 0.29904999 0.29904999 0.07726439 0.34258008
0.07726439 0.2981424 0.32113081 0.32113081]
mean value: 0.22178879980661068
key: train_mcc
value: [0.48553706 0.41208648 0.41419284 0.38566308 0.46210103 0.43818223
0.46195215 0.43833981 0.42126307 0.44605107]
mean value: 0.4365368805751352
key: test_accuracy
value: [0.73809524 0.80952381 0.80952381 0.80952381 0.75609756 0.80487805
0.75609756 0.80487805 0.82926829 0.82926829]
mean value: 0.7947154471544715
key: train_accuracy
value: [0.84946237 0.83333333 0.83333333 0.82795699 0.84450402 0.83914209
0.84450402 0.83914209 0.83378016 0.83914209]
mean value: 0.8384300498717173
key: test_fscore
value: [0. 0.2 0.2 0.2 0.16666667 0.42857143
0.16666667 0.2 0.22222222 0.22222222]
mean value: 0.2006349206349206
key: train_fscore
value: [0.49090909 0.41509434 0.43636364 0.38461538 0.46296296 0.45454545
0.47272727 0.42307692 0.42592593 0.44444444]
mean value: 0.4410665435193737
key: test_precision
value: [0. 1. 1. 1. 0.33333333 0.6
0.33333333 1. 1. 1. ]
mean value: 0.7266666666666667
key: train_precision
value: [0.87096774 0.81481481 0.77419355 0.8 0.86206897 0.80645161
0.83870968 0.88 0.82142857 0.85714286]
mean value: 0.8325777789548646
key: test_recall
value: [0. 0.11111111 0.11111111 0.11111111 0.11111111 0.33333333
0.11111111 0.11111111 0.125 0.125 ]
mean value: 0.125
key: train_recall
value: [0.34177215 0.27848101 0.30379747 0.25316456 0.3164557 0.3164557
0.32911392 0.27848101 0.2875 0.3 ]
mean value: 0.30052215189873416
key: test_roc_auc
value: [0.46969697 0.55555556 0.55555556 0.55555556 0.52430556 0.63541667
0.52430556 0.55555556 0.5625 0.5625 ]
mean value: 0.5500946969696969
key: train_roc_auc
value: [0.66406014 0.63070808 0.63995334 0.61804986 0.65142513 0.64802377
0.65605356 0.63413847 0.63521758 0.64317406]
mean value: 0.6420803975346565
key: test_jcc
value: [0. 0.11111111 0.11111111 0.11111111 0.09090909 0.27272727
0.09090909 0.11111111 0.125 0.125 ]
mean value: 0.11489898989898989
key: train_jcc
value: [0.3253012 0.26190476 0.27906977 0.23809524 0.30120482 0.29411765
0.30952381 0.26829268 0.27058824 0.28571429]
mean value: 0.28338124520561114
MCC on Blind test: 0.2
Accuracy on Blind test: 0.76
Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01600242 0.01665425 0.01603627 0.01654625 0.01582694 0.0164063
0.01608539 0.01628256 0.01777649 0.01702571]
mean value: 0.01646425724029541
key: score_time
value: [0.01065445 0.01069617 0.01087236 0.01136184 0.01069951 0.0107069
0.01067758 0.01079345 0.01084089 0.01085567]
mean value: 0.010815882682800293
key: test_mcc
value: [0.225913 0.30577621 0.42358687 0.42817442 0.15345615 0.52265422
0.22280797 0.30353867 0.32113081 0.66779184]
mean value: 0.35748301719076425
key: train_mcc
value: [0.6529723 0.61806561 0.61615709 0.6081474 0.67251663 0.60634012
0.66480183 0.67316974 0.6098845 0.65819069]
mean value: 0.6380245913294944
key: test_accuracy
value: [0.78571429 0.80952381 0.83333333 0.83333333 0.7804878 0.85365854
0.7804878 0.80487805 0.82926829 0.90243902]
mean value: 0.8213124274099884
key: train_accuracy
value: [0.89247312 0.88172043 0.88172043 0.87903226 0.89812332 0.87935657
0.89544236 0.89812332 0.87935657 0.89276139]
mean value: 0.8878109775433135
key: test_fscore
value: [0.30769231 0.33333333 0.46153846 0.36363636 0.18181818 0.57142857
0.30769231 0.33333333 0.22222222 0.66666667]
mean value: 0.3749361749361749
key: train_fscore
value: [0.69230769 0.62068966 0.62711864 0.60869565 0.703125 0.61538462
0.68292683 0.6984127 0.62809917 0.68253968]
mean value: 0.6559299642880824
key: test_precision
value: [0.5 0.66666667 0.75 1. 0.5 0.8
0.5 0.66666667 1. 1. ]
mean value: 0.7383333333333333
key: train_precision
value: [0.88235294 0.97297297 0.94871795 0.97222222 0.91836735 0.94736842
0.95454545 0.93617021 0.92682927 0.93478261]
mean value: 0.9394329397380768
key: test_recall
value: [0.22222222 0.22222222 0.33333333 0.22222222 0.11111111 0.44444444
0.22222222 0.22222222 0.125 0.5 ]
mean value: 0.2625
key: train_recall
value: [0.56962025 0.4556962 0.46835443 0.44303797 0.56962025 0.4556962
0.53164557 0.55696203 0.475 0.5375 ]
mean value: 0.5063132911392405
key: test_roc_auc
value: [0.58080808 0.5959596 0.65151515 0.61111111 0.53993056 0.70659722
0.57986111 0.59548611 0.5625 0.75 ]
mean value: 0.617376893939394
key: train_roc_auc
value: [0.77457122 0.72614162 0.73076425 0.7198125 0.77800741 0.72444674
0.76242142 0.77337897 0.73238055 0.76363055]
mean value: 0.7485555218436794
key: test_jcc
value: [0.18181818 0.2 0.3 0.22222222 0.1 0.4
0.18181818 0.2 0.125 0.5 ]
mean value: 0.2410858585858586
key: train_jcc
value: [0.52941176 0.45 0.45679012 0.4375 0.54216867 0.44444444
0.51851852 0.53658537 0.45783133 0.51807229]
mean value: 0.48913225061359206
MCC on Blind test: 0.49
Accuracy on Blind test: 0.85
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.7838552 1.77208853 1.77244234 2.05708766 1.60403633 2.0858171
1.80553317 2.03801513 2.35855365 1.82596135]
mean value: 1.9103390455245972
key: score_time
value: [0.02115822 0.01863265 0.01833153 0.01555753 0.01932693 0.01274967
0.01269507 0.01553988 0.02369905 0.01336861]
mean value: 0.017105913162231444
key: test_mcc
value: [0.43434343 0.6617241 0.71717172 0.85634884 0.52265422 0.6593092
0.44728753 0.60982417 0.75691259 0.53409091]
mean value: 0.6199666720136415
key: train_mcc
value: [0.9839288 0.97603535 0.97603535 0.9839288 0.98394041 0.98394041
0.97605278 0.98394041 0.97626913 0.97626913]
mean value: 0.9800340580889745
key: test_accuracy
value: [0.80952381 0.88095238 0.9047619 0.95238095 0.85365854 0.87804878
0.82926829 0.85365854 0.92682927 0.85365854]
mean value: 0.874274099883856
key: train_accuracy
value: [0.99462366 0.99193548 0.99193548 0.99462366 0.99463807 0.99463807
0.9919571 0.99463807 0.9919571 0.9919571 ]
mean value: 0.9932903802358096
key: test_fscore
value: [0.55555556 0.73684211 0.77777778 0.875 0.57142857 0.73684211
0.53333333 0.7 0.76923077 0.625 ]
mean value: 0.6881010217852324
key: train_fscore
value: [0.98734177 0.98113208 0.98113208 0.98734177 0.98734177 0.98734177
0.98113208 0.98734177 0.98136646 0.98136646]
mean value: 0.9842838006429246
key: test_precision
value: [0.55555556 0.7 0.77777778 1. 0.8 0.7
0.66666667 0.63636364 1. 0.625 ]
mean value: 0.7461363636363636
key: train_precision
value: [0.98734177 0.975 0.975 0.98734177 0.98734177 0.98734177
0.975 0.98734177 0.97530864 0.97530864]
mean value: 0.9812326144710111
key: test_recall
value: [0.55555556 0.77777778 0.77777778 0.77777778 0.44444444 0.77777778
0.44444444 0.77777778 0.625 0.625 ]
mean value: 0.6583333333333333
key: train_recall
value: [0.98734177 0.98734177 0.98734177 0.98734177 0.98734177 0.98734177
0.98734177 0.98734177 0.9875 0.9875 ]
mean value: 0.987373417721519
key: test_roc_auc
value: [0.71717172 0.84343434 0.85858586 0.88888889 0.70659722 0.84201389
0.69097222 0.82638889 0.8125 0.76704545]
mean value: 0.7953598484848484
key: train_roc_auc
value: [0.9919644 0.99025792 0.99025792 0.9919644 0.99197021 0.99197021
0.99026953 0.99197021 0.99033703 0.99033703]
mean value: 0.9911298840830668
key: test_jcc
value: [0.38461538 0.58333333 0.63636364 0.77777778 0.4 0.58333333
0.36363636 0.53846154 0.625 0.45454545]
mean value: 0.5347066822066822
key: train_jcc
value: [0.975 0.96296296 0.96296296 0.975 0.975 0.975
0.96296296 0.975 0.96341463 0.96341463]
mean value: 0.9690718157181571
MCC on Blind test: 0.61
Accuracy on Blind test: 0.88
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.02745533 0.02765441 0.02012897 0.02177501 0.01911521 0.01608849
0.01911664 0.01851773 0.02057862 0.02146387]
mean value: 0.021189427375793456
key: score_time
value: [0.01580095 0.00977087 0.01018167 0.00925803 0.00895476 0.00922942
0.00897217 0.00940061 0.00989866 0.00921059]
mean value: 0.01006777286529541
key: test_mcc
value: [0.85858586 0.78107061 0.74471985 0.71717172 0.77972283 0.54237994
0.79652583 0.6310315 0.84091787 0.92155559]
mean value: 0.7613681612927914
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.95238095 0.92857143 0.9047619 0.9047619 0.92682927 0.85365854
0.92682927 0.87804878 0.95121951 0.97560976]
mean value: 0.920267131242741
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.88888889 0.82352941 0.8 0.77777778 0.82352941 0.625
0.84210526 0.70588235 0.85714286 0.93333333]
mean value: 0.807718929677134
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.88888889 0.875 0.72727273 0.77777778 0.875 0.71428571
0.8 0.75 1. 1. ]
mean value: 0.8408225108225108
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.88888889 0.77777778 0.88888889 0.77777778 0.77777778 0.55555556
0.88888889 0.66666667 0.75 0.875 ]
mean value: 0.7847222222222222
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.92929293 0.87373737 0.8989899 0.85858586 0.87326389 0.74652778
0.91319444 0.80208333 0.875 0.9375 ]
mean value: 0.8708175505050505
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.8 0.7 0.66666667 0.63636364 0.7 0.45454545
0.72727273 0.54545455 0.75 0.875 ]
mean value: 0.685530303030303
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.82
Accuracy on Blind test: 0.94
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.11748314 0.1138792 0.11257935 0.118783 0.11806083 0.11977959
0.12217593 0.11842275 0.1105082 0.118788 ]
mean value: 0.11704599857330322
key: score_time
value: [0.01931214 0.01800871 0.01792955 0.01963234 0.01855159 0.01916122
0.01873064 0.01944232 0.01761746 0.02170444]
mean value: 0.019009041786193847
key: test_mcc
value: [0.28426762 0.42358687 0.54494926 0.5247362 0.30353867 0.6310315
0.34258008 0.6140038 0.46037165 0.46037165]
mean value: 0.4589437292532702
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.78571429 0.83333333 0.85714286 0.85714286 0.80487805 0.87804878
0.80487805 0.87804878 0.85365854 0.85365854]
mean value: 0.8406504065040651
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.4 0.46153846 0.625 0.57142857 0.33333333 0.70588235
0.42857143 0.66666667 0.5 0.5 ]
mean value: 0.5192420814479638
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.5 0.75 0.71428571 0.8 0.66666667 0.75
0.6 0.83333333 0.75 0.75 ]
mean value: 0.7114285714285714
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.33333333 0.33333333 0.55555556 0.44444444 0.22222222 0.66666667
0.33333333 0.55555556 0.375 0.375 ]
mean value: 0.41944444444444445
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.62121212 0.65151515 0.74747475 0.70707071 0.59548611 0.80208333
0.63541667 0.76215278 0.67234848 0.67234848]
mean value: 0.6867108585858586
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.25 0.3 0.45454545 0.4 0.2 0.54545455
0.27272727 0.5 0.33333333 0.33333333]
mean value: 0.35893939393939395
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.82
Accuracy on Blind test: 0.94
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.0111711 0.01040745 0.01136565 0.01079869 0.01098847 0.01044989
0.01094103 0.01001263 0.00999689 0.01113796]
mean value: 0.01072697639465332
key: score_time
value: [0.00921154 0.01034403 0.00983167 0.00920534 0.00949502 0.00927043
0.00908327 0.00910687 0.00952053 0.00939202]
mean value: 0.009446072578430175
key: test_mcc
value: [-0.04713417 0.43434343 0.18349396 0.18999015 0.00347222 0.4768306
0.28057127 0.36369648 0.39639387 0.39267774]
mean value: 0.26743355609325364
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.61904762 0.80952381 0.69047619 0.73809524 0.65853659 0.80487805
0.7804878 0.73170732 0.7804878 0.73170732]
mean value: 0.7344947735191638
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.2 0.55555556 0.38095238 0.35294118 0.22222222 0.6
0.4 0.52173913 0.52631579 0.52173913]
mean value: 0.42814653855439966
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.18181818 0.55555556 0.33333333 0.375 0.22222222 0.54545455
0.5 0.42857143 0.45454545 0.4 ]
mean value: 0.39965007215007214
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.22222222 0.55555556 0.44444444 0.33333333 0.22222222 0.66666667
0.33333333 0.66666667 0.625 0.75 ]
mean value: 0.48194444444444445
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.47474747 0.71717172 0.6010101 0.59090909 0.50173611 0.75520833
0.61979167 0.70833333 0.72159091 0.73863636]
mean value: 0.64291351010101
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.11111111 0.38461538 0.23529412 0.21428571 0.125 0.42857143
0.25 0.35294118 0.35714286 0.35294118]
mean value: 0.2811902966314731
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.75
Accuracy on Blind test: 0.91
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.72934055 1.83378863 1.77105832 1.85764003 1.78461933 1.79943681
1.72764802 1.76972437 1.89383507 1.90175414]
mean value: 1.806884527206421
key: score_time
value: [0.10036635 0.10115433 0.10233259 0.09189773 0.10074592 0.13073897
0.10669971 0.10550141 0.10591745 0.10528898]
mean value: 0.10506434440612793
key: test_mcc
value: [0.70391441 0.85858586 0.71717172 0.78107061 0.42139769 0.77972283
0.71527778 0.6989826 0.66779184 0.84091787]
mean value: 0.7184833212496842
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.9047619 0.95238095 0.9047619 0.92857143 0.82926829 0.92682927
0.90243902 0.90243902 0.90243902 0.95121951]
mean value: 0.9105110336817654
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.71428571 0.88888889 0.77777778 0.82352941 0.46153846 0.82352941
0.77777778 0.75 0.66666667 0.85714286]
mean value: 0.7541136967607556
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.88888889 0.77777778 0.875 0.75 0.875
0.77777778 0.85714286 1. 1. ]
mean value: 0.8801587301587301
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.55555556 0.88888889 0.77777778 0.77777778 0.33333333 0.77777778
0.77777778 0.66666667 0.5 0.75 ]
mean value: 0.6805555555555556
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.77777778 0.92929293 0.85858586 0.87373737 0.65104167 0.87326389
0.85763889 0.81770833 0.75 0.875 ]
mean value: 0.8264046717171717
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.55555556 0.8 0.63636364 0.7 0.3 0.7
0.63636364 0.6 0.5 0.75 ]
mean value: 0.6178282828282828
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.92
Accuracy on Blind test: 0.97
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000...05', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
key: fit_time
value: [2.1396358 1.28942776 1.19340086 1.24784899 1.68375421 1.17906475
1.00073361 1.00206161 2.3973825 0.93770719]
mean value: 1.4071017265319825
key: score_time
value: [0.15786266 0.15287995 0.15035939 0.17096448 0.15879273 0.12753606
0.13729715 0.14079833 0.24412775 0.17703223]
mean value: 0.16176507472991944
key: test_mcc
value: [0.42817442 0.78107061 0.61591318 0.78173596 0.30353867 0.6989826
0.52265422 0.52265422 0.45993311 0.57066443]
mean value: 0.5685321428161713
key: train_mcc
value: [0.90193168 0.9186072 0.92675267 0.9184817 0.91853696 0.90194648
0.92687724 0.92680291 0.91111239 0.8946528 ]
mean value: 0.9145702019299435
key: test_accuracy
value: [0.83333333 0.92857143 0.88095238 0.92857143 0.80487805 0.90243902
0.85365854 0.85365854 0.85365854 0.87804878]
mean value: 0.8717770034843205
key: train_accuracy
value: [0.96774194 0.97311828 0.97580645 0.97311828 0.97319035 0.96782842
0.97587131 0.97587131 0.97050938 0.96514745]
mean value: 0.9718203176799561
key: test_fscore
value: [0.36363636 0.82352941 0.66666667 0.8 0.33333333 0.75
0.57142857 0.57142857 0.4 0.54545455]
mean value: 0.5825477463712758
key: train_fscore
value: [0.92105263 0.93333333 0.94117647 0.93421053 0.93421053 0.92
0.94039735 0.94117647 0.92810458 0.91390728]
mean value: 0.9307569169645318
key: test_precision
value: [1. 0.875 0.83333333 1. 0.66666667 0.85714286
0.8 0.8 1. 1. ]
mean value: 0.8832142857142857
key: train_precision
value: [0.95890411 0.98591549 0.97297297 0.97260274 0.97260274 0.97183099
0.98611111 0.97297297 0.97260274 0.97183099]
mean value: 0.9738346850612913
key: test_recall
value: [0.22222222 0.77777778 0.55555556 0.66666667 0.22222222 0.66666667
0.44444444 0.44444444 0.25 0.375 ]
mean value: 0.46249999999999997
key: train_recall
value: [0.88607595 0.88607595 0.91139241 0.89873418 0.89873418 0.87341772
0.89873418 0.91139241 0.8875 0.8625 ]
mean value: 0.8914556962025316
key: test_roc_auc
value: [0.61111111 0.87373737 0.76262626 0.83333333 0.59548611 0.81770833
0.70659722 0.70659722 0.625 0.6875 ]
mean value: 0.7219696969696969
key: train_roc_auc
value: [0.93791852 0.94133149 0.95228323 0.94595412 0.94596573 0.9333075
0.94766641 0.95229484 0.94033703 0.92783703]
mean value: 0.9424895903408237
key: test_jcc
value: [0.22222222 0.7 0.5 0.66666667 0.2 0.6
0.4 0.4 0.25 0.375 ]
mean value: 0.4313888888888889
key: train_jcc
value: [0.85365854 0.875 0.88888889 0.87654321 0.87654321 0.85185185
0.8875 0.88888889 0.86585366 0.84146341]
mean value: 0.8706191659138813
MCC on Blind test: 0.82
Accuracy on Blind test: 0.94
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02546477 0.00988579 0.00996375 0.01107383 0.01002979 0.00996566
0.0111177 0.01110387 0.01114964 0.01115584]
mean value: 0.012091064453125
key: score_time
value: [0.00998831 0.00883961 0.00892806 0.0097239 0.00888944 0.00888157
0.00988293 0.00979996 0.00983691 0.0098424 ]
mean value: 0.009461307525634765
key: test_mcc
value: [ 0.07784989 0.42358687 0.15151515 0.28426762 -0.12009612 0.6989826
0.48234017 0.54237994 0.00458735 0.25295146]
mean value: 0.27983649480059836
key: train_mcc
value: [0.47627568 0.38128402 0.47252142 0.49088992 0.46564258 0.4066905
0.45377606 0.41364491 0.5140712 0.45094947]
mean value: 0.4525745767981807
key: test_accuracy
value: [0.71428571 0.83333333 0.71428571 0.78571429 0.73170732 0.90243902
0.82926829 0.85365854 0.73170732 0.80487805]
mean value: 0.7901277584204414
key: train_accuracy
value: [0.83870968 0.81182796 0.83870968 0.84139785 0.83646113 0.82037534
0.82841823 0.8230563 0.84718499 0.82841823]
mean value: 0.8314559370405604
key: test_fscore
value: [0.25 0.46153846 0.33333333 0.4 0. 0.75
0.58823529 0.625 0.15384615 0.33333333]
mean value: 0.3895286576168929
key: train_fscore
value: [0.56521739 0.48529412 0.55882353 0.58156028 0.55474453 0.5037037
0.55555556 0.50746269 0.6013986 0.54929577]
mean value: 0.5463056169471472
key: test_precision
value: [0.28571429 0.75 0.33333333 0.5 0. 0.85714286
0.625 0.71428571 0.2 0.5 ]
mean value: 0.47654761904761905
key: train_precision
value: [0.66101695 0.57894737 0.66666667 0.66129032 0.65517241 0.60714286
0.61538462 0.61818182 0.68253968 0.62903226]
mean value: 0.6375374951927499
key: test_recall
value: [0.22222222 0.33333333 0.33333333 0.33333333 0. 0.66666667
0.55555556 0.55555556 0.125 0.25 ]
mean value: 0.3375
key: train_recall
value: [0.49367089 0.41772152 0.48101266 0.51898734 0.48101266 0.43037975
0.50632911 0.43037975 0.5375 0.4875 ]
mean value: 0.4784493670886076
key: test_roc_auc
value: [0.53535354 0.65151515 0.57575758 0.62121212 0.46875 0.81770833
0.73090278 0.74652778 0.50189394 0.59469697]
mean value: 0.6244318181818181
key: train_roc_auc
value: [0.71270575 0.66790513 0.70808312 0.72365749 0.70649272 0.67777491
0.71064755 0.67947559 0.73462031 0.70450085]
mean value: 0.7025863422009405
key: test_jcc
value: [0.14285714 0.3 0.2 0.25 0. 0.6
0.41666667 0.45454545 0.08333333 0.2 ]
mean value: 0.2647402597402597
key: train_jcc
value: [0.39393939 0.32038835 0.3877551 0.41 0.38383838 0.33663366
0.38461538 0.34 0.43 0.37864078]
mean value: 0.3765811054013908
MCC on Blind test: 0.53
Accuracy on Blind test: 0.85
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [1.57921863 1.53817844 4.57131052 5.65595579 4.97904253 5.14335322
4.43232751 4.83747458 4.0643115 4.83004928]
mean value: 4.163122200965882
key: score_time
value: [0.01270485 0.01288128 0.04290867 0.02396774 0.02589917 0.01918221
0.01912308 0.02631402 0.03656936 0.01719379]
mean value: 0.023674416542053222
key: test_mcc
value: [0.93419873 0.93419873 0.79796142 0.87669552 0.85763889 0.77972283
0.85763889 0.85763889 0.92155559 1. ]
mean value: 0.8817249498343762
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.97619048 0.97619048 0.92857143 0.95238095 0.95121951 0.92682927
0.95121951 0.95121951 0.97560976 1. ]
mean value: 0.9589430894308943
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94736842 0.94736842 0.84210526 0.9 0.88888889 0.82352941
0.88888889 0.88888889 0.93333333 1. ]
mean value: 0.9060371517027864
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.9 0.9 0.8 0.81818182 0.88888889 0.875
0.88888889 0.88888889 1. 1. ]
mean value: 0.8959848484848485
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 0.88888889 1. 0.88888889 0.77777778
0.88888889 0.88888889 0.875 1. ]
mean value: 0.9208333333333333
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98484848 0.98484848 0.91414141 0.96969697 0.92881944 0.87326389
0.92881944 0.92881944 0.9375 1. ]
mean value: 0.9450757575757576
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.9 0.9 0.72727273 0.81818182 0.8 0.7
0.8 0.8 0.875 1. ]
mean value: 0.8320454545454545
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.92
Accuracy on Blind test: 0.97
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.06765056 0.08285642 0.09254718 0.08614326 0.05079794 0.08059192
0.08231091 0.09519434 0.09031606 0.09135342]
mean value: 0.08197619915008544
key: score_time
value: [0.03701258 0.02388382 0.02445459 0.01250029 0.01238322 0.013134
0.0275991 0.02205539 0.02407956 0.02290916]
mean value: 0.022001171112060548
key: test_mcc
value: [0.54494926 0.70064905 0.57575758 0.87669552 0.71527778 0.6310315
0.54237994 0.65168169 0.75691259 0.75691259]
mean value: 0.6752247502781483
key: train_mcc
value: [0.92049683 0.93642283 0.92049683 0.91204431 0.95238371 0.95238371
0.9364694 0.93576165 0.92126621 0.91290316]
mean value: 0.9300628649101216
key: test_accuracy
value: [0.85714286 0.9047619 0.85714286 0.95238095 0.90243902 0.87804878
0.85365854 0.85365854 0.92682927 0.92682927]
mean value: 0.8912891986062718
key: train_accuracy
value: [0.97311828 0.97849462 0.97311828 0.97043011 0.98391421 0.98391421
0.97855228 0.97855228 0.97319035 0.97050938]
mean value: 0.976379399809738
key: test_fscore
value: [0.625 0.75 0.66666667 0.9 0.77777778 0.70588235
0.625 0.72727273 0.76923077 0.76923077]
mean value: 0.7316061063119887
key: train_fscore
value: [0.9375 0.95 0.9375 0.93081761 0.9625 0.9625
0.95 0.94936709 0.9382716 0.93167702]
mean value: 0.94501333222423
key: test_precision
value: [0.71428571 0.85714286 0.66666667 0.81818182 0.77777778 0.75
0.71428571 0.61538462 1. 1. ]
mean value: 0.7913725163725164
key: train_precision
value: [0.92592593 0.9382716 0.92592593 0.925 0.95061728 0.95061728
0.9382716 0.94936709 0.92682927 0.92592593]
mean value: 0.9356751912455833
key: test_recall
value: [0.55555556 0.66666667 0.66666667 1. 0.77777778 0.66666667
0.55555556 0.88888889 0.625 0.625 ]
mean value: 0.7027777777777777
key: train_recall
value: [0.94936709 0.96202532 0.94936709 0.93670886 0.97468354 0.97468354
0.96202532 0.94936709 0.95 0.9375 ]
mean value: 0.9545727848101265
key: test_roc_auc
value: [0.74747475 0.81818182 0.78787879 0.96969697 0.85763889 0.80208333
0.74652778 0.86631944 0.8125 0.8125 ]
mean value: 0.8220801767676768
key: train_roc_auc
value: [0.96444464 0.97248024 0.96444464 0.95811552 0.98053905 0.98053905
0.97250926 0.96788082 0.96476109 0.95851109]
mean value: 0.9684225396967444
key: test_jcc
value: [0.45454545 0.6 0.5 0.81818182 0.63636364 0.54545455
0.45454545 0.57142857 0.625 0.625 ]
mean value: 0.583051948051948
key: train_jcc
value: [0.88235294 0.9047619 0.88235294 0.87058824 0.92771084 0.92771084
0.9047619 0.90361446 0.88372093 0.87209302]
mean value: 0.8959668025237554
MCC on Blind test: 0.91
Accuracy on Blind test: 0.97
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01700091 0.01041293 0.00945973 0.00938082 0.00980783 0.00951076
0.0096736 0.00959182 0.00992084 0.00989866]
mean value: 0.010465788841247558
key: score_time
value: [0.01056051 0.00879884 0.0088098 0.00861311 0.00867081 0.00858402
0.00899649 0.00893307 0.00900865 0.00894117]
mean value: 0.008991646766662597
key: test_mcc
value: [0.28426762 0.70064905 0.5247362 0.70391441 0.30353867 0.57291667
0.28057127 0.48234017 0.33432866 0.31852949]
mean value: 0.45057922034163905
key: train_mcc
value: [0.57647565 0.48411535 0.54824666 0.52726386 0.5561788 0.56401034
0.53807155 0.54319113 0.57376465 0.55865576]
mean value: 0.546997375710397
key: test_accuracy
value: [0.78571429 0.9047619 0.85714286 0.9047619 0.80487805 0.85365854
0.7804878 0.82926829 0.82926829 0.80487805]
mean value: 0.8354819976771196
key: train_accuracy
value: [0.86827957 0.84408602 0.86021505 0.85483871 0.86327078 0.86595174
0.85790885 0.86058981 0.86595174 0.86327078]
mean value: 0.8604363054570613
key: test_fscore
value: [0.4 0.75 0.57142857 0.71428571 0.33333333 0.66666667
0.4 0.58823529 0.36363636 0.42857143]
mean value: 0.5216157372039725
key: train_fscore
value: [0.64748201 0.56060606 0.62318841 0.60294118 0.62773723 0.63235294
0.61313869 0.6119403 0.64788732 0.62773723]
mean value: 0.6195011359575966
key: test_precision
value: [0.5 0.85714286 0.8 1. 0.66666667 0.66666667
0.5 0.625 0.66666667 0.5 ]
mean value: 0.6782142857142857
key: train_precision
value: [0.75 0.69811321 0.72881356 0.71929825 0.74137931 0.75438596
0.72413793 0.74545455 0.74193548 0.75438596]
mean value: 0.7357904213012624
key: test_recall
value: [0.33333333 0.66666667 0.44444444 0.55555556 0.22222222 0.66666667
0.33333333 0.55555556 0.25 0.375 ]
mean value: 0.44027777777777777
key: train_recall
value: [0.56962025 0.46835443 0.5443038 0.51898734 0.5443038 0.5443038
0.53164557 0.51898734 0.575 0.5375 ]
mean value: 0.5353006329113924
key: test_roc_auc
value: [0.62121212 0.81818182 0.70707071 0.77777778 0.59548611 0.78645833
0.61979167 0.73090278 0.60984848 0.64204545]
mean value: 0.6908775252525252
key: train_roc_auc
value: [0.75921286 0.70687346 0.74484814 0.73218992 0.74664169 0.74834237
0.7386119 0.73568415 0.76019625 0.74485922]
mean value: 0.7417459956830186
key: test_jcc
value: [0.25 0.6 0.4 0.55555556 0.2 0.5
0.25 0.41666667 0.22222222 0.27272727]
mean value: 0.3667171717171717
key: train_jcc
value: [0.4787234 0.38947368 0.45263158 0.43157895 0.45744681 0.46236559
0.44210526 0.44086022 0.47916667 0.45744681]
mean value: 0.4491798968079086
MCC on Blind test: 0.61
Accuracy on Blind test: 0.88
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01704407 0.02029037 0.02038693 0.01810551 0.04003406 0.0388875
0.03933382 0.04400635 0.04520512 0.04177046]
mean value: 0.03250641822814941
key: score_time
value: [0.00974464 0.01208925 0.0128572 0.01228189 0.03262424 0.02173567
0.02031255 0.02697802 0.02192736 0.01659703]
mean value: 0.0187147855758667
key: test_mcc
value: [0.61591318 0.42358687 0.6617241 0.531085 0.52265422 0.6989826
0.6989826 0.60982417 0.66779184 0.66779184]
mean value: 0.6098336450543114
key: train_mcc
value: [0.92737397 0.79965711 0.90577461 0.7203473 0.74864658 0.80846528
0.83432653 0.90882828 0.76678827 0.86966315]
mean value: 0.8289871078025104
key: test_accuracy
value: [0.88095238 0.83333333 0.88095238 0.85714286 0.85365854 0.90243902
0.90243902 0.85365854 0.90243902 0.90243902]
mean value: 0.876945412311266
key: train_accuracy
value: [0.97580645 0.93548387 0.96774194 0.91129032 0.91957105 0.9383378
0.9463807 0.96782842 0.92493298 0.95710456]
mean value: 0.9444478076623714
key: test_fscore
value: [0.66666667 0.46153846 0.73684211 0.5 0.57142857 0.75
0.75 0.7 0.66666667 0.66666667]
mean value: 0.646980913823019
key: train_fscore
value: [0.94267516 0.82857143 0.92592593 0.74418605 0.76923077 0.83687943
0.86111111 0.92771084 0.8 0.89333333]
mean value: 0.8529624049917472
key: test_precision
value: [0.83333333 0.75 0.7 1. 0.8 0.85714286
0.85714286 0.63636364 1. 1. ]
mean value: 0.8433982683982684
key: train_precision
value: [0.94871795 0.95081967 0.90361446 0.96 0.98039216 0.9516129
0.95384615 0.88505747 0.93333333 0.95714286]
mean value: 0.9424536954355686
key: test_recall
value: [0.55555556 0.33333333 0.77777778 0.33333333 0.44444444 0.66666667
0.66666667 0.77777778 0.5 0.5 ]
mean value: 0.5555555555555556
key: train_recall
value: [0.93670886 0.73417722 0.94936709 0.60759494 0.63291139 0.74683544
0.78481013 0.97468354 0.7 0.8375 ]
mean value: 0.7904588607594937
key: test_roc_auc
value: [0.76262626 0.65151515 0.84343434 0.66666667 0.70659722 0.81770833
0.81770833 0.82638889 0.75 0.75 ]
mean value: 0.7592645202020202
key: train_roc_auc
value: [0.96152849 0.86196915 0.96103167 0.8003845 0.81475502 0.86831568
0.88730302 0.97033497 0.84317406 0.91363055]
mean value: 0.88824271077723
key: test_jcc
value: [0.5 0.3 0.58333333 0.33333333 0.4 0.6
0.6 0.53846154 0.5 0.5 ]
mean value: 0.4855128205128205
key: train_jcc
value: [0.89156627 0.70731707 0.86206897 0.59259259 0.625 0.7195122
0.75609756 0.86516854 0.66666667 0.80722892]
mean value: 0.7493218774093527
MCC on Blind test: 0.92
Accuracy on Blind test: 0.97
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01887393 0.04717493 0.04252505 0.03427696 0.0314486 0.04632211
0.04026818 0.03947592 0.04457188 0.04065919]
mean value: 0.038559675216674805
key: score_time
value: [0.0130012 0.02130556 0.02607536 0.03163552 0.02629685 0.02147627
0.0198648 0.02625942 0.020051 0.02602553]
mean value: 0.02319915294647217
key: test_mcc
value: [0.48076851 0.57575758 0.54494926 0.70391441 0.57291667 0.57587571
0.30353867 0.69492321 0.32113081 0.75798945]
mean value: 0.553176426951692
key: train_mcc
value: [0.89063353 0.95234883 0.87735655 0.85137594 0.94408056 0.70871454
0.64739598 0.89729558 0.68729186 0.91651264]
mean value: 0.8373005994934117
key: test_accuracy
value: [0.80952381 0.85714286 0.85714286 0.9047619 0.85365854 0.80487805
0.80487805 0.87804878 0.82926829 0.92682927]
mean value: 0.8526132404181185
key: train_accuracy
value: [0.95967742 0.98387097 0.95967742 0.9516129 0.98123324 0.86327078
0.89008043 0.96514745 0.90080429 0.97050938]
mean value: 0.9425884286084927
key: test_fscore
value: [0.6 0.66666667 0.625 0.71428571 0.66666667 0.66666667
0.33333333 0.76190476 0.22222222 0.8 ]
mean value: 0.6056746031746032
key: train_fscore
value: [0.9122807 0.9625 0.90196078 0.87837838 0.95597484 0.75598086
0.65546218 0.91925466 0.70866142 0.93413174]
mean value: 0.8584585565566628
key: test_precision
value: [0.54545455 0.66666667 0.71428571 1. 0.66666667 0.53333333
0.66666667 0.66666667 1. 0.85714286]
mean value: 0.7316883116883117
key: train_precision
value: [0.84782609 0.95061728 0.93243243 0.94202899 0.95 0.60769231
0.975 0.90243902 0.95744681 0.89655172]
mean value: 0.8962034653577938
key: test_recall
value: [0.66666667 0.66666667 0.55555556 0.55555556 0.66666667 0.88888889
0.22222222 0.88888889 0.125 0.75 ]
mean value: 0.5986111111111111
key: train_recall
value: [0.98734177 0.97468354 0.87341772 0.82278481 0.96202532 1.
0.49367089 0.93670886 0.5625 0.975 ]
mean value: 0.8588132911392405
key: test_roc_auc
value: [0.75757576 0.78787879 0.74747475 0.77777778 0.78645833 0.83506944
0.59548611 0.88194444 0.5625 0.85984848]
mean value: 0.7592013888888889
key: train_roc_auc
value: [0.9697801 0.98051583 0.92817644 0.90456647 0.97420994 0.91326531
0.74513476 0.95474899 0.77783703 0.97214164]
mean value: 0.9120376501898984
key: test_jcc
value: [0.42857143 0.5 0.45454545 0.55555556 0.5 0.5
0.2 0.61538462 0.125 0.66666667]
mean value: 0.45457237207237206
key: train_jcc
value: [0.83870968 0.92771084 0.82142857 0.78313253 0.91566265 0.60769231
0.4875 0.85057471 0.54878049 0.87640449]
mean value: 0.7657596275467198
MCC on Blind test: 0.69
Accuracy on Blind test: 0.85
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.21693301 0.32577515 0.22722745 0.24830532 0.21669579 0.21797705
0.21818995 0.21728969 0.21963811 0.21825624]
mean value: 0.23262877464294435
key: score_time
value: [0.02566838 0.02426243 0.02471066 0.02141333 0.02163672 0.0219543
0.02171063 0.02121949 0.02128983 0.02129984]
mean value: 0.022516560554504395
key: test_mcc
value: [1. 0.93419873 0.79796142 0.87669552 0.77972283 0.77972283
0.93374247 0.85763889 0.92155559 1. ]
mean value: 0.8881238291389542
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.97619048 0.92857143 0.95238095 0.92682927 0.92682927
0.97560976 0.95121951 0.97560976 1. ]
mean value: 0.9613240418118467
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.94736842 0.84210526 0.9 0.82352941 0.82352941
0.94736842 0.88888889 0.93333333 1. ]
mean value: 0.9106123151014792
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.9 0.8 0.81818182 0.875 0.875
0.9 0.88888889 1. 1. ]
mean value: 0.9057070707070707
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 0.88888889 1. 0.77777778 0.77777778
1. 0.88888889 0.875 1. ]
mean value: 0.9208333333333333
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.98484848 0.91414141 0.96969697 0.87326389 0.87326389
0.984375 0.92881944 0.9375 1. ]
mean value: 0.946590909090909
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.9 0.72727273 0.81818182 0.7 0.7
0.9 0.8 0.875 1. ]
mean value: 0.8420454545454545
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.82
Accuracy on Blind test: 0.94
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.07412887 0.06783414 0.06378627 0.0729394 0.0683949 0.07100201
0.08421254 0.08958602 0.08135438 0.08704138]
mean value: 0.07602798938751221
key: score_time
value: [0.02370763 0.02277517 0.02455759 0.02361369 0.02364612 0.02368784
0.02302051 0.02426982 0.02486539 0.02800369]
mean value: 0.024214744567871094
key: test_mcc
value: [0.78107061 0.85858586 0.79796142 0.93419873 0.85763889 0.85763889
0.77972283 0.77972283 0.84091787 1. ]
mean value: 0.8487457933134953
key: train_mcc
value: [0.98420082 0.96768499 0.99203311 1. 0.98391963 0.98421232
0.98394041 0.98394041 0.98408703 0.99211062]
mean value: 0.9856129343024994
key: test_accuracy
value: [0.92857143 0.95238095 0.92857143 0.97619048 0.95121951 0.95121951
0.92682927 0.92682927 0.95121951 1. ]
mean value: 0.9493031358885018
key: train_accuracy
value: [0.99462366 0.98924731 0.99731183 1. 0.99463807 0.99463807
0.99463807 0.99463807 0.99463807 0.99731903]
mean value: 0.9951692179076941
key: test_fscore
value: [0.82352941 0.88888889 0.84210526 0.94736842 0.88888889 0.88888889
0.82352941 0.82352941 0.85714286 1. ]
mean value: 0.8783871443314168
key: train_fscore
value: [0.9875 0.97435897 0.99371069 1. 0.98717949 0.9875
0.98734177 0.98734177 0.9875 0.99378882]
mean value: 0.9886221517541935
key: test_precision
value: [0.875 0.88888889 0.8 0.9 0.88888889 0.88888889
0.875 0.875 1. 1. ]
mean value: 0.8991666666666667
key: train_precision
value: [0.97530864 0.98701299 0.9875 1. 1. 0.97530864
0.98734177 0.98734177 0.9875 0.98765432]
mean value: 0.9874968136255056
key: test_recall
value: [0.77777778 0.88888889 0.88888889 1. 0.88888889 0.88888889
0.77777778 0.77777778 0.75 1. ]
mean value: 0.8638888888888889
key: train_recall
value: [1. 0.96202532 1. 1. 0.97468354 1.
0.98734177 0.98734177 0.9875 1. ]
mean value: 0.9898892405063291
key: test_roc_auc
value: [0.87373737 0.92929293 0.91414141 0.98484848 0.92881944 0.92881944
0.87326389 0.87326389 0.875 1. ]
mean value: 0.9181186868686868
key: train_roc_auc
value: [0.99658703 0.97930617 0.99829352 1. 0.98734177 0.99659864
0.99197021 0.99197021 0.99204352 0.99829352]
mean value: 0.9932404573593381
key: test_jcc
value: [0.7 0.8 0.72727273 0.9 0.8 0.8
0.7 0.7 0.75 1. ]
mean value: 0.7877272727272727
key: train_jcc
value: [0.97530864 0.95 0.9875 1. 0.97468354 0.97530864
0.975 0.975 0.97530864 0.98765432]
mean value: 0.9775763791217378
MCC on Blind test: 0.82
Accuracy on Blind test: 0.94
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.17093611 0.23733187 0.23660564 0.1976974 0.24697614 0.21775317
0.2305522 0.14449883 0.21218038 0.22564006]
mean value: 0.21201717853546143
key: score_time
value: [0.03977919 0.03364158 0.03135276 0.03109956 0.04041839 0.03328681
0.03462434 0.03831649 0.03966093 0.04471827]
mean value: 0.03668982982635498
key: test_mcc
value: [-0.11677484 0.29904999 0.02823912 0.29904999 0.22280797 0.30353867
0.22280797 0.15345615 0. 0.17421709]
mean value: 0.15863921022151561
key: train_mcc
value: [0.90262581 0.91083126 0.91083126 0.90262581 0.91088732 0.88621978
0.90268621 0.89446407 0.90363564 0.90363564]
mean value: 0.9028442808292062
key: test_accuracy
value: [0.73809524 0.80952381 0.73809524 0.80952381 0.7804878 0.80487805
0.7804878 0.7804878 0.80487805 0.80487805]
mean value: 0.7851335656213705
key: train_accuracy
value: [0.96774194 0.97043011 0.97043011 0.96774194 0.97050938 0.96246649
0.96782842 0.96514745 0.96782842 0.96782842]
mean value: 0.9677952665109977
key: test_fscore
value: [0. 0.2 0.15384615 0.2 0.30769231 0.33333333
0.30769231 0.18181818 0. 0.2 ]
mean value: 0.1884382284382284
key: train_fscore
value: [0.91780822 0.92517007 0.92517007 0.91780822 0.92517007 0.90277778
0.91780822 0.91034483 0.91891892 0.91891892]
mean value: 0.9179895304817701
key: test_precision
value: [0. 1. 0.25 1. 0.5 0.66666667
0.5 0.5 0. 0.5 ]
mean value: 0.49166666666666664
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0. 0.11111111 0.11111111 0.11111111 0.22222222 0.22222222
0.22222222 0.11111111 0. 0.125 ]
mean value: 0.1236111111111111
key: train_recall
value: [0.84810127 0.86075949 0.86075949 0.84810127 0.86075949 0.82278481
0.84810127 0.83544304 0.85 0.85 ]
mean value: 0.8484810126582278
key: test_roc_auc
value: [0.46969697 0.55555556 0.51010101 0.55555556 0.57986111 0.59548611
0.57986111 0.53993056 0.5 0.54734848]
mean value: 0.5433396464646465
key: train_roc_auc
value: [0.92405063 0.93037975 0.93037975 0.92405063 0.93037975 0.91139241
0.92405063 0.91772152 0.925 0.925 ]
mean value: 0.9242405063291139
key: test_jcc
value: [0. 0.11111111 0.08333333 0.11111111 0.18181818 0.2
0.18181818 0.1 0. 0.11111111]
mean value: 0.10803030303030303
key: train_jcc
value: [0.84810127 0.86075949 0.86075949 0.84810127 0.86075949 0.82278481
0.84810127 0.83544304 0.85 0.85 ]
mean value: 0.8484810126582278
MCC on Blind test: 0.27
Accuracy on Blind test: 0.79
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.9040575 0.74663472 0.82253218 0.82421541 0.84631276 0.84147477
0.57430267 0.56891108 0.59025025 0.56744289]
mean value: 0.7286134243011475
key: score_time
value: [0.01396084 0.01353455 0.01468349 0.01511073 0.01360822 0.00991464
0.00947881 0.00922632 0.00964713 0.00963688]
mean value: 0.011880159378051758
key: test_mcc
value: [0.93419873 0.93419873 0.79796142 0.87669552 0.85763889 0.71527778
0.6310315 0.77972283 0.92155559 0.92155559]
mean value: 0.8369836591412185
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.97619048 0.97619048 0.92857143 0.95238095 0.95121951 0.90243902
0.87804878 0.92682927 0.97560976 0.97560976]
mean value: 0.9443089430894309
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94736842 0.94736842 0.84210526 0.9 0.88888889 0.77777778
0.70588235 0.82352941 0.93333333 0.93333333]
mean value: 0.8699587203302374
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.9 0.9 0.8 0.81818182 0.88888889 0.77777778
0.75 0.875 1. 1. ]
mean value: 0.8709848484848485
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 0.88888889 1. 0.88888889 0.77777778
0.66666667 0.77777778 0.875 0.875 ]
mean value: 0.875
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98484848 0.98484848 0.91414141 0.96969697 0.92881944 0.85763889
0.80208333 0.87326389 0.9375 0.9375 ]
mean value: 0.9190340909090909
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.9 0.9 0.72727273 0.81818182 0.8 0.63636364
0.54545455 0.7 0.875 0.875 ]
mean value: 0.7777272727272727
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.92
Accuracy on Blind test: 0.97
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.03040314 0.02820063 0.0325284 0.02842927 0.05756855 0.05813336
0.02869725 0.0279367 0.03112125 0.03650689]
mean value: 0.03595254421234131
key: score_time
value: [0.01271629 0.01393723 0.01270533 0.01889229 0.02039051 0.02033591
0.01599193 0.01624131 0.03544235 0.01930523]
mean value: 0.01859583854675293
key: test_mcc
value: [-0.23354968 0.02823912 -0.16943475 -0.16943475 0.07726439 -0.08385255
-0.08385255 -0.17437146 0. -0.11149893]
mean value: -0.0920491158947519
key: train_mcc
value: [0.24657858 0. 0.17364717 0.17364717 0.14164073 0.14164073
0.10002041 0.20085236 0.09922414 0.17232512]
mean value: 0.14495764120810134
key: test_accuracy
value: [0.61904762 0.73809524 0.69047619 0.69047619 0.75609756 0.75609756
0.75609756 0.68292683 0.80487805 0.75609756]
mean value: 0.7250290360046457
key: train_accuracy
value: [0.80376344 0.78763441 0.79569892 0.79569892 0.79356568 0.79356568
0.79088472 0.79892761 0.78820375 0.79356568]
mean value: 0.7941508835653953
key: test_fscore
value: [0. 0.15384615 0. 0. 0.16666667 0.
0. 0. 0. 0. ]
mean value: 0.03205128205128205
key: train_fscore
value: [0.14117647 0. 0.07317073 0.07317073 0.04938272 0.04938272
0.025 0.09638554 0.02469136 0.07228916]
mean value: 0.0604649422921507
key: test_precision
value: [0. 0.25 0. 0. 0.33333333 0.
0. 0. 0. 0. ]
mean value: 0.058333333333333334
key: train_precision
value: [1. 0. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 0.9
key: test_recall
value: [0. 0.11111111 0. 0. 0.11111111 0.
0. 0. 0. 0. ]
mean value: 0.02222222222222222
key: train_recall
value: [0.07594937 0. 0.03797468 0.03797468 0.02531646 0.02531646
0.01265823 0.05063291 0.0125 0.0375 ]
mean value: 0.03158227848101266
key: test_roc_auc
value: [0.39393939 0.51010101 0.43939394 0.43939394 0.52430556 0.484375
0.484375 0.4375 0.5 0.46969697]
mean value: 0.46830808080808084
key: train_roc_auc
value: [0.53797468 0.5 0.51898734 0.51898734 0.51265823 0.51265823
0.50632911 0.52531646 0.50625 0.51875 ]
mean value: 0.5157911392405063
key: test_jcc
value: [0. 0.08333333 0. 0. 0.09090909 0.
0. 0. 0. 0. ]
mean value: 0.017424242424242425
key: train_jcc
value: [0.07594937 0. 0.03797468 0.03797468 0.02531646 0.02531646
0.01265823 0.05063291 0.0125 0.0375 ]
mean value: 0.03158227848101266
MCC on Blind test: 0.34
Accuracy on Blind test: 0.82
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02315354 0.01530075 0.03732443 0.03772879 0.03844643 0.03985286
0.04924798 0.0379231 0.03757787 0.0376873 ]
mean value: 0.035424304008483884
key: score_time
value: [0.01255631 0.01216888 0.02392626 0.02346683 0.02286768 0.02408695
0.02419138 0.02475119 0.02242708 0.02230811]
mean value: 0.02127506732940674
key: test_mcc
value: [0.54494926 0.61591318 0.71717172 0.78107061 0.77972283 0.77972283
0.6140038 0.52209256 0.75691259 0.84091787]
mean value: 0.6952477252887368
key: train_mcc
value: [0.89508125 0.89433543 0.90279179 0.87803336 0.89440973 0.88593688
0.89440973 0.88593688 0.87860668 0.87860668]
mean value: 0.8888148407176151
key: test_accuracy
value: [0.85714286 0.88095238 0.9047619 0.92857143 0.92682927 0.92682927
0.87804878 0.82926829 0.92682927 0.95121951]
mean value: 0.9010452961672474
key: train_accuracy
value: [0.96505376 0.96505376 0.96774194 0.95967742 0.96514745 0.96246649
0.96514745 0.96246649 0.95978552 0.95978552]
mean value: 0.9632325809334371
key: test_fscore
value: [0.625 0.66666667 0.77777778 0.82352941 0.82352941 0.82352941
0.66666667 0.63157895 0.76923077 0.85714286]
mean value: 0.7464651920147276
key: train_fscore
value: [0.91719745 0.91612903 0.92307692 0.90322581 0.91612903 0.90909091
0.91612903 0.90909091 0.90322581 0.90322581]
mean value: 0.9116520709617073
key: test_precision
value: [0.71428571 0.83333333 0.77777778 0.875 0.875 0.875
0.83333333 0.6 1. 1. ]
mean value: 0.8383730158730158
key: train_precision
value: [0.92307692 0.93421053 0.93506494 0.92105263 0.93421053 0.93333333
0.93421053 0.93333333 0.93333333 0.93333333]
mean value: 0.9315159402001507
key: test_recall
value: [0.55555556 0.55555556 0.77777778 0.77777778 0.77777778 0.77777778
0.55555556 0.66666667 0.625 0.75 ]
mean value: 0.6819444444444445
key: train_recall
value: [0.91139241 0.89873418 0.91139241 0.88607595 0.89873418 0.88607595
0.89873418 0.88607595 0.875 0.875 ]
mean value: 0.8927215189873418
key: test_roc_auc
value: [0.74747475 0.76262626 0.85858586 0.87373737 0.87326389 0.87326389
0.76215278 0.77083333 0.8125 0.875 ]
mean value: 0.8209438131313131
key: train_roc_auc
value: [0.94545729 0.94083467 0.94716378 0.93279907 0.94086369 0.93453457
0.94086369 0.93453457 0.92896758 0.92896758]
mean value: 0.9374986480962108
key: test_jcc
value: [0.45454545 0.5 0.63636364 0.7 0.7 0.7
0.5 0.46153846 0.625 0.75 ]
mean value: 0.6027447552447552
key: train_jcc
value: [0.84705882 0.8452381 0.85714286 0.82352941 0.8452381 0.83333333
0.8452381 0.83333333 0.82352941 0.82352941]
mean value: 0.8377170868347339
MCC on Blind test: 0.92
Accuracy on Blind test: 0.97
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.35854936 0.34097719 0.34985304 0.48166299 0.38937902 0.36557841
0.3499918 0.3412137 0.34975195 0.44435978]
mean value: 0.377131724357605
key: score_time
value: [0.02343154 0.02412605 0.02369404 0.0236845 0.02388668 0.02413487
0.02367187 0.02401352 0.02389383 0.02092838]
mean value: 0.023546528816223145
key: test_mcc
value: [0.54494926 0.61591318 0.71717172 0.78107061 0.77972283 0.77972283
0.6140038 0.56541479 0.75691259 0.84091787]
mean value: 0.6995799479936676
key: train_mcc
value: [0.89508125 0.89433543 0.90279179 0.87803336 0.89440973 0.88593688
0.89440973 0.93576165 0.87860668 0.87860668]
mean value: 0.8937973177316136
key: test_accuracy
value: [0.85714286 0.88095238 0.9047619 0.92857143 0.92682927 0.92682927
0.87804878 0.82926829 0.92682927 0.95121951]
mean value: 0.9010452961672474
key: train_accuracy
value: [0.96505376 0.96505376 0.96774194 0.95967742 0.96514745 0.96246649
0.96514745 0.97855228 0.95978552 0.95978552]
mean value: 0.964841160021909
key: test_fscore
value: [0.625 0.66666667 0.77777778 0.82352941 0.82352941 0.82352941
0.66666667 0.66666667 0.76923077 0.85714286]
mean value: 0.7499739639445522
key: train_fscore
value: /home/tanu/git/LSHTM_analysis/scripts/ml/./embb_sl.py:107: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./embb_sl.py:110: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[0.91719745 0.91612903 0.92307692 0.90322581 0.91612903 0.90909091
0.91612903 0.94936709 0.90322581 0.90322581]
mean value: 0.9156796889133758
key: test_precision
value: [0.71428571 0.83333333 0.77777778 0.875 0.875 0.875
0.83333333 0.58333333 1. 1. ]
mean value: 0.8367063492063492
key: train_precision
value: [0.92307692 0.93421053 0.93506494 0.92105263 0.93421053 0.93333333
0.93421053 0.94936709 0.93333333 0.93333333]
mean value: 0.9331193157275769
key: test_recall
value: [0.55555556 0.55555556 0.77777778 0.77777778 0.77777778 0.77777778
0.55555556 0.77777778 0.625 0.75 ]
mean value: 0.6930555555555555
key: train_recall
value: [0.91139241 0.89873418 0.91139241 0.88607595 0.89873418 0.88607595
0.89873418 0.94936709 0.875 0.875 ]
mean value: 0.8990506329113924
key: test_roc_auc
value: [0.74747475 0.76262626 0.85858586 0.87373737 0.87326389 0.87326389
0.76215278 0.81076389 0.8125 0.875 ]
mean value: 0.8249368686868687
key: train_roc_auc
value: [0.94545729 0.94083467 0.94716378 0.93279907 0.94086369 0.93453457
0.94086369 0.96788082 0.92896758 0.92896758]
mean value: 0.940833273085447
key: test_jcc
value: [0.45454545 0.5 0.63636364 0.7 0.7 0.7
0.5 0.5 0.625 0.75 ]
mean value: 0.6065909090909091
key: train_jcc
value: [0.84705882 0.8452381 0.85714286 0.82352941 0.8452381 0.83333333
0.8452381 0.90361446 0.82352941 0.82352941]
mean value: 0.8447451992845331
MCC on Blind test: 0.92
Accuracy on Blind test: 0.97
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.06172132 0.09037328 0.1193223 0.08857775 0.07014179 0.10274005
0.0769608 0.04097271 0.03973961 0.0395906 ]
mean value: 0.0730140209197998
key: score_time
value: [0.02024794 0.03795671 0.01832414 0.02112007 0.0176785 0.03566527
0.01544523 0.01500964 0.01500463 0.0150404 ]
mean value: 0.021149253845214842
key: test_mcc
value: [0.90950859 0.85839508 0.91144345 0.81534091 0.93844697 0.87689394
0.81706198 0.90814394 0.87844611 0.90814394]
mean value: 0.8821824907024225
key: train_mcc
value: [0.91160266 0.92150707 0.92168472 0.92845534 0.92168472 0.92523832
0.92858222 0.93526967 0.93187755 0.91856557]
mean value: 0.9244467849823496
key: test_accuracy
value: [0.95454545 0.92424242 0.95384615 0.90769231 0.96923077 0.93846154
0.90769231 0.95384615 0.93846154 0.95384615]
mean value: 0.9401864801864802
key: train_accuracy
value: [0.9556314 0.96075085 0.96081772 0.96422487 0.96081772 0.96252129
0.96422487 0.96763203 0.96592845 0.95911414]
mean value: 0.9621663342849335
key: test_fscore
value: [0.95384615 0.92957746 0.95652174 0.90909091 0.96969697 0.93939394
0.90909091 0.95384615 0.93548387 0.95384615]
mean value: 0.9410394263698099
key: train_fscore
value: [0.95622896 0.96068376 0.96095076 0.96422487 0.96095076 0.96283784
0.96458685 0.96763203 0.96610169 0.95973154]
mean value: 0.962392906733548
key: test_precision
value: [0.96875 0.86842105 0.91666667 0.90909091 0.96969697 0.93939394
0.88235294 0.93939394 0.96666667 0.93939394]
mean value: 0.929982702411108
key: train_precision
value: [0.94352159 0.96232877 0.95608108 0.96258503 0.95608108 0.95317726
0.95652174 0.96928328 0.96283784 0.94701987]
mean value: 0.9569437536476978
key: test_recall
value: [0.93939394 1. 1. 0.90909091 0.96969697 0.93939394
0.9375 0.96875 0.90625 0.96875 ]
mean value: 0.9538825757575757
key: train_recall
value: [0.96928328 0.95904437 0.96587031 0.96587031 0.96587031 0.97269625
0.97278912 0.96598639 0.96938776 0.97278912]
mean value: 0.967958719323907
key: test_roc_auc
value: [0.95454545 0.92424242 0.953125 0.90767045 0.96922348 0.93844697
0.90814394 0.95407197 0.93797348 0.95407197]
mean value: 0.9401515151515152
key: train_roc_auc
value: [0.9556314 0.96075085 0.96082631 0.96422767 0.96082631 0.9625386
0.96421026 0.96763484 0.96592255 0.9590908 ]
mean value: 0.962165958533584
key: test_jcc
value: [0.91176471 0.86842105 0.91666667 0.83333333 0.94117647 0.88571429
0.83333333 0.91176471 0.87878788 0.91176471]
mean value: 0.8892727138702371
key: train_jcc
value: [0.91612903 0.92434211 0.9248366 0.93092105 0.9248366 0.92833876
0.93159609 0.93729373 0.93442623 0.92258065]
mean value: 0.9275300850229801
MCC on Blind test: 0.85
Accuracy on Blind test: 0.94
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [1.08162332 1.38320208 1.09667015 1.27541614 1.16354775 1.22057319
0.954983 0.97117877 1.09513283 1.1121552 ]
mean value: 1.1354482412338256
key: score_time
value: [0.01517391 0.01828575 0.0150702 0.02177691 0.01645041 0.01536608
0.01509404 0.01963162 0.01525354 0.01510572]
mean value: 0.0167208194732666
key: test_mcc
value: [0.9701425 0.88531564 0.91144345 0.87689394 0.96966868 0.90814394
0.90805728 0.90814394 0.93844697 0.96966868]
mean value: 0.924592502510245
key: train_mcc
value: [0.98294088 0.98976686 0.98978431 0.98296998 0.98296998 0.95229969
0.98296978 0.98296978 0.98639408 0.98296978]
mean value: 0.9816035126690726
key: test_accuracy
value: [0.98484848 0.93939394 0.95384615 0.93846154 0.98461538 0.95384615
0.95384615 0.95384615 0.96923077 0.98461538]
mean value: 0.9616550116550117
key: train_accuracy
value: [0.99146758 0.99488055 0.99488927 0.99148211 0.99148211 0.97614991
0.99148211 0.99148211 0.99318569 0.99148211]
mean value: 0.990798355727916
key: test_fscore
value: [0.98507463 0.94285714 0.95652174 0.93939394 0.98507463 0.95384615
0.95238095 0.95384615 0.96875 0.98412698]
mean value: 0.9621872319313105
key: train_fscore
value: [0.99148211 0.99487179 0.99488927 0.99148211 0.99148211 0.97610922
0.99151104 0.99151104 0.99322034 0.99151104]
mean value: 0.9908070060602878
key: test_precision
value: [0.97058824 0.89189189 0.91666667 0.93939394 0.97058824 0.96875
0.96774194 0.93939394 0.96875 1. ]
mean value: 0.9533764843418544
key: train_precision
value: [0.98979592 0.99657534 0.99319728 0.98979592 0.98979592 0.97610922
0.98983051 0.98983051 0.98986486 0.98983051]
mean value: 0.9894625981785018
key: test_recall
value: [1. 1. 1. 0.93939394 1. 0.93939394
0.9375 0.96875 0.96875 0.96875 ]
mean value: 0.9722537878787879
key: train_recall
value: [0.99317406 0.99317406 0.99658703 0.99317406 0.99317406 0.97610922
0.99319728 0.99319728 0.99659864 0.99319728]
mean value: 0.9921582967658052
key: test_roc_auc
value: [0.98484848 0.93939394 0.953125 0.93844697 0.984375 0.95407197
0.95359848 0.95407197 0.96922348 0.984375 ]
mean value: 0.9615530303030303
key: train_roc_auc
value: [0.99146758 0.99488055 0.99489215 0.99148499 0.99148499 0.97614985
0.99147919 0.99147919 0.99317987 0.99147919]
mean value: 0.9907977525481182
key: test_jcc
value: [0.97058824 0.89189189 0.91666667 0.88571429 0.97058824 0.91176471
0.90909091 0.91176471 0.93939394 0.96875 ]
mean value: 0.9276213575110633
key: train_jcc
value: [0.98310811 0.98979592 0.98983051 0.98310811 0.98310811 0.95333333
0.98316498 0.98316498 0.98653199 0.98316498]
mean value: 0.9818311020526517
MCC on Blind test: 0.92
Accuracy on Blind test: 0.97
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01670432 0.01154232 0.0110631 0.01051259 0.01056409 0.01046848
0.01059175 0.01075506 0.01064587 0.0109458 ]
mean value: 0.011379337310791016
key: score_time
value: [0.0129385 0.01055431 0.00917029 0.00899935 0.00896192 0.0092535
0.00926399 0.00940728 0.00943923 0.00953221]
mean value: 0.00975205898284912
key: test_mcc
value: [0.60858062 0.70878358 0.81706198 0.60805838 0.66477003 0.60304138
0.83005736 0.75545058 0.72649867 0.60621087]
mean value: 0.692851345439512
key: train_mcc
value: [0.72013652 0.71010029 0.72068589 0.73786883 0.73157215 0.74496978
0.72748805 0.72748805 0.7005195 0.7071214 ]
mean value: 0.7227950458299315
key: test_accuracy
value: [0.8030303 0.84848485 0.90769231 0.8 0.83076923 0.8
0.90769231 0.87692308 0.86153846 0.8 ]
mean value: 0.8436130536130536
key: train_accuracy
value: [0.86006826 0.85494881 0.86030664 0.86882453 0.86541738 0.87223169
0.8637138 0.8637138 0.84497445 0.85349233]
mean value: 0.8607691681541476
key: test_fscore
value: [0.8115942 0.86111111 0.90625 0.78688525 0.82539683 0.79365079
0.91428571 0.87878788 0.86567164 0.77966102]
mean value: 0.8423294430772711
key: train_fscore
value: [0.86006826 0.85666105 0.86101695 0.87015177 0.86811352 0.86956522
0.86486486 0.86486486 0.85758998 0.85521886]
mean value: 0.8628115333955078
key: test_precision
value: [0.77777778 0.79487179 0.93548387 0.85714286 0.86666667 0.83333333
0.84210526 0.85294118 0.82857143 0.85185185]
mean value: 0.8440746020811936
key: train_precision
value: [0.86006826 0.84666667 0.85521886 0.86 0.8496732 0.88652482
0.8590604 0.8590604 0.7942029 0.84666667]
mean value: 0.8517142177167121
key: test_recall
value: [0.84848485 0.93939394 0.87878788 0.72727273 0.78787879 0.75757576
1. 0.90625 0.90625 0.71875 ]
mean value: 0.8470643939393939
key: train_recall
value: [0.86006826 0.8668942 0.8668942 0.88054608 0.88737201 0.85324232
0.8707483 0.8707483 0.93197279 0.86394558]
mean value: 0.8752432030832811
key: test_roc_auc
value: [0.8030303 0.84848485 0.90814394 0.80113636 0.83143939 0.80066288
0.90909091 0.87736742 0.86221591 0.79876894]
mean value: 0.8440340909090909
key: train_roc_auc
value: [0.86006826 0.85494881 0.86031785 0.86884447 0.86545471 0.87219939
0.86370179 0.86370179 0.84482599 0.8534745 ]
mean value: 0.8607537554270855
key: test_jcc
value: [0.68292683 0.75609756 0.82857143 0.64864865 0.7027027 0.65789474
0.84210526 0.78378378 0.76315789 0.63888889]
mean value: 0.7304777737576197
key: train_jcc
value: [0.75449102 0.74926254 0.75595238 0.77014925 0.76696165 0.76923077
0.76190476 0.76190476 0.75068493 0.74705882]
mean value: 0.758760088951491
MCC on Blind test: 0.46
Accuracy on Blind test: 0.82
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01092649 0.01078749 0.01086593 0.01097727 0.01108479 0.01093435
0.01094604 0.01094174 0.01101041 0.01098323]
mean value: 0.010945773124694825
key: score_time
value: [0.0093627 0.00943446 0.00938559 0.00923657 0.00925374 0.0092957
0.00910807 0.00916314 0.00904298 0.00911641]
mean value: 0.009239935874938964
key: test_mcc
value: [0.52388352 0.62994079 0.66477003 0.63068182 0.72348485 0.53838887
0.54131274 0.69223485 0.60191459 0.63068182]
mean value: 0.6177293864483187
key: train_mcc
value: [0.6694139 0.6387612 0.65435396 0.67018758 0.64787328 0.66085884
0.67328414 0.66469027 0.65721726 0.65981157]
mean value: 0.6596452005232071
key: test_accuracy
value: [0.75757576 0.8030303 0.83076923 0.81538462 0.86153846 0.76923077
0.76923077 0.84615385 0.8 0.81538462]
mean value: 0.8068298368298369
key: train_accuracy
value: [0.83447099 0.81911263 0.82623509 0.83475298 0.82282794 0.82964225
0.83645656 0.83134583 0.82793867 0.82964225]
mean value: 0.8292425185038752
key: test_fscore
value: [0.77777778 0.82666667 0.82539683 0.81818182 0.86153846 0.7761194
0.7761194 0.84375 0.78688525 0.8125 ]
mean value: 0.8104935601433338
key: train_fscore
value: [0.83752094 0.82274247 0.83223684 0.83806344 0.8295082 0.8349835
0.83946488 0.83797054 0.83360791 0.83333333]
mean value: 0.8339432053299032
key: test_precision
value: [0.71794872 0.73809524 0.86666667 0.81818182 0.875 0.76470588
0.74285714 0.84375 0.82758621 0.8125 ]
mean value: 0.8007291672999077
key: train_precision
value: [0.82236842 0.80655738 0.8031746 0.82026144 0.79810726 0.80830671
0.82565789 0.80757098 0.80830671 0.81699346]
mean value: 0.8117304849942879
key: test_recall
value: [0.84848485 0.93939394 0.78787879 0.81818182 0.84848485 0.78787879
0.8125 0.84375 0.75 0.8125 ]
mean value: 0.8249053030303031
key: train_recall
value: [0.85324232 0.83959044 0.86348123 0.85665529 0.86348123 0.86348123
0.8537415 0.8707483 0.86054422 0.85034014]
mean value: 0.8575305890274199
key: test_roc_auc
value: [0.75757576 0.8030303 0.83143939 0.81534091 0.86174242 0.76893939
0.76988636 0.84611742 0.79924242 0.81534091]
mean value: 0.8068655303030303
key: train_roc_auc
value: [0.83447099 0.81911263 0.82629844 0.83479023 0.82289708 0.8296998
0.83642706 0.83127859 0.82788303 0.82960693]
mean value: 0.8292464767476957
key: test_jcc
value: [0.63636364 0.70454545 0.7027027 0.69230769 0.75675676 0.63414634
0.63414634 0.72972973 0.64864865 0.68421053]
mean value: 0.682355783029724
key: train_jcc
value: [0.7204611 0.69886364 0.71267606 0.72126437 0.70868347 0.71671388
0.72334294 0.72112676 0.71468927 0.71428571]
mean value: 0.7152107189894893
MCC on Blind test: 0.46
Accuracy on Blind test: 0.82
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.01015234 0.00999308 0.01089859 0.01132965 0.01142859 0.01188827
0.01054168 0.01178813 0.01112437 0.01109695]
mean value: 0.011024165153503417
key: score_time
value: [0.01811504 0.01828551 0.01806641 0.01871514 0.01840639 0.01976562
0.01915121 0.01830125 0.01836991 0.01812267]
mean value: 0.01852991580963135
key: test_mcc
value: [0.78824078 0.72760688 0.66193182 0.69223485 0.78763191 0.60037879
0.64271802 0.94017476 0.73234704 0.78822732]
mean value: 0.7361492160836447
key: train_mcc
value: [0.79942206 0.82942883 0.8364774 0.8336452 0.8127293 0.85353905
0.81260956 0.80254528 0.81261173 0.84328817]
mean value: 0.8236296598209656
key: test_accuracy
value: [0.89393939 0.86363636 0.83076923 0.84615385 0.89230769 0.8
0.81538462 0.96923077 0.86153846 0.89230769]
mean value: 0.8665268065268066
key: train_accuracy
value: [0.89931741 0.91467577 0.91822828 0.9165247 0.90630324 0.92674617
0.90630324 0.9011925 0.90630324 0.92163543]
mean value: 0.911722997133571
key: test_fscore
value: [0.89230769 0.86567164 0.83076923 0.84848485 0.89855072 0.8
0.82857143 0.96774194 0.86956522 0.89552239]
mean value: 0.8697185107496803
key: train_fscore
value: [0.9015025 0.91525424 0.91836735 0.91792295 0.90693739 0.92699491
0.90662139 0.9023569 0.90630324 0.9220339 ]
mean value: 0.912429476699208
key: test_precision
value: [0.90625 0.85294118 0.84375 0.84848485 0.86111111 0.8125
0.76315789 1. 0.81081081 0.85714286]
mean value: 0.8556148698757058
key: train_precision
value: [0.88235294 0.90909091 0.91525424 0.90131579 0.89932886 0.9222973
0.90508475 0.89333333 0.90784983 0.91891892]
mean value: 0.90548268607534
key: test_recall
value: [0.87878788 0.87878788 0.81818182 0.84848485 0.93939394 0.78787879
0.90625 0.9375 0.9375 0.9375 ]
mean value: 0.8870265151515152
key: train_recall
value: [0.92150171 0.92150171 0.92150171 0.93515358 0.91467577 0.93174061
0.90816327 0.91156463 0.9047619 0.92517007]
mean value: 0.919573494926981
key: test_roc_auc
value: [0.89393939 0.86363636 0.83096591 0.84611742 0.89157197 0.80018939
0.81676136 0.96875 0.86268939 0.89299242]
mean value: 0.8667613636363637
key: train_roc_auc
value: [0.89931741 0.91467577 0.91823385 0.91655638 0.90631748 0.92675466
0.90630006 0.9011748 0.90630587 0.9216294 ]
mean value: 0.9117265677602099
key: test_jcc
value: [0.80555556 0.76315789 0.71052632 0.73684211 0.81578947 0.66666667
0.70731707 0.9375 0.76923077 0.81081081]
mean value: 0.7723396664908219
key: train_jcc
value: [0.82066869 0.84375 0.8490566 0.84829721 0.82972136 0.86392405
0.82919255 0.82208589 0.82866044 0.85534591]
mean value: 0.8390702707508169
MCC on Blind test: 0.23
Accuracy on Blind test: 0.74
Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.02719641 0.02511859 0.02477622 0.02495646 0.02513075 0.02472091
0.02505088 0.02563429 0.02554011 0.02485394]
mean value: 0.02529785633087158
key: score_time
value: [0.01269245 0.01236415 0.01242304 0.01249981 0.01250362 0.0125668
0.01256847 0.01278996 0.01274538 0.01266146]
mean value: 0.012581515312194824
key: test_mcc
value: [0.81818182 0.79708114 0.91144345 0.75545058 0.84953768 0.78503788
0.61558566 0.84995597 0.87689394 0.84659091]
mean value: 0.8105759022960609
key: train_mcc
value: [0.86446862 0.86730345 0.88088051 0.87752971 0.87455731 0.87153016
0.88781155 0.8677218 0.86420732 0.85786412]
mean value: 0.8713874554555745
key: test_accuracy
value: [0.90909091 0.89393939 0.95384615 0.87692308 0.92307692 0.89230769
0.8 0.92307692 0.93846154 0.92307692]
mean value: 0.9033799533799534
key: train_accuracy
value: [0.93174061 0.9334471 0.94037479 0.93867121 0.93696763 0.93526405
0.94378194 0.93356048 0.9318569 0.92844974]
mean value: 0.935411445947753
key: test_fscore
value: [0.90909091 0.90140845 0.95652174 0.875 0.92753623 0.89230769
0.81690141 0.92537313 0.9375 0.92307692]
mean value: 0.9064716488973305
key: train_fscore
value: [0.93333333 0.93445378 0.94077834 0.93918919 0.93802345 0.93666667
0.94453782 0.93489149 0.93311037 0.93023256]
mean value: 0.9365216990049874
key: test_precision
value: [0.90909091 0.84210526 0.91666667 0.90322581 0.88888889 0.90625
0.74358974 0.88571429 0.9375 0.90909091]
mean value: 0.8842122472650911
key: train_precision
value: [0.91205212 0.9205298 0.93288591 0.92976589 0.92105263 0.91530945
0.93355482 0.91803279 0.91776316 0.90909091]
mean value: 0.9210037459895899
key: test_recall
value: [0.90909091 0.96969697 1. 0.84848485 0.96969697 0.87878788
0.90625 0.96875 0.9375 0.9375 ]
mean value: 0.9325757575757576
key: train_recall
value: [0.9556314 0.94880546 0.94880546 0.94880546 0.9556314 0.95904437
0.95578231 0.95238095 0.94897959 0.95238095]
mean value: 0.9526247359011863
key: test_roc_auc
value: [0.90909091 0.89393939 0.953125 0.87736742 0.92234848 0.89251894
0.80160985 0.92376894 0.93844697 0.92329545]
mean value: 0.9035511363636364
key: train_roc_auc
value: [0.93174061 0.9334471 0.94038912 0.93868844 0.93699937 0.9353045
0.94376146 0.93352836 0.93182768 0.92840891]
mean value: 0.9354095563139931
key: test_jcc
value: [0.83333333 0.82051282 0.91666667 0.77777778 0.86486486 0.80555556
0.69047619 0.86111111 0.88235294 0.85714286]
mean value: 0.8309794118617648
key: train_jcc
value: [0.875 0.87697161 0.88817891 0.88535032 0.88328076 0.88087774
0.89490446 0.87774295 0.87460815 0.86956522]
mean value: 0.8806480114255378
MCC on Blind test: 0.75
Accuracy on Blind test: 0.91
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.88436723 2.62071109 2.547189 2.83125925 5.8814919 4.36691046
3.61290646 4.35738778 4.10116959 4.1741395 ]
mean value: 3.637753224372864
key: score_time
value: [0.01280355 0.01515627 0.02444124 0.03257608 0.0188086 0.01465392
0.01385593 0.01601386 0.0129416 0.01285648]
mean value: 0.017410755157470703
key: test_mcc
value: [0.88040627 0.88531564 0.87844611 0.84995597 0.90805728 0.87689394
0.84659091 0.87867338 0.81534091 0.96966868]
mean value: 0.8789349091200562
key: train_mcc
value: [0.98976686 0.99659284 0.99318567 0.98637134 0.99318567 0.99318567
0.98978419 0.99318567 0.9965986 0.98978419]
mean value: 0.9921640695092957
key: test_accuracy
value: [0.93939394 0.93939394 0.93846154 0.92307692 0.95384615 0.93846154
0.92307692 0.93846154 0.90769231 0.98461538]
mean value: 0.9386480186480187
key: train_accuracy
value: [0.99488055 0.99829352 0.99659284 0.99318569 0.99659284 0.99659284
0.99488927 0.99659284 0.99829642 0.99488927]
mean value: 0.9960806088690687
key: test_fscore
value: [0.9375 0.94285714 0.94117647 0.92063492 0.95522388 0.93939394
0.92307692 0.93939394 0.90625 0.98412698]
mean value: 0.93896342006691
key: train_fscore
value: [0.99488927 0.99829642 0.99658703 0.99317406 0.99658703 0.99658703
0.99490662 0.99659864 0.99830221 0.99490662]
mean value: 0.9960834932903403
key: test_precision
value: [0.96774194 0.89189189 0.91428571 0.96666667 0.94117647 0.93939394
0.90909091 0.91176471 0.90625 1. ]
mean value: 0.9348262233283581
key: train_precision
value: [0.99319728 0.99659864 0.99658703 0.99317406 0.99658703 0.99658703
0.99322034 0.99659864 0.99661017 0.99322034]
mean value: 0.9952380558864374
key: test_recall
value: [0.90909091 1. 0.96969697 0.87878788 0.96969697 0.93939394
0.9375 0.96875 0.90625 0.96875 ]
mean value: 0.9447916666666667
key: train_recall
value: [0.99658703 1. 0.99658703 0.99317406 0.99658703 0.99658703
0.99659864 0.99659864 1. 0.99659864]
mean value: 0.9969318102667688
key: test_roc_auc
value: [0.93939394 0.93939394 0.93797348 0.92376894 0.95359848 0.93844697
0.92329545 0.93892045 0.90767045 0.984375 ]
mean value: 0.9386837121212122
key: train_roc_auc
value: [0.99488055 0.99829352 0.99659284 0.99318567 0.99659284 0.99659284
0.99488635 0.99659284 0.99829352 0.99488635]
mean value: 0.9960797288198555
key: test_jcc
value: [0.88235294 0.89189189 0.88888889 0.85294118 0.91428571 0.88571429
0.85714286 0.88571429 0.82857143 0.96875 ]
mean value: 0.8856253469856411
key: train_jcc
value: [0.98983051 0.99659864 0.99319728 0.98644068 0.99319728 0.99319728
0.98986486 0.99322034 0.99661017 0.98986486]
mean value: 0.992202190083546
MCC on Blind test: 0.72
Accuracy on Blind test: 0.91
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.0583663 0.03681612 0.04428267 0.03703952 0.04416752 0.04544234
0.04772806 0.04291558 0.04592967 0.04700685]
mean value: 0.04496946334838867
key: score_time
value: [0.01126266 0.0109055 0.01073933 0.01008487 0.01071358 0.01080036
0.01077509 0.01093245 0.01079464 0.01084924]
mean value: 0.010785770416259766
key: test_mcc
value: [1. 0.88531564 0.90805728 0.81706198 0.96966868 0.94017476
0.93844697 0.87689394 0.94017476 0.94028478]
mean value: 0.9216078784457992
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.93939394 0.95384615 0.90769231 0.98461538 0.96923077
0.96923077 0.93846154 0.96923077 0.96923077]
mean value: 0.9600932400932402
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.94285714 0.95522388 0.90625 0.98507463 0.97058824
0.96875 0.9375 0.96774194 0.96969697]
mean value: 0.9603682790794787
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.89189189 0.94117647 0.93548387 0.97058824 0.94285714
0.96875 0.9375 1. 0.94117647]
mean value: 0.9529424082187364
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 0.96969697 0.87878788 1. 1.
0.96875 0.9375 0.9375 1. ]
mean value: 0.9692234848484849
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.93939394 0.95359848 0.90814394 0.984375 0.96875
0.96922348 0.93844697 0.96875 0.96969697]
mean value: 0.9600378787878788
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.89189189 0.91428571 0.82857143 0.97058824 0.94285714
0.93939394 0.88235294 0.9375 0.94117647]
mean value: 0.9248617764058941
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.92
Accuracy on Blind test: 0.97
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.16167355 0.15977597 0.16011739 0.16007113 0.16181588 0.15370274
0.16055512 0.16027236 0.16169715 0.16143322]
mean value: 0.1601114511489868
key: score_time
value: [0.02137733 0.02141595 0.02142334 0.02139664 0.02146602 0.02143741
0.0216198 0.0215838 0.02148175 0.02147436]
mean value: 0.02146763801574707
key: test_mcc
value: [0.88040627 0.85201287 0.87844611 0.82191818 0.84953768 0.79449138
0.84659091 0.84953768 0.84644588 0.93844697]
mean value: 0.8557833923720416
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.93939394 0.92424242 0.93846154 0.90769231 0.92307692 0.89230769
0.92307692 0.92307692 0.92307692 0.96923077]
mean value: 0.9263636363636364
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.9375 0.92753623 0.94117647 0.90322581 0.92753623 0.8852459
0.92307692 0.91803279 0.92063492 0.96875 ]
mean value: 0.9252715273044397
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96774194 0.88888889 0.91428571 0.96551724 0.88888889 0.96428571
0.90909091 0.96551724 0.93548387 0.96875 ]
mean value: 0.9368450404650349
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.90909091 0.96969697 0.96969697 0.84848485 0.96969697 0.81818182
0.9375 0.875 0.90625 0.96875 ]
mean value: 0.9172348484848485
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.93939394 0.92424242 0.93797348 0.90861742 0.92234848 0.89346591
0.92329545 0.92234848 0.92282197 0.96922348]
mean value: 0.9263731060606061
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.88235294 0.86486486 0.88888889 0.82352941 0.86486486 0.79411765
0.85714286 0.84848485 0.85294118 0.93939394]
mean value: 0.8616581440110852
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.72
Accuracy on Blind test: 0.91
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01344705 0.01317191 0.01322961 0.01330113 0.01336503 0.01348329
0.01328564 0.01326752 0.01366425 0.01333809]
mean value: 0.013355350494384766
key: score_time
value: [0.01062298 0.01056838 0.01057148 0.01060891 0.01068354 0.01066709
0.01063323 0.01060557 0.01065063 0.01063728]
mean value: 0.010624909400939941
key: test_mcc
value: [0.63753558 0.63753558 0.63153153 0.60037879 0.54981488 0.60191459
0.73110376 0.51053958 0.63482825 0.57061637]
mean value: 0.6105798900474947
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.81818182 0.81818182 0.81538462 0.8 0.76923077 0.8
0.86153846 0.75384615 0.81538462 0.78461538]
mean value: 0.8036363636363637
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.82352941 0.82352941 0.82352941 0.8 0.79452055 0.8115942
0.84745763 0.73333333 0.8 0.78787879]
mean value: 0.8045372734468639
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.8 0.8 0.8 0.8125 0.725 0.77777778
0.92592593 0.78571429 0.85714286 0.76470588]
mean value: 0.8048766728913788
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.84848485 0.84848485 0.84848485 0.78787879 0.87878788 0.84848485
0.78125 0.6875 0.75 0.8125 ]
mean value: 0.8091856060606061
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.81818182 0.81818182 0.81486742 0.80018939 0.76751894 0.79924242
0.86032197 0.75284091 0.81439394 0.78503788]
mean value: 0.8030776515151515
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.7 0.7 0.7 0.66666667 0.65909091 0.68292683
0.73529412 0.57894737 0.66666667 0.65 ]
mean value: 0.6739592557760646
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.57
Accuracy on Blind test: 0.85
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [2.59411907 2.52079487 2.54728317 2.55361605 3.08455682 2.76037025
2.66518021 3.56287289 2.2605269 2.20357037]
mean value: 2.6752890586853026
key: score_time
value: [0.11118126 0.11067724 0.11062789 0.11689639 0.1420486 0.10095429
0.22337675 0.1026926 0.09578085 0.09630704]
mean value: 0.12105429172515869
key: test_mcc
value: [0.9701425 0.88531564 0.96966868 0.78822732 0.96966868 0.94028478
0.96966868 0.87689394 0.94017476 0.96969697]
mean value: 0.9279741955034888
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98484848 0.93939394 0.98461538 0.89230769 0.98461538 0.96923077
0.98461538 0.93846154 0.96923077 0.98461538]
mean value: 0.9631934731934733
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98461538 0.94285714 0.98507463 0.88888889 0.98507463 0.96875
0.98412698 0.9375 0.96774194 0.98461538]
mean value: 0.9629244974318999
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.89189189 0.97058824 0.93333333 0.97058824 1.
1. 0.9375 1. 0.96969697]
mean value: 0.967359866551043
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96969697 1. 1. 0.84848485 1. 0.93939394
0.96875 0.9375 0.9375 1. ]
mean value: 0.9601325757575758
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98484848 0.93939394 0.984375 0.89299242 0.984375 0.96969697
0.984375 0.93844697 0.96875 0.98484848]
mean value: 0.9632102272727273
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96969697 0.89189189 0.97058824 0.8 0.97058824 0.93939394
0.96875 0.88235294 0.9375 0.96969697]
mean value: 0.9300459182444477
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.82
Accuracy on Blind test: 0.94
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000...05', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [2.84621882 1.85790062 1.10967755 1.08481646 1.06378484 1.09557199
1.08685064 1.09259677 1.09036088 1.1133182 ]
mean value: 1.3441096782684325
key: score_time
value: [0.33816075 0.17422271 0.1956923 0.26042914 0.24330926 0.25248599
0.19835854 0.27451754 0.25211048 0.27057958]
mean value: 0.24598662853240966
key: test_mcc
value: [0.9701425 0.88531564 0.96966868 0.81706198 1. 0.94028478
0.96966868 0.94028478 0.91144345 0.90805728]
mean value: 0.9311927780824278
key: train_mcc
value: [0.97616038 0.97952218 0.96938669 0.97620126 0.97283366 0.97283366
0.96601728 0.96934096 0.96947532 0.96938563]
mean value: 0.9721157032553221
key: test_accuracy
value: [0.98484848 0.93939394 0.98461538 0.90769231 1. 0.96923077
0.98461538 0.96923077 0.95384615 0.95384615]
mean value: 0.9647319347319347
key: train_accuracy
value: [0.98805461 0.98976109 0.9846678 0.98807496 0.98637138 0.98637138
0.98296422 0.9846678 0.9846678 0.9846678 ]
mean value: 0.9860268851277102
key: test_fscore
value: [0.98461538 0.94285714 0.98507463 0.90625 1. 0.96875
0.98412698 0.96969697 0.95081967 0.95238095]
mean value: 0.9644571732674253
key: train_fscore
value: [0.98811545 0.98976109 0.98471986 0.98811545 0.98644068 0.98644068
0.98310811 0.98471986 0.98482293 0.98477157]
mean value: 0.986101569221062
key: test_precision
value: [1. 0.89189189 0.97058824 0.93548387 1. 1.
1. 0.94117647 1. 0.96774194]
mean value: 0.9706882404225857
key: train_precision
value: [0.98310811 0.98976109 0.97972973 0.98310811 0.97979798 0.97979798
0.97651007 0.98305085 0.97658863 0.97979798]
mean value: 0.9811250520824318
key: test_recall
value: [0.96969697 1. 1. 0.87878788 1. 0.93939394
0.96875 1. 0.90625 0.9375 ]
mean value: 0.9600378787878788
key: train_recall
value: [0.99317406 0.98976109 0.98976109 0.99317406 0.99317406 0.99317406
0.98979592 0.98639456 0.99319728 0.98979592]
mean value: 0.9911402103503517
key: test_roc_auc
value: [0.98484848 0.93939394 0.984375 0.90814394 1. 0.96969697
0.984375 0.96969697 0.953125 0.95359848]
mean value: 0.9647253787878788
key: train_roc_auc
value: [0.98805461 0.98976109 0.98467646 0.98808363 0.98638295 0.98638295
0.98295257 0.98466486 0.98465325 0.98465905]
mean value: 0.9860271412319194
key: test_jcc
value: [0.96969697 0.89189189 0.97058824 0.82857143 1. 0.93939394
0.96875 0.94117647 0.90625 0.90909091]
mean value: 0.9325409844527491
key: train_jcc
value: [0.97651007 0.97972973 0.96989967 0.97651007 0.97324415 0.97324415
0.96677741 0.96989967 0.97009967 0.97 ]
mean value: 0.9725914565787938
MCC on Blind test: 0.92
Accuracy on Blind test: 0.97
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01303124 0.01310182 0.01105499 0.01080608 0.01093006 0.01085377
0.01200366 0.0113945 0.01290417 0.01115561]
mean value: 0.011723589897155762
key: score_time
value: [0.01087046 0.01089787 0.00959611 0.00923371 0.00975561 0.00923085
0.01004624 0.0104599 0.00927067 0.00921893]
mean value: 0.009858036041259765
key: test_mcc
value: [0.52388352 0.62994079 0.66477003 0.63068182 0.72348485 0.53838887
0.54131274 0.69223485 0.60191459 0.63068182]
mean value: 0.6177293864483187
key: train_mcc
value: [0.6694139 0.6387612 0.65435396 0.67018758 0.64787328 0.66085884
0.67328414 0.66469027 0.65721726 0.65981157]
mean value: 0.6596452005232071
key: test_accuracy
value: [0.75757576 0.8030303 0.83076923 0.81538462 0.86153846 0.76923077
0.76923077 0.84615385 0.8 0.81538462]
mean value: 0.8068298368298369
key: train_accuracy
value: [0.83447099 0.81911263 0.82623509 0.83475298 0.82282794 0.82964225
0.83645656 0.83134583 0.82793867 0.82964225]
mean value: 0.8292425185038752
key: test_fscore
value: [0.77777778 0.82666667 0.82539683 0.81818182 0.86153846 0.7761194
0.7761194 0.84375 0.78688525 0.8125 ]
mean value: 0.8104935601433338
key: train_fscore
value: [0.83752094 0.82274247 0.83223684 0.83806344 0.8295082 0.8349835
0.83946488 0.83797054 0.83360791 0.83333333]
mean value: 0.8339432053299032
key: test_precision
value: [0.71794872 0.73809524 0.86666667 0.81818182 0.875 0.76470588
0.74285714 0.84375 0.82758621 0.8125 ]
mean value: 0.8007291672999077
key: train_precision
value: [0.82236842 0.80655738 0.8031746 0.82026144 0.79810726 0.80830671
0.82565789 0.80757098 0.80830671 0.81699346]
mean value: 0.8117304849942879
key: test_recall
value: [0.84848485 0.93939394 0.78787879 0.81818182 0.84848485 0.78787879
0.8125 0.84375 0.75 0.8125 ]
mean value: 0.8249053030303031
key: train_recall
value: [0.85324232 0.83959044 0.86348123 0.85665529 0.86348123 0.86348123
0.8537415 0.8707483 0.86054422 0.85034014]
mean value: 0.8575305890274199
key: test_roc_auc
value: [0.75757576 0.8030303 0.83143939 0.81534091 0.86174242 0.76893939
0.76988636 0.84611742 0.79924242 0.81534091]
mean value: 0.8068655303030303
key: train_roc_auc
value: [0.83447099 0.81911263 0.82629844 0.83479023 0.82289708 0.8296998
0.83642706 0.83127859 0.82788303 0.82960693]
mean value: 0.8292464767476957
key: test_jcc
value: [0.63636364 0.70454545 0.7027027 0.69230769 0.75675676 0.63414634
0.63414634 0.72972973 0.64864865 0.68421053]
mean value: 0.682355783029724
key: train_jcc
value: [0.7204611 0.69886364 0.71267606 0.72126437 0.70868347 0.71671388
0.72334294 0.72112676 0.71468927 0.71428571]
mean value: 0.7152107189894893
MCC on Blind test: 0.46
Accuracy on Blind test: 0.82
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.10618877 0.09051251 0.19730282 0.09614396 0.11167979 0.09417415
0.09410381 0.12005615 0.11551499 0.11230063]
mean value: 0.11379776000976563
key: score_time
value: [0.01352429 0.01269913 0.01121426 0.01182103 0.01217675 0.01148033
0.01145482 0.01174784 0.01183248 0.01157618]
mean value: 0.011952710151672364
key: test_mcc
value: [1. 0.88531564 0.93844697 0.90805728 0.93844697 1.
0.96966868 0.94028478 0.96966868 1. ]
mean value: 0.9549889006803033
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.93939394 0.96923077 0.95384615 0.96923077 1.
0.98461538 0.96923077 0.98461538 1. ]
mean value: 0.9770163170163171
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.94285714 0.96969697 0.95522388 0.96969697 1.
0.98412698 0.96969697 0.98412698 1. ]
mean value: 0.9775425900799035
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.89189189 0.96969697 0.94117647 0.96969697 1.
1. 0.94117647 1. 1. ]
mean value: 0.9713638772462302
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 0.96969697 0.96969697 0.96969697 1.
0.96875 1. 0.96875 1. ]
mean value: 0.9846590909090909
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.93939394 0.96922348 0.95359848 0.96922348 1.
0.984375 0.96969697 0.984375 1. ]
mean value: 0.9769886363636364
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.89189189 0.94117647 0.91428571 0.94117647 1.
0.96875 0.94117647 0.96875 1. ]
mean value: 0.9567207017942312
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.92
Accuracy on Blind test: 0.97
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.0472343 0.06465888 0.0483613 0.08760691 0.0638423 0.06322742
0.0496552 0.10060954 0.09633088 0.09051538]
mean value: 0.07120420932769775
key: score_time
value: [0.01958227 0.01240706 0.01249146 0.01987028 0.01252627 0.01251197
0.0195992 0.01977491 0.01979899 0.01302886]
mean value: 0.01615912914276123
key: test_mcc
value: [0.94112395 0.85839508 0.90805728 0.94028478 0.93844697 0.94017476
0.91144345 0.84995597 0.90805728 0.96966868]
mean value: 0.9165608191795214
key: train_mcc
value: [0.96250779 0.97952218 0.98301582 0.96601886 0.96595117 0.96934132
0.95920216 0.96595038 0.96601728 0.95913582]
mean value: 0.9676662791940402
key: test_accuracy
value: [0.96969697 0.92424242 0.95384615 0.96923077 0.96923077 0.96923077
0.95384615 0.92307692 0.95384615 0.98461538]
mean value: 0.9570862470862471
key: train_accuracy
value: [0.98122867 0.98976109 0.99148211 0.98296422 0.98296422 0.9846678
0.97955707 0.98296422 0.98296422 0.97955707]
mean value: 0.9838110715095557
key: test_fscore
value: [0.97058824 0.92957746 0.95522388 0.96875 0.96969697 0.97058824
0.95081967 0.92537313 0.95238095 0.98412698]
mean value: 0.9577125528638395
key: train_fscore
value: [0.98132428 0.98976109 0.99151104 0.98305085 0.9829932 0.9846678
0.97972973 0.98305085 0.98310811 0.97966102]
mean value: 0.9838857955608016
key: test_precision
value: [0.94285714 0.86842105 0.94117647 1. 0.96969697 0.94285714
1. 0.88571429 0.96774194 1. ]
mean value: 0.9518464999829226
key: train_precision
value: [0.97635135 0.98976109 0.98648649 0.97643098 0.97966102 0.9829932
0.97315436 0.97972973 0.97651007 0.97635135]
mean value: 0.9797429631258331
key: test_recall
value: [1. 1. 0.96969697 0.93939394 0.96969697 1.
0.90625 0.96875 0.9375 0.96875 ]
mean value: 0.9660037878787879
key: train_recall
value: [0.98634812 0.98976109 0.99658703 0.98976109 0.98634812 0.98634812
0.98639456 0.98639456 0.98979592 0.9829932 ]
mean value: 0.9880731814910264
key: test_roc_auc
value: [0.96969697 0.92424242 0.95359848 0.96969697 0.96922348 0.96875
0.953125 0.92376894 0.95359848 0.984375 ]
mean value: 0.9570075757575758
key: train_roc_auc
value: [0.98122867 0.98976109 0.99149079 0.98297578 0.98296998 0.98467066
0.9795454 0.98295837 0.98295257 0.97955121]
mean value: 0.9838104525086485
key: test_jcc
value: [0.94285714 0.86842105 0.91428571 0.93939394 0.94117647 0.94285714
0.90625 0.86111111 0.90909091 0.96875 ]
mean value: 0.9194193482815773
key: train_jcc
value: [0.96333333 0.97972973 0.98316498 0.96666667 0.96655518 0.96979866
0.9602649 0.96666667 0.96677741 0.96013289]
mean value: 0.9683090420891562
MCC on Blind test: 0.82
Accuracy on Blind test: 0.94
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01097393 0.01111913 0.01158977 0.01039386 0.01048326 0.01046848
0.01031518 0.01036549 0.01155186 0.0104816 ]
mean value: 0.01077425479888916
key: score_time
value: [0.00955105 0.00978112 0.00976467 0.00874472 0.00880885 0.00883722
0.00887394 0.00886011 0.00992799 0.00892401]
mean value: 0.009207367897033691
key: test_mcc
value: [0.58003439 0.64109064 0.78763191 0.63222777 0.69326017 0.57268392
0.66477003 0.76001241 0.69326017 0.69223485]
mean value: 0.6717206249618696
key: train_mcc
value: [0.69673692 0.68326672 0.68347159 0.69045282 0.65968626 0.68734858
0.67677421 0.69041333 0.67717627 0.69082395]
mean value: 0.6836150647547126
key: test_accuracy
value: [0.78787879 0.81818182 0.89230769 0.81538462 0.84615385 0.78461538
0.83076923 0.87692308 0.84615385 0.84615385]
mean value: 0.8344522144522144
key: train_accuracy
value: [0.84812287 0.84129693 0.84156729 0.84497445 0.82964225 0.84327087
0.83816014 0.84497445 0.83816014 0.84497445]
mean value: 0.8415143815664773
key: test_fscore
value: [0.8 0.82857143 0.89855072 0.8125 0.85294118 0.8
0.8358209 0.88235294 0.83870968 0.84375 ]
mean value: 0.8393196843797912
key: train_fscore
value: [0.85092127 0.84474124 0.84369748 0.84757119 0.83221477 0.84666667
0.84140234 0.84808013 0.84245439 0.84908789]
mean value: 0.8446837367804668
key: test_precision
value: [0.75675676 0.78378378 0.86111111 0.83870968 0.82857143 0.75675676
0.8 0.83333333 0.86666667 0.84375 ]
mean value: 0.8169439514399193
key: train_precision
value: [0.83552632 0.82679739 0.83112583 0.83223684 0.81848185 0.82736156
0.82622951 0.83278689 0.82200647 0.82847896]
mean value: 0.8281031613368782
key: test_recall
value: [0.84848485 0.87878788 0.93939394 0.78787879 0.87878788 0.84848485
0.875 0.9375 0.8125 0.84375 ]
mean value: 0.8650568181818182
key: train_recall
value: [0.8668942 0.86348123 0.85665529 0.86348123 0.84641638 0.8668942
0.85714286 0.86394558 0.86394558 0.8707483 ]
mean value: 0.861960483852244
key: test_roc_auc
value: [0.78787879 0.81818182 0.89157197 0.81581439 0.84564394 0.78361742
0.83143939 0.87784091 0.84564394 0.84611742]
mean value: 0.834375
key: train_roc_auc
value: [0.84812287 0.84129693 0.84159295 0.84500592 0.82967078 0.84331104
0.83812774 0.84494207 0.83811613 0.84493046]
mean value: 0.8415116900002322
key: test_jcc
value: [0.66666667 0.70731707 0.81578947 0.68421053 0.74358974 0.66666667
0.71794872 0.78947368 0.72222222 0.72972973]
mean value: 0.7243614504205005
key: train_jcc
value: [0.74052478 0.73121387 0.72965116 0.73546512 0.71264368 0.73410405
0.72622478 0.73623188 0.7277937 0.73775216]
mean value: 0.7311605183224938
MCC on Blind test: 0.79
Accuracy on Blind test: 0.91
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.02084875 0.02504754 0.02607274 0.03220367 0.03078127 0.02826762
0.03048325 0.02629876 0.02308607 0.02272677]
mean value: 0.026581645011901855
key: score_time
value: [0.01154184 0.01212072 0.01222634 0.0122149 0.01249433 0.01250577
0.01228857 0.01224852 0.01216984 0.01211524]
mean value: 0.012192606925964355
key: test_mcc
value: [0.94112395 0.88531564 0.88340557 0.84659091 0.94017476 0.90814394
0.87867338 0.71318944 0.90814394 0.96966868]
mean value: 0.8874430205026376
key: train_mcc
value: [0.9527735 0.98298668 0.91765351 0.96592835 0.97283366 0.96595038
0.94982099 0.87645168 0.92677947 0.94212842]
mean value: 0.9453306644273733
key: test_accuracy
value: [0.96969697 0.93939394 0.93846154 0.92307692 0.96923077 0.95384615
0.93846154 0.84615385 0.95384615 0.98461538]
mean value: 0.9416783216783217
key: train_accuracy
value: [0.97610922 0.99146758 0.95741056 0.98296422 0.98637138 0.98296422
0.97444634 0.93526405 0.96252129 0.97103918]
mean value: 0.9720558052456233
key: test_fscore
value: [0.97058824 0.94285714 0.94285714 0.92307692 0.97058824 0.95384615
0.93939394 0.82142857 0.95384615 0.98412698]
mean value: 0.9402609482021247
key: train_fscore
value: [0.97651007 0.99142367 0.9589491 0.98293515 0.98644068 0.98287671
0.9750416 0.93140794 0.96369637 0.97094017]
mean value: 0.9720221458694838
key: test_precision
value: [0.94285714 0.89189189 0.89189189 0.9375 0.94285714 0.96875
0.91176471 0.95833333 0.93939394 1. ]
mean value: 0.9385240048107695
key: train_precision
value: [0.96039604 0.99655172 0.92405063 0.98293515 0.97979798 0.9862543
0.95439739 0.99230769 0.93589744 0.97594502]
mean value: 0.9688533365091594
key: test_recall
value: [1. 1. 1. 0.90909091 1. 0.93939394
0.96875 0.71875 0.96875 0.96875 ]
mean value: 0.9473484848484849
key: train_recall
value: [0.99317406 0.98634812 0.99658703 0.98293515 0.99317406 0.97952218
0.99659864 0.87755102 0.99319728 0.96598639]
mean value: 0.9765073947667804
key: test_roc_auc
value: [0.96969697 0.93939394 0.9375 0.92329545 0.96875 0.95407197
0.93892045 0.84422348 0.95407197 0.984375 ]
mean value: 0.9414299242424242
key: train_roc_auc
value: [0.97610922 0.99146758 0.95747719 0.98296418 0.98638295 0.98295837
0.97440853 0.93536254 0.96246895 0.9710478 ]
mean value: 0.9720647303289917
key: test_jcc
value: [0.94285714 0.89189189 0.89189189 0.85714286 0.94285714 0.91176471
0.88571429 0.6969697 0.91176471 0.96875 ]
mean value: 0.8901604321089616
key: train_jcc
value: [0.95409836 0.9829932 0.92113565 0.96644295 0.97324415 0.96632997
0.9512987 0.87162162 0.92993631 0.94352159]
mean value: 0.946062249446683
MCC on Blind test: 0.74
Accuracy on Blind test: 0.88
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01608205 0.02083969 0.02255607 0.02071667 0.02331257 0.02741241
0.02450824 0.02299356 0.02310205 0.02305198]
mean value: 0.022457528114318847
key: score_time
value: [0.01083064 0.01254272 0.01454592 0.01292634 0.01404285 0.01232314
0.01454258 0.01225185 0.01214027 0.02303076]
mean value: 0.01391770839691162
key: test_mcc
value: [0.94112395 0.85839508 0.77695466 0.84644588 0.63287203 0.87689394
0.65648795 0.87689394 0.87867338 0.96966868]
mean value: 0.8314409482187154
key: train_mcc
value: [0.94242422 0.96928892 0.71494202 0.95598057 0.71292201 0.97955701
0.75595137 0.96265981 0.94043526 0.96595117]
mean value: 0.8900112361644232
key: test_accuracy
value: [0.96969697 0.92424242 0.87692308 0.92307692 0.78461538 0.93846154
0.8 0.93846154 0.93846154 0.98461538]
mean value: 0.9078554778554778
key: train_accuracy
value: [0.97098976 0.98464164 0.83816014 0.97785349 0.83816014 0.98977853
0.8637138 0.98126065 0.9693356 0.98296422]
mean value: 0.9396857975126606
key: test_fscore
value: [0.96875 0.92957746 0.89189189 0.92537313 0.73076923 0.93939394
0.83116883 0.9375 0.93939394 0.98412698]
mean value: 0.9077945415861908
key: train_fscore
value: [0.97053726 0.98461538 0.86049927 0.97807757 0.80730223 0.98976109
0.88023952 0.98145025 0.97029703 0.98293515]
mean value: 0.9405714764352172
key: test_precision
value: [1. 0.86842105 0.80487805 0.91176471 1. 0.93939394
0.71111111 0.9375 0.91176471 1. ]
mean value: 0.9084833563681823
key: train_precision
value: [0.98591549 0.98630137 0.75515464 0.96666667 0.995 0.98976109
0.78609626 0.97324415 0.94230769 0.98630137]
mean value: 0.9366748726825244
key: test_recall
value: [0.93939394 1. 1. 0.93939394 0.57575758 0.93939394
1. 0.9375 0.96875 0.96875 ]
mean value: 0.9268939393939394
key: train_recall
value: [0.9556314 0.98293515 1. 0.98976109 0.67918089 0.98976109
1. 0.98979592 1. 0.97959184]
mean value: 0.956665737967542
key: test_roc_auc
value: [0.96969697 0.92424242 0.875 0.92282197 0.78787879 0.93844697
0.8030303 0.93844697 0.93892045 0.984375 ]
mean value: 0.9082859848484849
key: train_roc_auc
value: [0.97098976 0.98464164 0.83843537 0.97787374 0.83788976 0.98977851
0.86348123 0.98124608 0.96928328 0.98296998]
mean value: 0.9396589352464535
key: test_jcc
value: [0.93939394 0.86842105 0.80487805 0.86111111 0.57575758 0.88571429
0.71111111 0.88235294 0.88571429 0.96875 ]
mean value: 0.8383204351390846
key: train_jcc
value: [0.94276094 0.96969697 0.75515464 0.95709571 0.67687075 0.97972973
0.78609626 0.96357616 0.94230769 0.96644295]
mean value: 0.8939731800185893
MCC on Blind test: 0.85
Accuracy on Blind test: 0.94
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.26672578 0.26924062 0.26172447 0.25810003 0.25932431 0.26393604
0.25689721 0.257658 0.26804256 0.26593328]
mean value: 0.2627582311630249
key: score_time
value: [0.0161252 0.0157783 0.01563334 0.01572132 0.01619577 0.01593256
0.0156095 0.01570439 0.01582265 0.01575351]
mean value: 0.015827655792236328
key: test_mcc
value: [1. 0.88531564 0.90814394 0.90805728 1. 0.96969697
0.96966868 0.96969697 0.96966868 1. ]
mean value: 0.9580248163606737
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.93939394 0.95384615 0.95384615 1. 0.98461538
0.98461538 0.98461538 0.98461538 1. ]
mean value: 0.9785547785547786
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.94285714 0.95384615 0.95522388 1. 0.98461538
0.98412698 0.98461538 0.98412698 1. ]
mean value: 0.9789411914785049
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.89189189 0.96875 0.94117647 1. 1.
1. 0.96969697 1. 1. ]
mean value: 0.9771515332177096
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 0.93939394 0.96969697 1. 0.96969697
0.96875 1. 0.96875 1. ]
mean value: 0.9816287878787879
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.93939394 0.95407197 0.95359848 1. 0.98484848
0.984375 0.98484848 0.984375 1. ]
mean value: 0.9785511363636363
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.89189189 0.91176471 0.91428571 1. 0.96969697
0.96875 0.96969697 0.96875 1. ]
mean value: 0.9594836251453899
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.92
Accuracy on Blind test: 0.97
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.11414218 0.11675215 0.06662536 0.08681798 0.09918189 0.10906219
0.08984184 0.11081529 0.09279585 0.09611773]
mean value: 0.09821524620056152
key: score_time
value: [0.03923726 0.03169036 0.019665 0.02797008 0.03503919 0.03238821
0.02360916 0.03317976 0.03165078 0.01983523]
mean value: 0.02942650318145752
key: test_mcc
value: [1. 0.88531564 0.93844697 0.81706198 0.93844697 1.
0.93844697 0.90814394 0.96966868 0.96969697]
mean value: 0.9365228119413505
key: train_mcc
value: [0.99659284 0.9931972 0.9863944 0.99659864 0.9965986 0.99318567
0.99318567 0.99318567 0.98639408 0.98637134]
mean value: 0.9921704109887847
key: test_accuracy
value: [1. 0.93939394 0.96923077 0.90769231 0.96923077 1.
0.96923077 0.95384615 0.98461538 0.98461538]
mean value: 0.9677855477855478
key: train_accuracy
value: [0.99829352 0.99658703 0.99318569 0.99829642 0.99829642 0.99659284
0.99659284 0.99659284 0.99318569 0.99318569]
mean value: 0.9960808995819549
key: test_fscore
value: [1. 0.94285714 0.96969697 0.90625 0.96969697 1.
0.96875 0.95384615 0.98412698 0.98461538]
mean value: 0.9679839604839605
key: train_fscore
value: [0.9982906 0.99659864 0.99319728 0.99829642 0.9982906 0.99658703
0.99659864 0.99659864 0.99322034 0.99319728]
mean value: 0.9960875464958671
key: test_precision
value: [1. 0.89189189 0.96969697 0.93548387 0.96969697 1.
0.96875 0.93939394 1. 0.96969697]
mean value: 0.9644610611344482
key: train_precision
value: [1. 0.99322034 0.98983051 0.99659864 1. 0.99658703
0.99659864 0.99659864 0.98986486 0.99319728]
mean value: 0.9952495940318127
key: test_recall
value: [1. 1. 0.96969697 0.87878788 0.96969697 1.
0.96875 0.96875 0.96875 1. ]
mean value: 0.9724431818181818
key: train_recall
value: [0.99658703 1. 0.99658703 1. 0.99658703 0.99658703
0.99659864 0.99659864 0.99659864 0.99319728]
mean value: 0.9969341320145806
key: test_roc_auc
value: [1. 0.93939394 0.96922348 0.90814394 0.96922348 1.
0.96922348 0.95407197 0.984375 0.98484848]
mean value: 0.9678503787878788
key: train_roc_auc
value: [0.99829352 0.99658703 0.99319147 0.99829932 0.99829352 0.99659284
0.99659284 0.99659284 0.99317987 0.99318567]
mean value: 0.9960808896937614
key: test_jcc
value: [1. 0.89189189 0.94117647 0.82857143 0.94117647 1.
0.93939394 0.91176471 0.96875 0.96969697]
mean value: 0.9392421876613053
key: train_jcc
value: [0.99658703 0.99322034 0.98648649 0.99659864 0.99658703 0.99319728
0.99322034 0.99322034 0.98653199 0.98648649]
mean value: 0.9922135956254906
MCC on Blind test: 0.82
Accuracy on Blind test: 0.94
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.19741392 0.30958152 0.31759167 0.32844281 0.26167846 0.33871961
0.36064386 0.29585671 0.40997648 0.32770634]
mean value: 0.3147611379623413
key: score_time
value: [0.01713848 0.02869463 0.02886558 0.02881336 0.02947807 0.02899432
0.02927995 0.05242538 0.04720736 0.02807188]
mean value: 0.03189690113067627
key: test_mcc
value: [0.75757576 0.81818182 0.63068182 0.76761091 0.78763191 0.63222777
0.62588014 0.87844611 0.78822732 0.81706198]
mean value: 0.7503525533343659
key: train_mcc
value: [0.9761149 0.97271891 0.96595038 0.96934096 0.97957952 0.97957999
0.96934096 0.97274268 0.96934132 0.96595117]
mean value: 0.9720660801372744
key: test_accuracy
value: [0.87878788 0.90909091 0.81538462 0.87692308 0.89230769 0.81538462
0.8 0.93846154 0.89230769 0.90769231]
mean value: 0.8726340326340326
key: train_accuracy
value: [0.98805461 0.98634812 0.98296422 0.9846678 0.98977853 0.98977853
0.9846678 0.98637138 0.9846678 0.98296422]
mean value: 0.9860263037019379
key: test_fscore
value: [0.87878788 0.90909091 0.81818182 0.86666667 0.89855072 0.8125
0.82191781 0.93548387 0.89552239 0.90909091]
mean value: 0.8745792973702484
key: train_fscore
value: [0.98803419 0.98630137 0.98287671 0.98461538 0.98972603 0.98979592
0.98471986 0.98639456 0.9846678 0.98293515]
mean value: 0.9860066978574287
key: test_precision
value: [0.87878788 0.90909091 0.81818182 0.96296296 0.86111111 0.83870968
0.73170732 0.96666667 0.85714286 0.88235294]
mean value: 0.87067141396132
key: train_precision
value: [0.98972603 0.98969072 0.9862543 0.98630137 0.99312715 0.98644068
0.98305085 0.98639456 0.98634812 0.98630137]
mean value: 0.9873635138185494
key: test_recall
value: [0.87878788 0.90909091 0.81818182 0.78787879 0.93939394 0.78787879
0.9375 0.90625 0.9375 0.9375 ]
mean value: 0.8839962121212122
key: train_recall
value: [0.98634812 0.98293515 0.97952218 0.98293515 0.98634812 0.99317406
0.98639456 0.98639456 0.9829932 0.97959184]
mean value: 0.9846636948294676
key: test_roc_auc
value: [0.87878788 0.90909091 0.81534091 0.87831439 0.89157197 0.81581439
0.80208333 0.93797348 0.89299242 0.90814394]
mean value: 0.8730113636363637
key: train_roc_auc
value: [0.98805461 0.98634812 0.98295837 0.98466486 0.9897727 0.98978431
0.98466486 0.98637134 0.98467066 0.98296998]
mean value: 0.9860259803580135
key: test_jcc
value: [0.78378378 0.83333333 0.69230769 0.76470588 0.81578947 0.68421053
0.69767442 0.87878788 0.81081081 0.83333333]
mean value: 0.7794737133314424
key: train_jcc
value: [0.97635135 0.97297297 0.96632997 0.96969697 0.97966102 0.97979798
0.96989967 0.97315436 0.96979866 0.96644295]
mean value: 0.9724105895804595
MCC on Blind test: 0.64
Accuracy on Blind test: 0.88
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [1.08177519 1.07353163 1.07433748 1.08860278 1.09842777 1.1075387
1.09151793 1.08468246 1.08263707 1.08584714]
mean value: 1.0868898153305053
key: score_time
value: [0.00959563 0.00942159 0.00925279 0.00983596 0.00952816 0.01010299
0.01016855 0.00941157 0.00962877 0.01072001]
mean value: 0.009766602516174316
key: test_mcc
value: [1. 0.88531564 0.93844697 0.90805728 0.93844697 1.
0.96966868 0.90814394 0.96966868 0.96969697]
mean value: 0.9487445133303707
key: train_mcc
value: [0.99659284 1. 1. 1. 0.9965986 0.9965986
1. 1. 1. 1. ]
mean value: 0.9989790035143291
key: test_accuracy
value: [1. 0.93939394 0.96923077 0.95384615 0.96923077 1.
0.98461538 0.95384615 0.98461538 0.98461538]
mean value: 0.973939393939394
key: train_accuracy
value: [0.99829352 1. 1. 1. 0.99829642 0.99829642
1. 1. 1. 1. ]
mean value: 0.9994886360332809
key: test_fscore
value: [1. 0.94285714 0.96969697 0.95522388 0.96969697 1.
0.98412698 0.95384615 0.98412698 0.98461538]
mean value: 0.9744190469563604
key: train_fscore
value: [0.9982906 1. 1. 1. 0.9982906 0.9982906 1.
1. 1. 1. ]
mean value: 0.9994871794871795
key: test_precision
value: [1. 0.89189189 0.96969697 0.94117647 0.96969697 1.
1. 0.93939394 1. 0.96969697]
mean value: 0.9681553210964976
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 0.96969697 0.96969697 0.96969697 1.
0.96875 0.96875 0.96875 1. ]
mean value: 0.9815340909090909
key: train_recall
value: [0.99658703 1. 1. 1. 0.99658703 0.99658703
1. 1. 1. 1. ]
mean value: 0.998976109215017
key: test_roc_auc
value: [1. 0.93939394 0.96922348 0.95359848 0.96922348 1.
0.984375 0.95407197 0.984375 0.98484848]
mean value: 0.9739109848484848
key: train_roc_auc
value: [0.99829352 1. 1. 1. 0.99829352 0.99829352
1. 1. 1. 1. ]
mean value: 0.9994880546075086
key: test_jcc
value: [1. 0.89189189 0.94117647 0.91428571 0.94117647 1.
0.96875 0.91176471 0.96875 0.96969697]
mean value: 0.9507492222933399
key: train_jcc
value: [0.99658703 1. 1. 1. 0.99658703 0.99658703
1. 1. 1. 1. ]
mean value: 0.998976109215017
MCC on Blind test: 0.92
Accuracy on Blind test: 0.97
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.03513765 0.03600693 0.06509256 0.03364587 0.04783249 0.04859257
0.04702735 0.05344486 0.06726575 0.03752756]
mean value: 0.04715735912322998
key: score_time
value: [0.01282334 0.01310682 0.01312494 0.01919127 0.01749039 0.01724458
0.02123213 0.01419568 0.01420665 0.01502204]
mean value: 0.01576378345489502
key: test_mcc
value: [0.88040627 1. 0.93844697 0.90805728 0.96966868 0.94017476
0.84659091 0.90814394 0.87844611 0.96966868]
mean value: 0.9239603602279465
key: train_mcc
value: [0.99659284 0.9830783 0.97976246 0.98646327 0.98646327 0.98983039
0.98978419 0.98646265 0.98301523 0.98310636]
mean value: 0.9864558957440082
key: test_accuracy
value: [0.93939394 1. 0.96923077 0.95384615 0.98461538 0.96923077
0.92307692 0.95384615 0.93846154 0.98461538]
mean value: 0.9616317016317016
key: train_accuracy
value: [0.99829352 0.99146758 0.98977853 0.99318569 0.99318569 0.99488927
0.99488927 0.99318569 0.99148211 0.99148211]
mean value: 0.9931839456715759
key: test_fscore
value: [0.94117647 1. 0.96969697 0.95522388 0.98507463 0.97058824
0.92307692 0.95384615 0.93548387 0.98412698]
mean value: 0.9618294115059812
key: train_fscore
value: [0.99829642 0.99153976 0.98986486 0.99322034 0.99322034 0.99490662
0.99490662 0.99324324 0.99153976 0.9915683 ]
mean value: 0.99323062743685
key: test_precision
value: [0.91428571 1. 0.96969697 0.94117647 0.97058824 0.94285714
0.90909091 0.93939394 0.96666667 1. ]
mean value: 0.9553756047873695
key: train_precision
value: [0.99659864 0.98322148 0.97993311 0.98653199 0.98653199 0.98986486
0.99322034 0.98657718 0.98653199 0.98327759]
mean value: 0.9872289162958916
key: test_recall
value: [0.96969697 1. 0.96969697 0.96969697 1. 1.
0.9375 0.96875 0.90625 0.96875 ]
mean value: 0.9690340909090909
key: train_recall
value: [1. 1. 1. 1. 1. 1.
0.99659864 1. 0.99659864 1. ]
mean value: 0.9993197278911564
key: test_roc_auc
value: [0.93939394 1. 0.96922348 0.95359848 0.984375 0.96875
0.92329545 0.95407197 0.93797348 0.984375 ]
mean value: 0.9615056818181819
key: train_roc_auc
value: [0.99829352 0.99146758 0.98979592 0.99319728 0.99319728 0.99489796
0.99488635 0.99317406 0.99147338 0.99146758]
mean value: 0.9931850897355529
key: test_jcc
value: [0.88888889 1. 0.94117647 0.91428571 0.97058824 0.94285714
0.85714286 0.91176471 0.87878788 0.96875 ]
mean value: 0.9274241893727188
key: train_jcc
value: [0.99659864 0.98322148 0.97993311 0.98653199 0.98653199 0.98986486
0.98986486 0.98657718 0.98322148 0.98327759]
mean value: 0.986562317881881
MCC on Blind test: 0.34
Accuracy on Blind test: 0.82
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.03119612 0.04447436 0.03461266 0.03525758 0.03817344 0.03466368
0.0472641 0.03969932 0.04068828 0.04268384]
mean value: 0.03887133598327637
key: score_time
value: [0.03358865 0.0293026 0.01909995 0.02525187 0.02937794 0.01904321
0.01923108 0.01922727 0.01910567 0.01917934]
mean value: 0.02324075698852539
key: test_mcc
value: [1. 0.88531564 0.87844611 0.90814394 0.91144345 0.93844697
0.90805728 0.90814394 0.94017476 0.96966868]
mean value: 0.9247840769518721
key: train_mcc
value: [0.95230718 0.9761149 0.96938669 0.96266197 0.96266197 0.95575756
0.95238704 0.95913582 0.95920216 0.95575602]
mean value: 0.9605371289558845
key: test_accuracy
value: [1. 0.93939394 0.93846154 0.95384615 0.95384615 0.96923077
0.95384615 0.95384615 0.96923077 0.98461538]
mean value: 0.9616317016317016
key: train_accuracy
value: [0.97610922 0.98805461 0.9846678 0.98126065 0.98126065 0.97785349
0.97614991 0.97955707 0.97955707 0.97785349]
mean value: 0.9802323958811798
key: test_fscore
value: [1. 0.94285714 0.94117647 0.95384615 0.95652174 0.96969697
0.95238095 0.95384615 0.96774194 0.98412698]
mean value: 0.9622194501956898
key: train_fscore
value: [0.97627119 0.98807496 0.98471986 0.98138748 0.98138748 0.97792869
0.97635135 0.97966102 0.97972973 0.97800338]
mean value: 0.9803515140551106
key: test_precision
value: [1. 0.89189189 0.91428571 0.96875 0.91666667 0.96969697
0.96774194 0.93939394 1. 1. ]
mean value: 0.9568427117419053
key: train_precision
value: [0.96969697 0.98639456 0.97972973 0.97315436 0.97315436 0.97297297
0.96979866 0.97635135 0.97315436 0.97306397]
mean value: 0.9747471299604569
key: test_recall
value: [1. 1. 0.96969697 0.93939394 1. 0.96969697
0.9375 0.96875 0.9375 0.96875 ]
mean value: 0.9691287878787879
key: train_recall
value: [0.98293515 0.98976109 0.98976109 0.98976109 0.98976109 0.98293515
0.9829932 0.9829932 0.98639456 0.9829932 ]
mean value: 0.9860288825427782
key: test_roc_auc
value: [1. 0.93939394 0.93797348 0.95407197 0.953125 0.96922348
0.95359848 0.95407197 0.96875 0.984375 ]
mean value: 0.9614583333333333
key: train_roc_auc
value: [0.97610922 0.98805461 0.98467646 0.9812751 0.9812751 0.97786213
0.97613824 0.97955121 0.9795454 0.97784472]
mean value: 0.9802332195676906
key: test_jcc
value: [1. 0.89189189 0.88888889 0.91176471 0.91666667 0.94117647
0.90909091 0.91176471 0.9375 0.96875 ]
mean value: 0.9277494238891297
key: train_jcc
value: [0.95364238 0.97643098 0.96989967 0.96345515 0.96345515 0.95681063
0.95379538 0.96013289 0.9602649 0.95695364]
mean value: 0.9614840769271095
MCC on Blind test: 0.92
Accuracy on Blind test: 0.97
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.25791717 0.30864501 0.47040224 0.33794689 0.33028293 0.31463742
0.33145475 0.31620097 0.350003 0.41688848]
mean value: 0.3434378862380981
key: score_time
value: [0.01945615 0.02283406 0.01909637 0.01916385 0.0190537 0.01913309
0.0194664 0.01907849 0.01902366 0.0190413 ]
mean value: 0.019534707069396973
key: test_mcc
value: [1. 0.88531564 0.87844611 0.90814394 0.91144345 0.93844697
0.90805728 0.87867338 0.94017476 0.96966868]
mean value: 0.9218370210629971
key: train_mcc
value: [0.95230718 0.97952218 0.96938669 0.96266197 0.96266197 0.96934132
0.95238704 0.96595038 0.95920216 0.95575602]
mean value: 0.9629176905830358
key: test_accuracy
value: [1. 0.93939394 0.93846154 0.95384615 0.95384615 0.96923077
0.95384615 0.93846154 0.96923077 0.98461538]
mean value: 0.9600932400932402
key: train_accuracy
value: [0.97610922 0.98976109 0.9846678 0.98126065 0.98126065 0.9846678
0.97614991 0.98296422 0.97955707 0.97785349]
mean value: 0.9814251908530097
key: test_fscore
value: [1. 0.94285714 0.94117647 0.95384615 0.95652174 0.96969697
0.95238095 0.93939394 0.96774194 0.98412698]
mean value: 0.9607742287504684
key: train_fscore
value: /home/tanu/git/LSHTM_analysis/scripts/ml/./embb_sl.py:128: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./embb_sl.py:131: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[0.97627119 0.98976109 0.98471986 0.98138748 0.98138748 0.9846678
0.97635135 0.98305085 0.97972973 0.97800338]
mean value: 0.9815330215484706
key: test_precision
value: [1. 0.89189189 0.91428571 0.96875 0.91666667 0.96969697
0.96774194 0.91176471 1. 1. ]
mean value: 0.9540797883907466
key: train_precision
value: [0.96969697 0.98976109 0.97972973 0.97315436 0.97315436 0.9829932
0.96979866 0.97972973 0.97315436 0.97306397]
mean value: 0.9764236436615927
key: test_recall
value: [1. 1. 0.96969697 0.93939394 1. 0.96969697
0.9375 0.96875 0.9375 0.96875 ]
mean value: 0.9691287878787879
key: train_recall
value: [0.98293515 0.98976109 0.98976109 0.98976109 0.98976109 0.98634812
0.9829932 0.98639456 0.98639456 0.9829932 ]
mean value: 0.9867103155255276
key: test_roc_auc
value: [1. 0.93939394 0.93797348 0.95407197 0.953125 0.96922348
0.95359848 0.93892045 0.96875 0.984375 ]
mean value: 0.9599431818181818
key: train_roc_auc
value: [0.97610922 0.98976109 0.98467646 0.9812751 0.9812751 0.98467066
0.97613824 0.98295837 0.9795454 0.97784472]
mean value: 0.9814254370690256
key: test_jcc
value: [1. 0.89189189 0.88888889 0.91176471 0.91666667 0.94117647
0.90909091 0.88571429 0.9375 0.96875 ]
mean value: 0.9251443818723231
key: train_jcc
value: [0.95364238 0.97972973 0.96989967 0.96345515 0.96345515 0.96979866
0.95379538 0.96666667 0.9602649 0.95695364]
mean value: 0.9637661325359951
MCC on Blind test: 0.92
Accuracy on Blind test: 0.97
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.03635168 0.04008698 0.03932405 0.0380702 0.03882837 0.03954506
0.04933882 0.04907322 0.03866506 0.05903292]
mean value: 0.04283163547515869
key: score_time
value: [0.01420403 0.01426244 0.01425099 0.01448369 0.01443338 0.0146153
0.01465511 0.01483965 0.01471305 0.02132845]
mean value: 0.015178608894348144
key: test_mcc
value: [0.90950859 0.85201287 0.84644588 0.87867338 0.91144345 0.84659091
0.91168461 0.84659091 0.78763191 0.93844697]
mean value: 0.8729029478333615
key: train_mcc
value: [0.92569136 0.94894364 0.94583929 0.92504237 0.92889005 0.91487015
0.93886511 0.92506472 0.92512656 0.92189428]
mean value: 0.9300227538742016
key: test_accuracy
value: [0.95454545 0.92424242 0.92307692 0.93846154 0.95384615 0.92307692
0.95384615 0.92307692 0.89230769 0.96923077]
mean value: 0.9355710955710956
key: train_accuracy
value: [0.96245734 0.97440273 0.97274276 0.96252129 0.96422487 0.95741056
0.9693356 0.96252129 0.96252129 0.96081772]
mean value: 0.9648955468600101
key: test_fscore
value: [0.95522388 0.92753623 0.92537313 0.9375 0.95652174 0.92307692
0.95522388 0.92307692 0.8852459 0.96875 ]
mean value: 0.9357528614330072
key: train_fscore
value: [0.9632107 0.97461929 0.97306397 0.96245734 0.96470588 0.95755518
0.96969697 0.96245734 0.96283784 0.96134454]
mean value: 0.9651949046484256
key: test_precision
value: [0.94117647 0.88888889 0.91176471 0.96774194 0.91666667 0.9375
0.91428571 0.90909091 0.93103448 0.96875 ]
mean value: 0.9286899773645259
key: train_precision
value: [0.9442623 0.96644295 0.96013289 0.96245734 0.95033113 0.9527027
0.96 0.96575342 0.95637584 0.95016611]
mean value: 0.9568624681422546
key: test_recall
value: [0.96969697 0.96969697 0.93939394 0.90909091 1. 0.90909091
1. 0.9375 0.84375 0.96875 ]
mean value: 0.9446969696969697
key: train_recall
value: [0.98293515 0.98293515 0.98634812 0.96245734 0.97952218 0.96245734
0.97959184 0.95918367 0.96938776 0.97278912]
mean value: 0.973760767105477
key: test_roc_auc
value: [0.95454545 0.92424242 0.92282197 0.93892045 0.953125 0.92329545
0.95454545 0.92329545 0.89157197 0.96922348]
mean value: 0.9355587121212121
key: train_roc_auc
value: [0.96245734 0.97440273 0.9727659 0.96252119 0.96425089 0.95741915
0.9693181 0.96252699 0.96250958 0.96079729]
mean value: 0.9648969143971582
key: test_jcc
value: [0.91428571 0.86486486 0.86111111 0.88235294 0.91666667 0.85714286
0.91428571 0.85714286 0.79411765 0.93939394]
mean value: 0.8801364313129019
key: train_jcc
value: [0.92903226 0.95049505 0.94754098 0.92763158 0.93181818 0.91856678
0.94117647 0.92763158 0.92833876 0.92556634]
mean value: 0.9327797981978533
MCC on Blind test: 0.79
Accuracy on Blind test: 0.91
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [1.05838704 0.86052942 1.04595804 1.11087394 1.15152407 1.06360507
1.18267727 1.06228065 1.11125398 1.15308499]
mean value: 1.0800174474716187
key: score_time
value: [0.01456332 0.01477814 0.01478934 0.01489806 0.01752996 0.01242924
0.01497817 0.01905251 0.01568937 0.0210464 ]
mean value: 0.015975451469421385
key: test_mcc
value: [0.90950859 0.85201287 0.94017476 0.90805728 0.94017476 0.96966868
0.96969697 0.88382395 0.93844697 0.96969697]
mean value: 0.9281261797504913
key: train_mcc
value: [0.9761149 0.9863711 1. 0.96592835 0.96934132 0.97274268
0.96934096 0.97274268 0.97276495 0.97274268]
mean value: 0.9758089630087908
key: test_accuracy
value: [0.95454545 0.92424242 0.96923077 0.95384615 0.96923077 0.98461538
0.98461538 0.93846154 0.96923077 0.98461538]
mean value: 0.9632634032634033
key: train_accuracy
value: [0.98805461 0.99317406 1. 0.98296422 0.9846678 0.98637138
0.9846678 0.98637138 0.98637138 0.98637138]
mean value: 0.9879014018175369
key: test_fscore
value: [0.95522388 0.92753623 0.97058824 0.95522388 0.97058824 0.98507463
0.98461538 0.94117647 0.96875 0.98461538]
mean value: 0.9643392330350999
key: train_fscore
value: [0.98807496 0.99315068 1. 0.98293515 0.9846678 0.98634812
0.98471986 0.98639456 0.98644068 0.98639456]
mean value: 0.987912637896652
key: test_precision
value: [0.94117647 0.88888889 0.94285714 0.94117647 0.94285714 0.97058824
0.96969697 0.88888889 0.96875 0.96969697]
mean value: 0.9424577179356591
key: train_precision
value: [0.98639456 0.99656357 1. 0.98293515 0.9829932 0.98634812
0.98305085 0.98639456 0.98310811 0.98639456]
mean value: 0.9874182676647708
key: test_recall
value: [0.96969697 0.96969697 1. 0.96969697 1. 1.
1. 1. 0.96875 1. ]
mean value: 0.9877840909090909
key: train_recall
value: [0.98976109 0.98976109 1. 0.98293515 0.98634812 0.98634812
0.98639456 0.98639456 0.98979592 0.98639456]
mean value: 0.9884133175454481
key: test_roc_auc
value: [0.95454545 0.92424242 0.96875 0.95359848 0.96875 0.984375
0.98484848 0.93939394 0.96922348 0.98484848]
mean value: 0.9632575757575758
key: train_roc_auc
value: [0.98805461 0.99317406 1. 0.98296418 0.98467066 0.98637134
0.98466486 0.98637134 0.98636554 0.98637134]
mean value: 0.9879007917160039
key: test_jcc
value: [0.91428571 0.86486486 0.94285714 0.91428571 0.94285714 0.97058824
0.96969697 0.88888889 0.93939394 0.96969697]
mean value: 0.9317415582121464
key: train_jcc
value: [0.97643098 0.98639456 1. 0.96644295 0.96979866 0.97306397
0.96989967 0.97315436 0.97324415 0.97315436]
mean value: 0.9761583655597579
MCC on Blind test: 0.92
Accuracy on Blind test: 0.97
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01527762 0.01133418 0.01078105 0.01043081 0.01038504 0.01038885
0.01011038 0.01063275 0.01037979 0.0102396 ]
mean value: 0.010996007919311523
key: score_time
value: [0.01235747 0.00942874 0.00928879 0.00893927 0.00882173 0.00888205
0.0088284 0.00891542 0.00892353 0.00884128]
mean value: 0.009322667121887207
key: test_mcc
value: [0.48507125 0.62017367 0.63620086 0.63068182 0.75498882 0.57061637
0.67840053 0.63153153 0.84659091 0.60621087]
mean value: 0.646046662624106
key: train_mcc
value: [0.68269327 0.72405993 0.70759582 0.6797265 0.67306421 0.67743268
0.69679344 0.69335516 0.67985743 0.64233052]
mean value: 0.6856908963219008
key: test_accuracy
value: [0.74242424 0.8030303 0.81538462 0.81538462 0.87692308 0.78461538
0.83076923 0.81538462 0.92307692 0.8 ]
mean value: 0.8206993006993007
key: train_accuracy
value: [0.84129693 0.86177474 0.85349233 0.83986371 0.83645656 0.83475298
0.8483816 0.84667802 0.83986371 0.82112436]
mean value: 0.8423684960259549
key: test_fscore
value: [0.73846154 0.82191781 0.80645161 0.81818182 0.88235294 0.78125
0.84507042 0.80645161 0.92307692 0.77966102]
mean value: 0.8202875694406744
key: train_fscore
value: [0.83993115 0.86432161 0.85618729 0.83959044 0.83783784 0.8207024
0.84940778 0.84693878 0.84175084 0.82293423]
mean value: 0.8419602370069587
key: test_precision
value: [0.75 0.75 0.86206897 0.81818182 0.85714286 0.80645161
0.76923077 0.83333333 0.90909091 0.85185185]
mean value: 0.8207352117252006
key: train_precision
value: [0.84722222 0.84868421 0.83934426 0.83959044 0.82943144 0.89516129
0.84511785 0.84693878 0.83333333 0.81605351]
mean value: 0.8440877332846366
key: test_recall
value: [0.72727273 0.90909091 0.75757576 0.81818182 0.90909091 0.75757576
0.9375 0.78125 0.9375 0.71875 ]
mean value: 0.8253787878787879
key: train_recall
value: [0.83276451 0.88054608 0.87372014 0.83959044 0.84641638 0.75767918
0.8537415 0.84693878 0.85034014 0.82993197]
mean value: 0.8411669104501869
key: test_roc_auc
value: [0.74242424 0.8030303 0.81628788 0.81534091 0.87642045 0.78503788
0.83238636 0.81486742 0.92329545 0.79876894]
mean value: 0.8207859848484849
key: train_roc_auc
value: [0.84129693 0.86177474 0.85352673 0.83986325 0.8364735 0.8346219
0.84837245 0.84667758 0.83984584 0.82110933]
mean value: 0.8423562257667572
key: test_jcc
value: [0.58536585 0.69767442 0.67567568 0.69230769 0.78947368 0.64102564
0.73170732 0.67567568 0.85714286 0.63888889]
mean value: 0.6984937704263315
key: train_jcc
value: [0.72403561 0.76106195 0.74853801 0.72352941 0.72093023 0.69592476
0.73823529 0.73451327 0.72674419 0.6991404 ]
mean value: 0.7272653131766867
MCC on Blind test: 0.47
Accuracy on Blind test: 0.79
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01057482 0.01148009 0.01069951 0.01047349 0.01045203 0.01067877
0.0105927 0.01061225 0.01050448 0.01047277]
mean value: 0.010654091835021973
key: score_time
value: [0.00899243 0.00903797 0.008986 0.00885749 0.00882101 0.00897503
0.00889325 0.00890899 0.00888324 0.0088706 ]
mean value: 0.008922600746154785
key: test_mcc
value: [0.42919754 0.53099079 0.58027158 0.47727273 0.72348485 0.54131274
0.63068182 0.63153153 0.64942422 0.47810304]
mean value: 0.5672270829566963
key: train_mcc
value: [0.58200069 0.60001567 0.58515758 0.56541828 0.58185441 0.60844172
0.56880028 0.60266953 0.58602719 0.60241221]
mean value: 0.5882797582957422
key: test_accuracy
value: [0.71212121 0.75757576 0.78461538 0.73846154 0.86153846 0.76923077
0.81538462 0.81538462 0.81538462 0.73846154]
mean value: 0.7808158508158508
key: train_accuracy
value: [0.79010239 0.79863481 0.79216354 0.78194208 0.79045997 0.80408859
0.78364566 0.80068143 0.79216354 0.80068143]
mean value: 0.7934563436458885
key: test_fscore
value: [0.68852459 0.78378378 0.76666667 0.73846154 0.86153846 0.76190476
0.8125 0.80645161 0.78571429 0.72131148]
mean value: 0.7726857176546494
key: train_fscore
value: [0.78152753 0.78853047 0.78596491 0.77304965 0.78383128 0.80069324
0.77601411 0.7943761 0.7844523 0.79509632]
mean value: 0.7863535905385026
key: test_precision
value: [0.75 0.70731707 0.85185185 0.75 0.875 0.8
0.8125 0.83333333 0.91666667 0.75862069]
mean value: 0.8055289614677756
key: train_precision
value: [0.81481481 0.83018868 0.80866426 0.80442804 0.80797101 0.81338028
0.80586081 0.82181818 0.81617647 0.81949458]
mean value: 0.8142797137556002
key: test_recall
value: [0.63636364 0.87878788 0.6969697 0.72727273 0.84848485 0.72727273
0.8125 0.78125 0.6875 0.6875 ]
mean value: 0.7483901515151515
key: train_recall
value: [0.75085324 0.75085324 0.76450512 0.7440273 0.76109215 0.7883959
0.74829932 0.76870748 0.75510204 0.77210884]
mean value: 0.7603944649532167
key: test_roc_auc
value: [0.71212121 0.75757576 0.78598485 0.73863636 0.86174242 0.76988636
0.81534091 0.81486742 0.81344697 0.73768939]
mean value: 0.7807291666666667
key: train_roc_auc
value: [0.79010239 0.79863481 0.79211651 0.7818776 0.79041002 0.8040619
0.78370597 0.80073599 0.79222679 0.80073019]
mean value: 0.7934602168512456
key: test_jcc
value: [0.525 0.64444444 0.62162162 0.58536585 0.75675676 0.61538462
0.68421053 0.67567568 0.64705882 0.56410256]
mean value: 0.6319620881489416
key: train_jcc
value: [0.64139942 0.65088757 0.64739884 0.6300578 0.64450867 0.66763006
0.63400576 0.65889213 0.64534884 0.65988372]
mean value: 0.6480012816704841
MCC on Blind test: 0.11
Accuracy on Blind test: 0.65
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00982594 0.01073241 0.00973868 0.01103973 0.01097131 0.01004457
0.01003647 0.01040053 0.01030421 0.01033473]
mean value: 0.010342860221862793
key: score_time
value: [0.01514554 0.01278162 0.01276278 0.01355863 0.01304555 0.01275563
0.01325583 0.01311755 0.01306009 0.01343203]
mean value: 0.013291525840759277
key: test_mcc
value: [0.45538256 0.68219104 0.44739357 0.64071161 0.45812857 0.48131798
0.58873983 0.70516447 0.57061637 0.53838887]
mean value: 0.5568034847168876
key: train_mcc
value: [0.77368438 0.78627716 0.77867359 0.76093067 0.77108643 0.78072643
0.80246789 0.7985738 0.75589317 0.78313619]
mean value: 0.7791449713062735
key: test_accuracy
value: [0.72727273 0.83333333 0.72307692 0.81538462 0.72307692 0.73846154
0.78461538 0.84615385 0.78461538 0.76923077]
mean value: 0.7745221445221445
key: train_accuracy
value: [0.88225256 0.88737201 0.88586031 0.87393526 0.88074957 0.88415673
0.89608177 0.89437819 0.87223169 0.88756388]
mean value: 0.884458198394102
key: test_fscore
value: [0.73529412 0.84931507 0.71875 0.83333333 0.75675676 0.76056338
0.80555556 0.85714286 0.78787879 0.76190476]
mean value: 0.7866494618993952
key: train_fscore
value: [0.89064976 0.89622642 0.8928 0.884375 0.88924051 0.89341693
0.90393701 0.9022082 0.88262911 0.8952381 ]
mean value: 0.8930721024591308
key: test_precision
value: [0.71428571 0.775 0.74193548 0.76923077 0.68292683 0.71052632
0.725 0.78947368 0.76470588 0.77419355]
mean value: 0.7447278227395782
key: train_precision
value: [0.83136095 0.83090379 0.84036145 0.81556196 0.82890855 0.82608696
0.84164223 0.84117647 0.8173913 0.83928571]
mean value: 0.8312679371325126
key: test_recall
value: [0.75757576 0.93939394 0.6969697 0.90909091 0.84848485 0.81818182
0.90625 0.9375 0.8125 0.75 ]
mean value: 0.837594696969697
key: train_recall
value: [0.95904437 0.97269625 0.95221843 0.96587031 0.95904437 0.97269625
0.97619048 0.97278912 0.95918367 0.95918367]
mean value: 0.9648916904645817
key: test_roc_auc
value: [0.72727273 0.83333333 0.72348485 0.81392045 0.72111742 0.73721591
0.78645833 0.84753788 0.78503788 0.76893939]
mean value: 0.7744318181818182
key: train_roc_auc
value: [0.88225256 0.88737201 0.88597316 0.87409162 0.88088273 0.88430731
0.89594507 0.89424439 0.8720833 0.88744167]
mean value: 0.8844593810220334
key: test_jcc
value: [0.58139535 0.73809524 0.56097561 0.71428571 0.60869565 0.61363636
0.6744186 0.75 0.65 0.61538462]
mean value: 0.6506887146820315
key: train_jcc
value: [0.80285714 0.81196581 0.80635838 0.79271709 0.8005698 0.80736544
0.82471264 0.82183908 0.78991597 0.81034483]
mean value: 0.8068646180934557
MCC on Blind test: 0.31
Accuracy on Blind test: 0.74
Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.03434801 0.03376579 0.03416848 0.03222466 0.03244758 0.03181934
0.03270769 0.0325191 0.03169084 0.03116226]
mean value: 0.032685375213623045
key: score_time
value: [0.0160203 0.01425171 0.0150001 0.01322293 0.01301026 0.01341105
0.01340508 0.01416564 0.01349974 0.01314807]
mean value: 0.013913488388061524
key: test_mcc
value: [0.72760688 0.79708114 0.69223485 0.72649867 0.7935502 0.51508188
0.68964536 0.69810664 0.72322307 0.64071161]
mean value: 0.7003740298360195
key: train_mcc
value: [0.79968344 0.8805512 0.81608462 0.80591418 0.84344123 0.83013357
0.83327919 0.81260956 0.81945993 0.80637046]
mean value: 0.8247527376466051
key: test_accuracy
value: [0.86363636 0.89393939 0.84615385 0.86153846 0.89230769 0.75384615
0.83076923 0.84615385 0.86153846 0.81538462]
mean value: 0.8465268065268066
key: train_accuracy
value: [0.89931741 0.94027304 0.90800681 0.90289608 0.92163543 0.91482112
0.9165247 0.90630324 0.90971039 0.90289608]
mean value: 0.9122384310806961
key: test_fscore
value: [0.86153846 0.90140845 0.84848485 0.85714286 0.90140845 0.77777778
0.84931507 0.85294118 0.85714286 0.79310345]
mean value: 0.8500263396734854
key: train_fscore
value: [0.90183028 0.94017094 0.90721649 0.9035533 0.92068966 0.91610738
0.91764706 0.90662139 0.91032149 0.90121317]
mean value: 0.9125371166685831
key: test_precision
value: [0.875 0.84210526 0.84848485 0.9 0.84210526 0.71794872
0.75609756 0.80555556 0.87096774 0.88461538]
mean value: 0.834288033583139
key: train_precision
value: [0.87987013 0.94178082 0.91349481 0.89597315 0.93031359 0.9009901
0.90697674 0.90508475 0.90572391 0.91872792]
mean value: 0.9098935914566021
key: test_recall
value: [0.84848485 0.96969697 0.84848485 0.81818182 0.96969697 0.84848485
0.96875 0.90625 0.84375 0.71875 ]
mean value: 0.8740530303030303
key: train_recall
value: [0.92491468 0.93856655 0.90102389 0.9112628 0.9112628 0.93174061
0.92857143 0.90816327 0.91496599 0.88435374]
mean value: 0.9154825752826727
key: test_roc_auc
value: [0.86363636 0.89393939 0.84611742 0.86221591 0.89109848 0.75236742
0.83285985 0.84706439 0.86126894 0.81392045]
mean value: 0.8464488636363636
key: train_roc_auc
value: [0.89931741 0.94027304 0.90799494 0.90291031 0.92161779 0.9148499
0.91650414 0.90630006 0.90970142 0.90292772]
mean value: 0.9122396740266072
key: test_jcc
value: [0.75675676 0.82051282 0.73684211 0.75 0.82051282 0.63636364
0.73809524 0.74358974 0.75 0.65714286]
mean value: 0.7409815978237031
key: train_jcc
value: [0.82121212 0.88709677 0.83018868 0.82407407 0.85303514 0.84520124
0.84782609 0.82919255 0.83540373 0.82018927]
mean value: 0.8393419665581484
MCC on Blind test: 0.85
Accuracy on Blind test: 0.94
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [2.16521049 2.1405015 2.16510344 2.0510726 2.08916759 2.11430049
2.20137405 2.13904119 2.11048341 2.28039932]
mean value: 2.145665407180786
key: score_time
value: [0.01282048 0.01445389 0.02410007 0.01496387 0.01454306 0.01344037
0.01483035 0.01479912 0.01515126 0.0157845 ]
mean value: 0.015488696098327637
key: test_mcc
value: [0.9701425 0.88531564 0.91144345 0.96966868 0.94017476 0.91144345
0.88382395 0.91168461 0.91168461 1. ]
mean value: 0.9295381660328197
key: train_mcc
value: [0.99659284 0.99659284 1. 0.99659864 0.99659864 0.99659864
0.9965986 0.9965986 0.9965986 0.9965986 ]
mean value: 0.996937598865393
key: test_accuracy
value: [0.98484848 0.93939394 0.95384615 0.98461538 0.96923077 0.95384615
0.93846154 0.95384615 0.95384615 1. ]
mean value: 0.9631934731934733
key: train_accuracy
value: [0.99829352 0.99829352 1. 0.99829642 0.99829642 0.99829642
0.99829642 0.99829642 0.99829642 0.99829642]
mean value: 0.9984661988127286
key: test_fscore
value: [0.98507463 0.94285714 0.95652174 0.98507463 0.97058824 0.95652174
0.94117647 0.95522388 0.95522388 1. ]
mean value: 0.9648262341925739
key: train_fscore
value: [0.99829642 0.99829642 1. 0.99829642 0.99829642 0.99829642
0.99830221 0.99830221 0.99830221 0.99830221]
mean value: 0.9984690940959036
key: test_precision
value: [0.97058824 0.89189189 0.91666667 0.97058824 0.94285714 0.91666667
0.88888889 0.91428571 0.91428571 1. ]
mean value: 0.932671915613092
key: train_precision
value: [0.99659864 0.99659864 1. 0.99659864 0.99659864 0.99659864
0.99661017 0.99661017 0.99661017 0.99661017]
mean value: 0.9969433875245013
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98484848 0.93939394 0.953125 0.984375 0.96875 0.953125
0.93939394 0.95454545 0.95454545 1. ]
mean value: 0.9632102272727273
key: train_roc_auc
value: [0.99829352 0.99829352 1. 0.99829932 0.99829932 0.99829932
0.99829352 0.99829352 0.99829352 0.99829352]
mean value: 0.9984659051333844
key: test_jcc
value: [0.97058824 0.89189189 0.91666667 0.97058824 0.94285714 0.91666667
0.88888889 0.91428571 0.91428571 1. ]
mean value: 0.932671915613092
key: train_jcc
value: [0.99659864 0.99659864 1. 0.99659864 0.99659864 0.99659864
0.99661017 0.99661017 0.99661017 0.99661017]
mean value: 0.9969433875245013
MCC on Blind test: 0.85
Accuracy on Blind test: 0.94
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.04018974 0.0260396 0.02741075 0.02671456 0.0259552 0.0270834
0.02738142 0.02613425 0.02657294 0.02714205]
mean value: 0.02806239128112793
key: score_time
value: [0.01224422 0.00977659 0.00987625 0.00879931 0.00890756 0.00887895
0.009238 0.00953794 0.00898623 0.00918031]
mean value: 0.009542536735534669
key: test_mcc
value: [1. 0.88531564 0.94017476 0.96966868 0.96966868 0.94017476
1. 0.96969697 1. 0.96969697]
mean value: 0.9644396455835689
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.93939394 0.96923077 0.98461538 0.98461538 0.96923077
1. 0.98461538 1. 0.98461538]
mean value: 0.9816317016317017
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.94285714 0.97058824 0.98507463 0.98507463 0.97058824
1. 0.98461538 1. 0.98461538]
mean value: 0.9823413636407491
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.89189189 0.94285714 0.97058824 0.97058824 0.94285714
1. 0.96969697 1. 0.96969697]
mean value: 0.9658176587588352
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.93939394 0.96875 0.984375 0.984375 0.96875
1. 0.98484848 1. 0.98484848]
mean value: 0.9815340909090909
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.89189189 0.94285714 0.97058824 0.97058824 0.94285714
1. 0.96969697 1. 0.96969697]
mean value: 0.9658176587588352
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.92
Accuracy on Blind test: 0.97
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.13007593 0.12656021 0.12510514 0.12539434 0.12666392 0.1268909
0.12753057 0.12602592 0.12738371 0.12686872]
mean value: 0.12684993743896483
key: score_time
value: [0.01878166 0.01778889 0.01779699 0.01774406 0.01783943 0.01807499
0.01818442 0.01870155 0.01788616 0.02420855]
mean value: 0.018700671195983887
key: test_mcc
value: [1. 0.88531564 0.91144345 0.96966868 0.96966868 0.96966868
1. 0.96969697 0.96966868 0.96969697]
mean value: 0.9614827760603497
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.93939394 0.95384615 0.98461538 0.98461538 0.98461538
1. 0.98461538 0.98461538 0.98461538]
mean value: 0.9800932400932402
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.94285714 0.95652174 0.98507463 0.98507463 0.98507463
1. 0.98461538 0.98412698 0.98461538]
mean value: 0.9807960515942346
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.89189189 0.91666667 0.97058824 0.97058824 0.97058824
1. 0.96969697 1. 0.96969697]
mean value: 0.9659717203834851
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 0.96875
1. ]
mean value: 0.996875
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.93939394 0.953125 0.984375 0.984375 0.984375
1. 0.98484848 0.984375 0.98484848]
mean value: 0.9799715909090909
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.89189189 0.91666667 0.97058824 0.97058824 0.97058824
1. 0.96969697 0.96875 0.96969697]
mean value: 0.9628467203834851
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.72
Accuracy on Blind test: 0.91
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01043582 0.01042581 0.01088643 0.01115012 0.0105629 0.01053858
0.01073027 0.01055336 0.01113892 0.01059294]
mean value: 0.010701513290405274
key: score_time
value: [0.00881147 0.00887012 0.0090332 0.00886226 0.00879574 0.0087831
0.0088973 0.00887489 0.00885868 0.00885963]
mean value: 0.008864641189575195
key: test_mcc
value: [0.85839508 0.88531564 0.8291562 0.87689394 0.91144345 0.94017476
0.85663571 0.85663571 0.83005736 0.65648795]
mean value: 0.8501195782810751
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.92424242 0.93939394 0.90769231 0.93846154 0.95384615 0.96923077
0.92307692 0.92307692 0.90769231 0.8 ]
mean value: 0.9186713286713287
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.92957746 0.94285714 0.91666667 0.93939394 0.95652174 0.97058824
0.92753623 0.92753623 0.91428571 0.83116883]
mean value: 0.9256132197353695
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.86842105 0.89189189 0.84615385 0.93939394 0.91666667 0.94285714
0.86486486 0.86486486 0.84210526 0.71111111]
mean value: 0.8688330643593801
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 0.93939394 1. 1.
1. 1. 1. 1. ]
mean value: 0.9939393939393939
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.92424242 0.93939394 0.90625 0.93844697 0.953125 0.96875
0.92424242 0.92424242 0.90909091 0.8030303 ]
mean value: 0.9190814393939394
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.86842105 0.89189189 0.84615385 0.88571429 0.91666667 0.94285714
0.86486486 0.86486486 0.84210526 0.71111111]
mean value: 0.8634650989914148
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.46
Accuracy on Blind test: 0.82
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.83368301 1.80051327 1.82413101 1.82221413 1.8274672 1.83522439
1.82569861 1.82355714 1.82971764 1.85142326]
mean value: 1.8273629665374755
key: score_time
value: [0.09318542 0.0931797 0.09321451 0.09349918 0.09294653 0.09291863
0.09288955 0.09271193 0.09319663 0.09328294]
mean value: 0.09310250282287598
key: test_mcc
value: [1. 0.88531564 0.96966868 0.94017476 0.96966868 0.96966868
1. 0.94028478 1. 0.96969697]
mean value: 0.9644478194983122
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.93939394 0.98461538 0.96923077 0.98461538 0.98461538
1. 0.96923077 1. 0.98461538]
mean value: 0.9816317016317017
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.94285714 0.98507463 0.97058824 0.98507463 0.98507463
1. 0.96969697 1. 0.98461538]
mean value: 0.982298161306063
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.89189189 0.97058824 0.94285714 0.97058824 0.97058824
1. 0.94117647 1. 0.96969697]
mean value: 0.9657387180916592
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.93939394 0.984375 0.96875 0.984375 0.984375
1. 0.96969697 1. 0.98484848]
mean value: 0.9815814393939394
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.89189189 0.97058824 0.94285714 0.97058824 0.97058824
1. 0.94117647 1. 0.96969697]
mean value: 0.9657387180916592
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.82
Accuracy on Blind test: 0.94
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000...05', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.96573257 0.98738766 1.00982213 1.02626348 1.00719547 1.04622245
1.01944923 0.99987769 1.07083035 1.11324215]
mean value: 1.0246023178100585
key: score_time
value: [0.23722124 0.24880981 0.24401736 0.19855857 0.25242662 0.22853804
0.16347671 0.2721715 0.25464773 0.13172317]
mean value: 0.2231590747833252
key: test_mcc
value: [1. 0.88531564 0.94017476 0.94017476 0.96966868 1.
0.96969697 0.94028478 1. 0.96969697]
mean value: 0.9615012556379743
key: train_mcc
value: [0.9763879 0.98981298 0.97976246 0.98310733 0.97976246 0.97642854
0.97976106 0.97642665 0.97642665 0.97976106]
mean value: 0.9797637092135919
key: test_accuracy
value: [1. 0.93939394 0.96923077 0.96923077 0.98461538 1.
0.98461538 0.96923077 1. 0.98461538]
mean value: 0.9800932400932401
key: train_accuracy
value: [0.98805461 0.99488055 0.98977853 0.99148211 0.98977853 0.98807496
0.98977853 0.98807496 0.98807496 0.98977853]
mean value: 0.9897756277944776
key: test_fscore
value: [1. 0.94285714 0.97058824 0.97058824 0.98507463 1.
0.98461538 0.96969697 1. 0.98461538]
mean value: 0.9808035979238788
key: train_fscore
value: [0.98819562 0.99490662 0.98986486 0.99153976 0.98986486 0.98819562
0.98989899 0.98823529 0.98823529 0.98989899]
mean value: 0.9898835913297229
key: test_precision
value: [1. 0.89189189 0.94285714 0.94285714 0.97058824 1.
0.96969697 0.94117647 1. 0.96969697]
mean value: 0.962876482288247
key: train_precision
value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
[0.97666667 0.98986486 0.97993311 0.98322148 0.97993311 0.97666667
0.98 0.97674419 0.97674419 0.98 ]
mean value: 0.9799774267537075
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.93939394 0.96875 0.96875 0.984375 1.
0.98484848 0.96969697 1. 0.98484848]
mean value: 0.9800662878787879
key: train_roc_auc
value: [0.98805461 0.99488055 0.98979592 0.9914966 0.98979592 0.98809524
0.98976109 0.98805461 0.98805461 0.98976109]
mean value: 0.9897750226370411
key: test_jcc
value: [1. 0.89189189 0.94285714 0.94285714 0.97058824 1.
0.96969697 0.94117647 1. 0.96969697]
mean value: 0.962876482288247
key: train_jcc
value: [0.97666667 0.98986486 0.97993311 0.98322148 0.97993311 0.97666667
0.98 0.97674419 0.97674419 0.98 ]
mean value: 0.9799774267537075
MCC on Blind test: 0.92
Accuracy on Blind test: 0.97
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02750015 0.01196432 0.01203775 0.01166177 0.0112555 0.01293898
0.01182771 0.01210523 0.01166701 0.01209378]
mean value: 0.013505220413208008
key: score_time
value: [0.00998569 0.00978303 0.00998259 0.00981021 0.0090344 0.00994062
0.00977063 0.00989413 0.00978208 0.00990629]
mean value: 0.009788966178894043
key: test_mcc
value: [0.42919754 0.53099079 0.58027158 0.47727273 0.72348485 0.54131274
0.63068182 0.63153153 0.64942422 0.47810304]
mean value: 0.5672270829566963
key: train_mcc
value: [0.58200069 0.60001567 0.58515758 0.56541828 0.58185441 0.60844172
0.56880028 0.60266953 0.58602719 0.60241221]
mean value: 0.5882797582957422
key: test_accuracy
value: [0.71212121 0.75757576 0.78461538 0.73846154 0.86153846 0.76923077
0.81538462 0.81538462 0.81538462 0.73846154]
mean value: 0.7808158508158508
key: train_accuracy
value: [0.79010239 0.79863481 0.79216354 0.78194208 0.79045997 0.80408859
0.78364566 0.80068143 0.79216354 0.80068143]
mean value: 0.7934563436458885
key: test_fscore
value: [0.68852459 0.78378378 0.76666667 0.73846154 0.86153846 0.76190476
0.8125 0.80645161 0.78571429 0.72131148]
mean value: 0.7726857176546494
key: train_fscore
value: [0.78152753 0.78853047 0.78596491 0.77304965 0.78383128 0.80069324
0.77601411 0.7943761 0.7844523 0.79509632]
mean value: 0.7863535905385026
key: test_precision
value: [0.75 0.70731707 0.85185185 0.75 0.875 0.8
0.8125 0.83333333 0.91666667 0.75862069]
mean value: 0.8055289614677756
key: train_precision
value: [0.81481481 0.83018868 0.80866426 0.80442804 0.80797101 0.81338028
0.80586081 0.82181818 0.81617647 0.81949458]
mean value: 0.8142797137556002
key: test_recall
value: [0.63636364 0.87878788 0.6969697 0.72727273 0.84848485 0.72727273
0.8125 0.78125 0.6875 0.6875 ]
mean value: 0.7483901515151515
key: train_recall
value: [0.75085324 0.75085324 0.76450512 0.7440273 0.76109215 0.7883959
0.74829932 0.76870748 0.75510204 0.77210884]
mean value: 0.7603944649532167
key: test_roc_auc
value: [0.71212121 0.75757576 0.78598485 0.73863636 0.86174242 0.76988636
0.81534091 0.81486742 0.81344697 0.73768939]
mean value: 0.7807291666666667
key: train_roc_auc
value: [0.79010239 0.79863481 0.79211651 0.7818776 0.79041002 0.8040619
0.78370597 0.80073599 0.79222679 0.80073019]
mean value: 0.7934602168512456
key: test_jcc
value: [0.525 0.64444444 0.62162162 0.58536585 0.75675676 0.61538462
0.68421053 0.67567568 0.64705882 0.56410256]
mean value: 0.6319620881489416
key: train_jcc
value: [0.64139942 0.65088757 0.64739884 0.6300578 0.64450867 0.66763006
0.63400576 0.65889213 0.64534884 0.65988372]
mean value: 0.6480012816704841
MCC on Blind test: 0.11
Accuracy on Blind test: 0.65
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.09386945 0.07413387 0.24434423 0.08141041 0.09000111 0.08292508
0.08711767 0.08184457 0.0863924 0.10627866]
mean value: 0.10283174514770507
key: score_time
value: [0.01123762 0.01102638 0.01161551 0.0113163 0.01165724 0.01128268
0.01213384 0.0116303 0.01125193 0.01118922]
mean value: 0.011434102058410644
key: test_mcc
value: [1. 0.88531564 0.96966868 0.91144345 0.96966868 0.96966868
1. 0.94028478 1. 1. ]
mean value: 0.9646049921753612
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.93939394 0.98461538 0.95384615 0.98461538 0.98461538
1. 0.96923077 1. 1. ]
mean value: 0.9816317016317017
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.94285714 0.98507463 0.95652174 0.98507463 0.98507463
1. 0.96969697 1. 1. ]
mean value: 0.9824299732281562
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.89189189 0.97058824 0.91666667 0.97058824 0.97058824
1. 0.94117647 1. 1. ]
mean value: 0.9661499735029146
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.93939394 0.984375 0.953125 0.984375 0.984375
1. 0.96969697 1. 1. ]
mean value: 0.9815340909090909
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.89189189 0.97058824 0.91666667 0.97058824 0.97058824
1. 0.94117647 1. 1. ]
mean value: 0.9661499735029146
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.92
Accuracy on Blind test: 0.97
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.04903173 0.05711269 0.04475141 0.0757432 0.0602169 0.08182859
0.04725313 0.0788002 0.06227136 0.04979157]
mean value: 0.060680079460144046
key: score_time
value: [0.01930285 0.01229572 0.01416039 0.01229095 0.01947045 0.01236129
0.01226354 0.01940012 0.01230478 0.01904869]
mean value: 0.015289878845214844
key: test_mcc
value: [0.9701425 0.74420841 0.87689394 0.96969697 0.96966868 0.87844611
0.94028478 0.88382395 0.87867338 1. ]
mean value: 0.9111838725828962
key: train_mcc
value: [0.95230718 0.96246294 0.96257212 0.94550795 0.95232236 0.94903173
0.93869211 0.95571215 0.95238704 0.94894121]
mean value: 0.9519936782040691
key: test_accuracy
value: [0.98484848 0.86363636 0.93846154 0.98461538 0.98461538 0.93846154
0.96923077 0.93846154 0.93846154 1. ]
mean value: 0.9540792540792541
key: train_accuracy
value: [0.97610922 0.98122867 0.98126065 0.97274276 0.97614991 0.97444634
0.9693356 0.97785349 0.97614991 0.97444634]
mean value: 0.9759722892476932
key: test_fscore
value: [0.98461538 0.87671233 0.93939394 0.98461538 0.98507463 0.94117647
0.96969697 0.94117647 0.93939394 1. ]
mean value: 0.9561855514524884
key: train_fscore
value: [0.97627119 0.98126065 0.98132428 0.97278912 0.97619048 0.97461929
0.96949153 0.97792869 0.97635135 0.97461929]
mean value: 0.9760845852229671
key: test_precision
value: [1. 0.8 0.93939394 1. 0.97058824 0.91428571
0.94117647 0.88888889 0.91176471 1. ]
mean value: 0.9366097954333248
key: train_precision
value: [0.96969697 0.97959184 0.97635135 0.96949153 0.97288136 0.96644295
0.96621622 0.97627119 0.96979866 0.96969697]
mean value: 0.9716439022231066
key: test_recall
value: [0.96969697 0.96969697 0.93939394 0.96969697 1. 0.96969697
1. 1. 0.96875 1. ]
mean value: 0.9786931818181819
key: train_recall
value: [0.98293515 0.98293515 0.98634812 0.97610922 0.97952218 0.98293515
0.97278912 0.97959184 0.9829932 0.97959184]
mean value: 0.9805750969329712
key: test_roc_auc
value: [0.98484848 0.86363636 0.93844697 0.98484848 0.984375 0.93797348
0.96969697 0.93939394 0.93892045 1. ]
mean value: 0.9542140151515152
key: train_roc_auc
value: [0.97610922 0.98122867 0.9812693 0.97274849 0.97615565 0.97446077
0.96932971 0.97785053 0.97613824 0.97443756]
mean value: 0.9759728123331244
key: test_jcc
value: [0.96969697 0.7804878 0.88571429 0.96969697 0.97058824 0.88888889
0.94117647 0.88888889 0.88571429 1. ]
mean value: 0.918085279936069
key: train_jcc
value: [0.95364238 0.9632107 0.96333333 0.94701987 0.95348837 0.95049505
0.94078947 0.95681063 0.95379538 0.95049505]
mean value: 0.9533080242884424
MCC on Blind test: 0.82
Accuracy on Blind test: 0.94
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01191735 0.01109195 0.01145744 0.01019454 0.01044583 0.01129651
0.01036239 0.0100286 0.01021838 0.01007605]
mean value: 0.010708904266357422
key: score_time
value: [0.00950074 0.00956059 0.00898194 0.00879025 0.00971723 0.00899887
0.00876164 0.00882864 0.00875211 0.00889969]
mean value: 0.009079170227050782
key: test_mcc
value: [0.42443734 0.64715023 0.57061637 0.53882576 0.78763191 0.48131798
0.65199287 0.54591405 0.60000027 0.48131798]
mean value: 0.5729204754535123
key: train_mcc
value: [0.62197601 0.58428511 0.61082676 0.57563129 0.58929329 0.59184703
0.59480594 0.61245804 0.58185441 0.61325931]
mean value: 0.5976237181144685
key: test_accuracy
value: [0.71212121 0.81818182 0.78461538 0.76923077 0.89230769 0.73846154
0.81538462 0.76923077 0.8 0.73846154]
mean value: 0.7837995337995338
key: train_accuracy
value: [0.8105802 0.79180887 0.80408859 0.78705281 0.79386712 0.7955707
0.79727428 0.80579216 0.79045997 0.80579216]
mean value: 0.7982286863847526
key: test_fscore
value: [0.71641791 0.83333333 0.78125 0.76923077 0.89855072 0.76056338
0.83333333 0.7826087 0.79365079 0.71186441]
mean value: 0.7880803347347197
key: train_fscore
value: [0.81530782 0.79666667 0.81239804 0.79406919 0.80065898 0.8
0.80067002 0.81125828 0.79669421 0.81311475]
mean value: 0.8040837964585462
key: test_precision
value: [0.70588235 0.76923077 0.80645161 0.78125 0.86111111 0.71052632
0.75 0.72972973 0.80645161 0.77777778]
mean value: 0.7698411282386489
key: train_precision
value: [0.79545455 0.77850163 0.778125 0.76751592 0.77388535 0.78175896
0.78877888 0.79032258 0.77491961 0.78481013]
mean value: 0.7814072604922253
key: test_recall
value: [0.72727273 0.90909091 0.75757576 0.75757576 0.93939394 0.81818182
0.9375 0.84375 0.78125 0.65625 ]
mean value: 0.8127840909090909
key: train_recall
value: [0.83617747 0.81569966 0.84982935 0.8225256 0.82935154 0.81911263
0.81292517 0.83333333 0.81972789 0.84353741]
mean value: 0.8282220055257599
key: test_roc_auc
value: [0.71212121 0.81818182 0.78503788 0.76941288 0.89157197 0.73721591
0.81723485 0.77035985 0.79971591 0.73721591]
mean value: 0.7838068181818182
key: train_roc_auc
value: [0.8105802 0.79180887 0.80416638 0.78711314 0.79392747 0.79561074
0.79724757 0.80574516 0.79041002 0.80572775]
mean value: 0.7982337303522091
key: test_jcc
value: [0.55813953 0.71428571 0.64102564 0.625 0.81578947 0.61363636
0.71428571 0.64285714 0.65789474 0.55263158]
mean value: 0.6535545900447981
key: train_jcc
value: [0.68820225 0.66204986 0.68406593 0.65846995 0.66758242 0.66666667
0.66759777 0.68245125 0.66208791 0.68508287]
mean value: 0.6724256876218178
MCC on Blind test: 0.74
Accuracy on Blind test: 0.88
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.02177429 0.02907991 0.03356957 0.03469872 0.02976513 0.02188373
0.03032804 0.02910614 0.02598953 0.03564668]
mean value: 0.029184174537658692
key: score_time
value: [0.01015782 0.01128125 0.01198101 0.01198912 0.01193428 0.01202154
0.01189089 0.01198149 0.01193619 0.01201606]
mean value: 0.011718964576721192
key: test_mcc
value: [0.90950859 0.85201287 0.90805728 0.87844611 0.91144345 0.74121539
0.94017476 0.84953768 0.91144345 0.96969697]
mean value: 0.8871536549688918
key: train_mcc
value: [0.96928892 0.96933409 0.96938669 0.96592835 0.9326412 0.85652373
0.94234858 0.92610248 0.92910679 0.95571215]
mean value: 0.9416372979073604
key: test_accuracy
value: [0.95454545 0.92424242 0.95384615 0.93846154 0.95384615 0.86153846
0.96923077 0.92307692 0.95384615 0.98461538]
mean value: 0.9417249417249418
key: train_accuracy
value: [0.98464164 0.98464164 0.9846678 0.98296422 0.96592845 0.92504259
0.97103918 0.96252129 0.96422487 0.97785349]
mean value: 0.9703525184457327
key: test_fscore
value: [0.95522388 0.92753623 0.95522388 0.94117647 0.95652174 0.84745763
0.96774194 0.91803279 0.95081967 0.98461538]
mean value: 0.9404349609031051
key: train_fscore
value: [0.98461538 0.98471986 0.98471986 0.98293515 0.96655518 0.92
0.9707401 0.96167247 0.96360485 0.97792869]
mean value: 0.969749157302225
key: test_precision
value: [0.94117647 0.88888889 0.94117647 0.91428571 0.91666667 0.96153846
1. 0.96551724 1. 0.96969697]
mean value: 0.9498946883632482
key: train_precision
value: [0.98630137 0.97972973 0.97972973 0.98293515 0.94754098 0.9844358
0.9825784 0.98571429 0.98233216 0.97627119]
mean value: 0.9787568789022557
key: test_recall
value: [0.96969697 0.96969697 0.96969697 0.96969697 1. 0.75757576
0.9375 0.875 0.90625 1. ]
mean value: 0.9355113636363637
key: train_recall
value: [0.98293515 0.98976109 0.98976109 0.98293515 0.98634812 0.86348123
0.95918367 0.93877551 0.94557823 0.97959184]
mean value: 0.9618351094704093
key: test_roc_auc
value: [0.95454545 0.92424242 0.95359848 0.93797348 0.953125 0.86316288
0.96875 0.92234848 0.953125 0.98484848]
mean value: 0.9415719696969698
key: train_roc_auc
value: [0.98464164 0.98464164 0.98467646 0.98296418 0.96596318 0.92493789
0.97105941 0.96256182 0.96425669 0.97785053]
mean value: 0.9703553435025888
key: test_jcc
value: [0.91428571 0.86486486 0.91428571 0.88888889 0.91666667 0.73529412
0.9375 0.84848485 0.90625 0.96969697]
mean value: 0.8896217784820726
key: train_jcc
value: [0.96969697 0.96989967 0.96989967 0.96644295 0.93527508 0.85185185
0.94314381 0.9261745 0.92976589 0.95681063]
mean value: 0.9418961013448971
MCC on Blind test: 0.79
Accuracy on Blind test: 0.91
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01870584 0.02071786 0.02087641 0.02029848 0.01994467 0.02535534
0.02135015 0.02066016 0.02104807 0.02397323]
mean value: 0.021293020248413085
key: score_time
value: [0.01050353 0.01194453 0.01200318 0.01192403 0.01192117 0.0119741
0.01186943 0.01193142 0.01191711 0.01191044]
mean value: 0.011789894104003907
key: test_mcc
value: [0.90950859 0.78824078 0.84953768 0.8291562 0.91144345 0.63932742
0.94028478 0.85663571 0.87844611 1. ]
mean value: 0.8602580723623745
key: train_mcc
value: [0.96928892 0.96589281 0.8665154 0.82731229 0.9293669 0.59520993
0.9524971 0.93921285 0.95238925 0.95911402]
mean value: 0.8956799458569592
key: test_accuracy
value: [0.95454545 0.89393939 0.92307692 0.90769231 0.95384615 0.8
0.96923077 0.92307692 0.93846154 1. ]
mean value: 0.9263869463869464
key: train_accuracy
value: [0.98464164 0.98293515 0.93015332 0.90630324 0.96422487 0.7649063
0.97614991 0.9693356 0.97614991 0.97955707]
mean value: 0.9434357030309726
key: test_fscore
value: [0.95522388 0.89552239 0.92753623 0.91666667 0.95652174 0.76363636
0.96969697 0.92753623 0.93548387 1. ]
mean value: 0.9247824342523009
key: train_fscore
value: [0.98461538 0.98287671 0.93397746 0.91419657 0.96494157 0.69469027
0.97643098 0.96989967 0.9760274 0.97959184]
mean value: 0.9377247831270099
key: test_precision
value: [0.94117647 0.88235294 0.88888889 0.84615385 0.91666667 0.95454545
0.94117647 0.86486486 0.96666667 1. ]
mean value: 0.9202492270139329
key: train_precision
value: [0.98630137 0.9862543 0.88414634 0.84195402 0.94444444 0.98742138
0.96666667 0.95394737 0.98275862 0.97959184]
mean value: 0.9513486350451892
key: test_recall
value: [0.96969697 0.90909091 0.96969697 1. 1. 0.63636364
1. 1. 0.90625 1. ]
mean value: 0.9391098484848485
key: train_recall
value: [0.98293515 0.97952218 0.98976109 1. 0.98634812 0.53583618
0.98639456 0.98639456 0.96938776 0.97959184]
mean value: 0.9396171437858419
key: test_roc_auc
value: [0.95454545 0.89393939 0.92234848 0.90625 0.953125 0.80255682
0.96969697 0.92424242 0.93797348 1. ]
mean value: 0.9264678030303031
key: train_roc_auc
value: [0.98464164 0.98293515 0.9302547 0.90646259 0.9642625 0.76451673
0.97613243 0.96930649 0.97616145 0.97955701]
mean value: 0.9434230688862576
key: test_jcc
value: [0.91428571 0.81081081 0.86486486 0.84615385 0.91666667 0.61764706
0.94117647 0.86486486 0.87878788 1. ]
mean value: 0.8655258175846411
key: train_jcc
value: [0.96969697 0.96632997 0.87613293 0.84195402 0.93225806 0.53220339
0.95394737 0.94155844 0.95317726 0.96 ]
mean value: 0.8927258411380252
MCC on Blind test: 0.79
Accuracy on Blind test: 0.91
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.19168162 0.18652463 0.18877339 0.18711591 0.18756223 0.18694925
0.18872786 0.18731856 0.1890502 0.18967414]
mean value: 0.18833777904510499
key: score_time
value: [0.01539207 0.01641464 0.01546192 0.01561141 0.01558447 0.01564121
0.01585484 0.0155468 0.01538491 0.01601601]
mean value: 0.015690827369689943
key: test_mcc
value: [1. 0.88531564 0.96966868 0.94017476 0.96966868 0.96966868
1. 0.96969697 1. 0.96969697]
mean value: 0.9673890382089856
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.93939394 0.98461538 0.96923077 0.98461538 0.98461538
1. 0.98461538 1. 0.98461538]
mean value: 0.9831701631701633
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.94285714 0.98507463 0.97058824 0.98507463 0.98507463
1. 0.98461538 1. 0.98461538]
mean value: 0.9837900027979045
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.89189189 0.97058824 0.94285714 0.97058824 0.97058824
1. 0.96969697 1. 0.96969697]
mean value: 0.9685907680025327
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.93939394 0.984375 0.96875 0.984375 0.984375
1. 0.98484848 1. 0.98484848]
mean value: 0.983096590909091
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.89189189 0.97058824 0.94285714 0.97058824 0.97058824
1. 0.96969697 1. 0.96969697]
mean value: 0.9685907680025327
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.82
Accuracy on Blind test: 0.94
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.07231236 0.07584405 0.07159591 0.09159732 0.08213663 0.09455013
0.08881402 0.07496858 0.08687353 0.09348845]
mean value: 0.08321809768676758
key: score_time
value: [0.02371931 0.03964543 0.03809118 0.02287412 0.0395174 0.03245187
0.02435994 0.03845763 0.03063273 0.03672957]
mean value: 0.03264791965484619
key: test_mcc
value: [1. 0.88531564 0.96966868 0.94017476 0.96966868 0.94017476
1. 0.94028478 1. 0.96969697]
mean value: 0.9614984268728954
key: train_mcc
value: [1. 0.9931972 0.98983039 0.99659864 1. 0.99659864
0.9965986 0.9965986 0.98983004 0.99320865]
mean value: 0.9952460760299192
key: test_accuracy
value: [1. 0.93939394 0.98461538 0.96923077 0.98461538 0.96923077
1. 0.96923077 1. 0.98461538]
mean value: 0.9800932400932401
key: train_accuracy
value: [1. 0.99658703 0.99488927 0.99829642 1. 0.99829642
0.99829642 0.99829642 0.99488927 0.99659284]
mean value: 0.9976144100563401
key: test_fscore
value: [1. 0.94285714 0.98507463 0.97058824 0.98507463 0.97058824
1. 0.96969697 1. 0.98461538]
mean value: 0.9808495221489075
key: train_fscore
value: [1. 0.99659864 0.99490662 0.99829642 1. 0.99829642
0.99830221 0.99830221 0.99492386 0.99661017]
mean value: 0.9976236547443424
key: test_precision
value: [1. 0.89189189 0.97058824 0.94285714 0.97058824 0.94285714
1. 0.94117647 1. 0.96969697]
mean value: 0.9629656088479618
key: train_precision
value: [1. 0.99322034 0.98986486 0.99659864 1. 0.99659864
0.99661017 0.99661017 0.98989899 0.99324324]
mean value: 0.9952645054884764
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.93939394 0.984375 0.96875 0.984375 0.96875
1. 0.96969697 1. 0.98484848]
mean value: 0.9800189393939394
key: train_roc_auc
value: [1. 0.99658703 0.99489796 0.99829932 1. 0.99829932
0.99829352 0.99829352 0.99488055 0.99658703]
mean value: 0.9976138236864711
key: test_jcc
value: [1. 0.89189189 0.97058824 0.94285714 0.97058824 0.94285714
1. 0.94117647 1. 0.96969697]
mean value: 0.9629656088479618
key: train_jcc
value: [1. 0.99322034 0.98986486 0.99659864 1. 0.99659864
0.99661017 0.99661017 0.98989899 0.99324324]
mean value: 0.9952645054884764
MCC on Blind test: 0.82
Accuracy on Blind test: 0.94
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.22352338 0.25992131 0.29241633 0.16742349 0.21570444 0.24491692
0.23902869 0.24685073 0.24089813 0.31559348]
mean value: 0.2446276903152466
key: score_time
value: [0.0278883 0.02800488 0.02785397 0.01648283 0.02850795 0.02785277
0.02769923 0.02845788 0.027807 0.02868414]
mean value: 0.02692389488220215
key: test_mcc
value: [0.88531564 0.88531564 0.80282704 0.91144345 0.77695466 0.80282704
0.80403025 0.91168461 0.91168461 0.81706198]
mean value: 0.8509144915529278
key: train_mcc
value: [0.98981298 0.98981298 0.98646327 0.98646327 0.99320881 0.98983039
0.98310636 0.98646265 0.98983004 0.97957952]
mean value: 0.9874570253062251
key: test_accuracy
value: [0.93939394 0.93939394 0.89230769 0.95384615 0.87692308 0.89230769
0.89230769 0.95384615 0.95384615 0.90769231]
mean value: 0.9201864801864802
key: train_accuracy
value: [0.99488055 0.99488055 0.99318569 0.99318569 0.99659284 0.99488927
0.99148211 0.99318569 0.99488927 0.98977853]
mean value: 0.9936950189254089
key: test_fscore
value: [0.94285714 0.94285714 0.90410959 0.95652174 0.89189189 0.90410959
0.90140845 0.95522388 0.95522388 0.90909091]
mean value: 0.9263294215807969
key: train_fscore
value: [0.99490662 0.99490662 0.99322034 0.99322034 0.99659864 0.99490662
0.9915683 0.99324324 0.99492386 0.98983051]
mean value: 0.9937325087980247
key: test_precision
value: [0.89189189 0.89189189 0.825 0.91666667 0.80487805 0.825
0.82051282 0.91428571 0.91428571 0.88235294]
mean value: 0.8686765689491658
key: train_precision
value: [0.98986486 0.98986486 0.98653199 0.98653199 0.99322034 0.98986486
0.98327759 0.98657718 0.98989899 0.98648649]
mean value: 0.9882119156208393
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 0.9375]
mean value: 0.99375
key: train_recall
value: [1. 1. 1. 1. 1. 1.
1. 1. 1. 0.99319728]
mean value: 0.9993197278911564
key: test_roc_auc
value: [0.93939394 0.93939394 0.890625 0.953125 0.875 0.890625
0.89393939 0.95454545 0.95454545 0.90814394]
mean value: 0.9199337121212121
key: train_roc_auc
value: [0.99488055 0.99488055 0.99319728 0.99319728 0.99659864 0.99489796
0.99146758 0.99317406 0.99488055 0.9897727 ]
mean value: 0.9936947133802326
key: test_jcc
value: [0.89189189 0.89189189 0.825 0.91666667 0.80487805 0.825
0.82051282 0.91428571 0.91428571 0.83333333]
mean value: 0.8637746081648521
key: train_jcc
value: [0.98986486 0.98986486 0.98653199 0.98653199 0.99322034 0.98986486
0.98327759 0.98657718 0.98989899 0.97986577]
mean value: 0.9875498441533987
MCC on Blind test: 0.64
Accuracy on Blind test: 0.88
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.74850655 0.7487638 0.75244284 0.75276852 0.75158358 0.74764895
0.75177717 0.75058126 0.74825907 0.75402451]
mean value: 0.7506356239318848
key: score_time
value: [0.00954652 0.01034856 0.00933361 0.00950527 0.01000786 0.00935388
0.00940204 0.0093143 0.0098443 0.00949383]
mean value: 0.009615015983581544
key: test_mcc
value: [1. 0.88531564 0.96966868 0.94017476 0.96966868 0.91144345
1. 0.94028478 1. 0.96969697]
mean value: 0.9586252965196415
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.93939394 0.98461538 0.96923077 0.98461538 0.95384615
1. 0.96923077 1. 0.98461538]
mean value: 0.9785547785547786
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.94285714 0.98507463 0.97058824 0.98507463 0.95652174
1. 0.96969697 1. 0.98461538]
mean value: 0.9794428725325393
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.89189189 0.97058824 0.94285714 0.97058824 0.91666667
1. 0.94117647 1. 0.96969697]
mean value: 0.9603465612289142
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.93939394 0.984375 0.96875 0.984375 0.953125
1. 0.96969697 1. 0.98484848]
mean value: 0.9784564393939394
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.89189189 0.97058824 0.94285714 0.97058824 0.91666667
1. 0.94117647 1. 0.96969697]
mean value: 0.9603465612289142
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.92
Accuracy on Blind test: 0.97
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.03289318 0.03336477 0.03610873 0.03162313 0.0395174 0.03225017
0.03240514 0.03235316 0.03207421 0.03266597]
mean value: 0.03352558612823486
key: score_time
value: [0.01233625 0.01858902 0.01290846 0.01493025 0.01855946 0.02041245
0.01499772 0.01503062 0.01514864 0.0151844 ]
mean value: 0.01580972671508789
key: test_mcc
value: [0.94112395 1. 0.96966868 0.94028478 0.96969697 1.
0.96969697 0.90805728 0.87844611 1. ]
mean value: 0.9576974742315519
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96969697 1. 0.98461538 0.96923077 0.98461538 1.
0.98461538 0.95384615 0.93846154 1. ]
mean value: 0.9785081585081585
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.97058824 1. 0.98507463 0.96875 0.98461538 1.
0.98461538 0.95238095 0.93548387 1. ]
mean value: 0.9781508454739253
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.94285714 1. 0.97058824 1. 1. 1.
0.96969697 0.96774194 0.96666667 1. ]
mean value: 0.9817550949998768
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 0.93939394 0.96969697 1.
1. 0.9375 0.90625 1. ]
mean value: 0.975284090909091
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96969697 1. 0.984375 0.96969697 0.98484848 1.
0.98484848 0.95359848 0.93797348 1. ]
mean value: 0.9785037878787879
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.94285714 1. 0.97058824 0.93939394 0.96969697 1.
0.96969697 0.90909091 0.87878788 1. ]
mean value: 0.9580112044817928
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.0
Accuracy on Blind test: 0.79
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.01766801 0.01648593 0.03942156 0.03964114 0.0395329 0.03955841
0.03969049 0.03983688 0.04054785 0.0395534 ]
mean value: 0.035193657875061034
key: score_time
value: [0.02938604 0.01217794 0.01870894 0.0188036 0.01890564 0.0188725
0.01900768 0.01918125 0.01913404 0.01896834]
mean value: 0.01931459903717041
key: test_mcc
value: [0.93939394 0.79708114 0.84644588 0.93844697 0.94017476 0.93844697
0.94028478 0.91168461 0.93844697 1. ]
mean value: 0.9190406017750745
key: train_mcc
value: [0.94541452 0.95563697 0.95920405 0.94550795 0.942084 0.94212842
0.94208333 0.94889833 0.94550668 0.93869211]
mean value: 0.946515635707813
key: test_accuracy
value: [0.96969697 0.89393939 0.92307692 0.96923077 0.96923077 0.96923077
0.96923077 0.95384615 0.96923077 1. ]
mean value: 0.9586713286713286
key: train_accuracy
value: [0.97269625 0.9778157 0.97955707 0.97274276 0.97103918 0.97103918
0.97103918 0.97444634 0.97274276 0.9693356 ]
mean value: 0.9732454023757057
key: test_fscore
value: [0.96969697 0.90140845 0.92537313 0.96969697 0.97058824 0.96969697
0.96969697 0.95522388 0.96875 1. ]
mean value: 0.9600131579711595
key: train_fscore
value: [0.97278912 0.97777778 0.97966102 0.97278912 0.97103918 0.97113752
0.97113752 0.97444634 0.97288136 0.96949153]
mean value: 0.9733150469411342
key: test_precision
value: [0.96969697 0.84210526 0.91176471 0.96969697 0.94285714 0.96969697
0.94117647 0.91428571 0.96875 1. ]
mean value: 0.9430030205862249
key: train_precision
value: [0.96949153 0.97945205 0.97306397 0.96949153 0.96938776 0.96621622
0.96949153 0.97610922 0.96959459 0.96621622]
mean value: 0.9708514601275813
key: test_recall
value: [0.96969697 0.96969697 0.93939394 0.96969697 1. 0.96969697
1. 1. 0.96875 1. ]
mean value: 0.9786931818181819
key: train_recall
value: /home/tanu/git/LSHTM_analysis/scripts/ml/./embb_sl.py:148: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./embb_sl.py:151: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
[0.97610922 0.97610922 0.98634812 0.97610922 0.97269625 0.97610922
0.97278912 0.97278912 0.97619048 0.97278912]
mean value: 0.9758039051798193
key: test_roc_auc
value: [0.96969697 0.89393939 0.92282197 0.96922348 0.96875 0.96922348
0.96969697 0.95454545 0.96922348 1. ]
mean value: 0.9587121212121212
key: train_roc_auc
value: [0.97269625 0.9778157 0.97956862 0.97274849 0.971042 0.9710478
0.9710362 0.97444917 0.97273688 0.96932971]
mean value: 0.9732470804021267
key: test_jcc
value: [0.94117647 0.82051282 0.86111111 0.94117647 0.94285714 0.94117647
0.94117647 0.91428571 0.93939394 1. ]
mean value: 0.9242866610513669
key: train_jcc
value: [0.94701987 0.95652174 0.96013289 0.94701987 0.94370861 0.94389439
0.94389439 0.95016611 0.94719472 0.94078947]
mean value: 0.94803420588576
MCC on Blind test: 0.92
Accuracy on Blind test: 0.97
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.27860427 0.30625582 0.2999301 0.29773641 0.29638553 0.30672407
0.34239507 0.32195616 0.29284167 0.299088 ]
mean value: 0.3041917085647583
key: score_time
value: [0.01983547 0.01884723 0.01885319 0.0189271 0.01889658 0.01881385
0.02004528 0.01894379 0.01895666 0.0188508 ]
mean value: 0.019096994400024415
key: test_mcc
value: [0.93939394 0.79708114 0.84644588 0.93844697 0.94017476 0.90805728
0.94028478 0.91168461 0.93844697 1. ]
mean value: 0.916001632801491
key: train_mcc
value: [0.94541452 0.95563697 0.95920405 0.94550795 0.942084 0.94212842
0.94208333 0.94889833 0.94550668 0.93869211]
mean value: 0.946515635707813
key: test_accuracy
value: [0.96969697 0.89393939 0.92307692 0.96923077 0.96923077 0.95384615
0.96923077 0.95384615 0.96923077 1. ]
mean value: 0.9571328671328672
key: train_accuracy
value: [0.97269625 0.9778157 0.97955707 0.97274276 0.97103918 0.97103918
0.97103918 0.97444634 0.97274276 0.9693356 ]
mean value: 0.9732454023757057
key: test_fscore
value: [0.96969697 0.90140845 0.92537313 0.96969697 0.97058824 0.95522388
0.96969697 0.95522388 0.96875 1. ]
mean value: 0.958565849061164
key: train_fscore
value: [0.97278912 0.97777778 0.97966102 0.97278912 0.97103918 0.97113752
0.97113752 0.97444634 0.97288136 0.96949153]
mean value: 0.9733150469411342
key: test_precision
value: [0.96969697 0.84210526 0.91176471 0.96969697 0.94285714 0.94117647
0.94117647 0.91428571 0.96875 1. ]
mean value: 0.9401509706753515
key: train_precision
value: [0.96949153 0.97945205 0.97306397 0.96949153 0.96938776 0.96621622
0.96949153 0.97610922 0.96959459 0.96621622]
mean value: 0.9708514601275813
key: test_recall
value: [0.96969697 0.96969697 0.93939394 0.96969697 1. 0.96969697
1. 1. 0.96875 1. ]
mean value: 0.9786931818181819
key: train_recall
value: [0.97610922 0.97610922 0.98634812 0.97610922 0.97269625 0.97610922
0.97278912 0.97278912 0.97619048 0.97278912]
mean value: 0.9758039051798193
key: test_roc_auc
value: [0.96969697 0.89393939 0.92282197 0.96922348 0.96875 0.95359848
0.96969697 0.95454545 0.96922348 1. ]
mean value: 0.9571496212121212
key: train_roc_auc
value: [0.97269625 0.9778157 0.97956862 0.97274849 0.971042 0.9710478
0.9710362 0.97444917 0.97273688 0.96932971]
mean value: 0.9732470804021267
key: test_jcc
value: [0.94117647 0.82051282 0.86111111 0.94117647 0.94285714 0.91428571
0.94117647 0.91428571 0.93939394 1. ]
mean value: 0.9215975854211148
key: train_jcc
value: [0.94701987 0.95652174 0.96013289 0.94701987 0.94370861 0.94389439
0.94389439 0.95016611 0.94719472 0.94078947]
mean value: 0.94803420588576
MCC on Blind test: 0.92
Accuracy on Blind test: 0.97
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.0257926 0.02790737 0.02318335 0.03167439 0.05734181 0.04880977
0.03250027 0.03194118 0.03827858 0.0335741 ]
mean value: 0.035100340843200684
key: score_time
value: [0.01167178 0.01159739 0.01160884 0.01431298 0.01217198 0.01262093
0.01354647 0.01503205 0.01518011 0.01170874]
mean value: 0.012945127487182618
key: test_mcc
value: [0.56980288 0.67082039 0.77777778 0.4472136 0.3721042 0.89442719
0.65277778 0.41666667 0.6479516 0.41666667]
mean value: 0.5866208749996378
key: train_mcc
value: [0.88614695 0.83550998 0.84837318 0.86082846 0.88614695 0.86138081
0.88859066 0.86227649 0.84929565 0.8868492 ]
mean value: 0.8665398312973626
key: test_accuracy
value: [0.77777778 0.83333333 0.88888889 0.72222222 0.66666667 0.94444444
0.82352941 0.70588235 0.82352941 0.70588235]
mean value: 0.7892156862745098
key: train_accuracy
value: [0.94303797 0.91772152 0.92405063 0.93037975 0.94303797 0.93037975
0.94339623 0.93081761 0.9245283 0.94339623]
mean value: 0.9330745959716583
key: test_fscore
value: [0.8 0.84210526 0.88888889 0.73684211 0.57142857 0.94117647
0.82352941 0.70588235 0.84210526 0.70588235]
mean value: 0.7857840680131701
key: train_fscore
value: [0.94267516 0.91719745 0.92307692 0.93081761 0.94267516 0.92903226
0.94193548 0.92993631 0.92307692 0.94267516]
mean value: 0.9323098433821013
key: test_precision
value: [0.72727273 0.8 0.88888889 0.7 0.8 1.
0.77777778 0.66666667 0.8 0.75 ]
mean value: 0.791060606060606
key: train_precision
value: [0.94871795 0.92307692 0.93506494 0.925 0.94871795 0.94736842
0.97333333 0.94805195 0.93506494 0.94871795]
mean value: 0.9433114341798552
key: test_recall
value: [0.88888889 0.88888889 0.88888889 0.77777778 0.44444444 0.88888889
0.875 0.75 0.88888889 0.66666667]
mean value: 0.7958333333333333
key: train_recall
value: [0.93670886 0.91139241 0.91139241 0.93670886 0.93670886 0.91139241
0.9125 0.9125 0.91139241 0.93670886]
mean value: 0.9217405063291139
key: test_roc_auc
value: [0.77777778 0.83333333 0.88888889 0.72222222 0.66666667 0.94444444
0.82638889 0.70833333 0.81944444 0.70833333]
mean value: 0.7895833333333333
key: train_roc_auc
value: [0.94303797 0.91772152 0.92405063 0.93037975 0.94303797 0.93037975
0.94359177 0.93093354 0.9244462 0.94335443]
mean value: 0.9330933544303798
key: test_jcc
value: [0.66666667 0.72727273 0.8 0.58333333 0.4 0.88888889
0.7 0.54545455 0.72727273 0.54545455]
mean value: 0.6584343434343434
key: train_jcc
value: [0.89156627 0.84705882 0.85714286 0.87058824 0.89156627 0.86746988
0.8902439 0.86904762 0.85714286 0.89156627]
mean value: 0.8733392969294682
MCC on Blind test: 0.79
Accuracy on Blind test: 0.91
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.79543591 0.73526001 0.70240498 0.83575439 0.69058466 0.73388934
0.82162809 0.70372939 0.68670249 0.91151547]
mean value: 0.7616904735565185
key: score_time
value: [0.01407862 0.01623702 0.01558208 0.01444364 0.01650882 0.01581383
0.01443124 0.01586819 0.01206326 0.01457977]
mean value: 0.014960646629333496
key: test_mcc
value: [0.67082039 0.77777778 0.67082039 0.4472136 0.3721042 0.67082039
0.65277778 0.54935027 0.52777778 0.65277778]
mean value: 0.5992240355702041
key: train_mcc
value: [0.98742088 0.9243469 0.91146543 0.92405063 0.9621024 0.97468354
0.93718354 0.96234177 0.77385663 0.96233582]
mean value: 0.9319787560885171
key: test_accuracy
value: [0.83333333 0.88888889 0.83333333 0.72222222 0.66666667 0.83333333
0.82352941 0.76470588 0.76470588 0.82352941]
mean value: 0.7954248366013071
key: train_accuracy
value: [0.99367089 0.96202532 0.9556962 0.96202532 0.98101266 0.98734177
0.96855346 0.98113208 0.88679245 0.98113208]
mean value: 0.9659382214791816
key: test_fscore
value: [0.84210526 0.88888889 0.84210526 0.73684211 0.57142857 0.84210526
0.82352941 0.77777778 0.77777778 0.82352941]
mean value: 0.792608973413927
key: train_fscore
value: [0.99363057 0.96153846 0.95541401 0.96202532 0.98113208 0.98734177
0.96855346 0.98113208 0.8875 0.98089172]
mean value: 0.9659159465941434
key: test_precision
value: [0.8 0.88888889 0.8 0.7 0.8 0.8
0.77777778 0.7 0.77777778 0.875 ]
mean value: 0.7919444444444445
key: train_precision
value: [1. 0.97402597 0.96153846 0.96202532 0.975 0.98734177
0.97468354 0.98734177 0.87654321 0.98717949]
mean value: 0.9685679537683757
key: test_recall
value: [0.88888889 0.88888889 0.88888889 0.77777778 0.44444444 0.88888889
0.875 0.875 0.77777778 0.77777778]
mean value: 0.8083333333333333
key: train_recall
value: [0.98734177 0.94936709 0.94936709 0.96202532 0.98734177 0.98734177
0.9625 0.975 0.89873418 0.97468354]
mean value: 0.9633702531645569
key: test_roc_auc
value: [0.83333333 0.88888889 0.83333333 0.72222222 0.66666667 0.83333333
0.82638889 0.77083333 0.76388889 0.82638889]
mean value: 0.7965277777777778
key: train_roc_auc
value: [0.99367089 0.96202532 0.9556962 0.96202532 0.98101266 0.98734177
0.96859177 0.98117089 0.88686709 0.98109177]
mean value: 0.9659493670886076
key: test_jcc
value: [0.72727273 0.8 0.72727273 0.58333333 0.4 0.72727273
0.7 0.63636364 0.63636364 0.7 ]
mean value: 0.6637878787878788
key: train_jcc
value: [0.98734177 0.92592593 0.91463415 0.92682927 0.96296296 0.975
0.93902439 0.96296296 0.79775281 0.9625 ]
mean value: 0.9354934237870564
MCC on Blind test: 0.69
Accuracy on Blind test: 0.85
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01273727 0.00978827 0.00896478 0.0088222 0.00867915 0.00854111
0.00854087 0.00868607 0.00857282 0.00857306]
mean value: 0.009190559387207031
key: score_time
value: [0.01181936 0.00904059 0.00885773 0.0085566 0.00860143 0.00856304
0.00852323 0.00849891 0.00854683 0.00855064]
mean value: 0.008955836296081543
key: test_mcc
value: [ 0.56980288 0.55555556 0.47140452 0.4472136 0.11111111 0.77777778
0.42600643 0.18055556 0.6479516 -0.18055556]
mean value: 0.4006823471940616
key: train_mcc
value: [0.65955018 0.67174771 0.54433105 0.70937243 0.50155039 0.68359907
0.69017683 0.63906026 0.67502174 0.68608524]
mean value: 0.6460494891843794
key: test_accuracy
value: [0.77777778 0.77777778 0.72222222 0.72222222 0.55555556 0.88888889
0.70588235 0.58823529 0.82352941 0.41176471]
mean value: 0.6973856209150326
key: train_accuracy
value: [0.82911392 0.83544304 0.75316456 0.85443038 0.7278481 0.84177215
0.8427673 0.81761006 0.83647799 0.8427673 ]
mean value: 0.8181394793408169
key: test_fscore
value: [0.8 0.77777778 0.76190476 0.70588235 0.55555556 0.88888889
0.61538462 0.58823529 0.84210526 0.44444444]
mean value: 0.6980178954172762
key: train_fscore
value: [0.83435583 0.83950617 0.79144385 0.85714286 0.77486911 0.8427673
0.83443709 0.82840237 0.84146341 0.8447205 ]
mean value: 0.8289108478500907
key: test_precision
value: [0.72727273 0.77777778 0.66666667 0.75 0.55555556 0.88888889
0.8 0.55555556 0.8 0.44444444]
mean value: 0.6966161616161616
key: train_precision
value: [0.80952381 0.81927711 0.68518519 0.84146341 0.66071429 0.8375
0.88732394 0.78651685 0.81176471 0.82926829]
mean value: 0.7968537599650998
key: test_recall
value: [0.88888889 0.77777778 0.88888889 0.66666667 0.55555556 0.88888889
0.5 0.625 0.88888889 0.44444444]
mean value: 0.7125
key: train_recall
value: [0.86075949 0.86075949 0.93670886 0.87341772 0.93670886 0.84810127
0.7875 0.875 0.87341772 0.86075949]
mean value: 0.8713132911392405
key: test_roc_auc
value: [0.77777778 0.77777778 0.72222222 0.72222222 0.55555556 0.88888889
0.69444444 0.59027778 0.81944444 0.40972222]
mean value: 0.6958333333333333
key: train_roc_auc
value: [0.82911392 0.83544304 0.75316456 0.85443038 0.7278481 0.84177215
0.84311709 0.81724684 0.83670886 0.84287975]
mean value: 0.8181724683544304
key: test_jcc
value: [0.66666667 0.63636364 0.61538462 0.54545455 0.38461538 0.8
0.44444444 0.41666667 0.72727273 0.28571429]
mean value: 0.5522582972582972
key: train_jcc
value: [0.71578947 0.72340426 0.65486726 0.75 0.63247863 0.72826087
0.71590909 0.70707071 0.72631579 0.7311828 ]
mean value: 0.7085278870836784
MCC on Blind test: 0.35
Accuracy on Blind test: 0.71
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.00883412 0.00897193 0.00890326 0.00889039 0.00874805 0.00873041
0.00873351 0.00878525 0.00875974 0.00875354]
mean value: 0.008811020851135254
key: score_time
value: [0.0085361 0.00855994 0.00862718 0.00873303 0.00857043 0.00848055
0.00854826 0.00856686 0.00855255 0.00853872]
mean value: 0.008571362495422364
key: test_mcc
value: [0.11111111 0.2236068 0.4472136 0.56980288 0.26726124 0.33333333
0.16903085 0.16903085 0.52777778 0.18055556]
mean value: 0.2998723997129735
key: train_mcc
value: [0.62045203 0.58609427 0.62345811 0.58344823 0.54500286 0.55700665
0.53581684 0.5346519 0.61038989 0.59809901]
mean value: 0.5794419778966067
key: test_accuracy
value: [0.55555556 0.61111111 0.72222222 0.77777778 0.61111111 0.66666667
0.58823529 0.58823529 0.76470588 0.58823529]
mean value: 0.6473856209150327
key: train_accuracy
value: [0.81012658 0.79113924 0.81012658 0.79113924 0.7721519 0.77848101
0.7672956 0.7672956 0.80503145 0.79874214]
mean value: 0.7891529336836239
key: test_fscore
value: [0.55555556 0.63157895 0.73684211 0.75 0.46153846 0.66666667
0.53333333 0.53333333 0.77777778 0.58823529]
mean value: 0.6234861474954354
key: train_fscore
value: [0.80769231 0.77852349 0.8 0.79754601 0.77777778 0.77707006
0.76129032 0.7672956 0.8 0.79220779]
mean value: 0.7859403363639892
key: test_precision
value: [0.55555556 0.6 0.7 0.85714286 0.75 0.66666667
0.57142857 0.57142857 0.77777778 0.625 ]
mean value: 0.6675
key: train_precision
value: [0.81818182 0.82857143 0.84507042 0.77380952 0.75903614 0.78205128
0.78666667 0.7721519 0.81578947 0.81333333]
mean value: 0.7994661992145965
key: test_recall
value: [0.55555556 0.66666667 0.77777778 0.66666667 0.33333333 0.66666667
0.5 0.5 0.77777778 0.55555556]
mean value: 0.6
key: train_recall
value: [0.79746835 0.73417722 0.75949367 0.82278481 0.79746835 0.7721519
0.7375 0.7625 0.78481013 0.7721519 ]
mean value: 0.7740506329113924
key: test_roc_auc
value: [0.55555556 0.61111111 0.72222222 0.77777778 0.61111111 0.66666667
0.58333333 0.58333333 0.76388889 0.59027778]
mean value: 0.6465277777777778
key: train_roc_auc
value: [0.81012658 0.79113924 0.81012658 0.79113924 0.7721519 0.77848101
0.76748418 0.76732595 0.80490506 0.79857595]
mean value: 0.7891455696202532
key: test_jcc
value: [0.38461538 0.46153846 0.58333333 0.6 0.3 0.5
0.36363636 0.36363636 0.63636364 0.41666667]
mean value: 0.460979020979021
key: train_jcc
value: [0.67741935 0.63736264 0.66666667 0.66326531 0.63636364 0.63541667
0.61458333 0.62244898 0.66666667 0.65591398]
mean value: 0.6476107226107226
MCC on Blind test: 0.39
Accuracy on Blind test: 0.74
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00875807 0.00828743 0.00949121 0.00866389 0.00851631 0.00862336
0.0086143 0.0088408 0.00863338 0.00873017]
mean value: 0.00871589183807373
key: score_time
value: [0.00956774 0.00985765 0.01549387 0.01442194 0.01471853 0.01451778
0.01004767 0.00984645 0.00971317 0.00975966]
mean value: 0.011794447898864746
key: test_mcc
value: [ 0.47140452 0.4472136 0.34188173 -0.2236068 0.1490712 0.12403473
0.44970061 0.16903085 0.60858062 -0.05555556]
mean value: 0.24817555070683373
key: train_mcc
value: [0.54149615 0.55582283 0.54439964 0.53687549 0.56244099 0.54871762
0.55632843 0.51305743 0.54381155 0.56692517]
mean value: 0.5469875304507212
key: test_accuracy
value: [0.72222222 0.72222222 0.66666667 0.38888889 0.55555556 0.55555556
0.70588235 0.58823529 0.76470588 0.47058824]
mean value: 0.6140522875816994
key: train_accuracy
value: [0.76582278 0.7721519 0.76582278 0.76582278 0.77848101 0.7721519
0.77358491 0.75471698 0.7672956 0.77987421]
mean value: 0.769572486267017
key: test_fscore
value: [0.76190476 0.70588235 0.7 0.42105263 0.33333333 0.42857143
0.73684211 0.53333333 0.71428571 0.47058824]
mean value: 0.5805793896505971
key: train_fscore
value: [0.74125874 0.74647887 0.73758865 0.74829932 0.76190476 0.75675676
0.75342466 0.74172185 0.74125874 0.75862069]
mean value: 0.7487313048122654
key: test_precision
value: [0.66666667 0.75 0.63636364 0.4 0.66666667 0.6
0.63636364 0.57142857 1. 0.5 ]
mean value: 0.6427489177489177
key: train_precision
value: [0.828125 0.84126984 0.83870968 0.80882353 0.82352941 0.8115942
0.83333333 0.78873239 0.828125 0.83333333]
mean value: 0.8235575723797082
key: test_recall
value: [0.88888889 0.66666667 0.77777778 0.44444444 0.22222222 0.33333333
0.875 0.5 0.55555556 0.44444444]
mean value: 0.5708333333333333
key: train_recall
value: [0.67088608 0.67088608 0.65822785 0.69620253 0.70886076 0.70886076
0.6875 0.7 0.67088608 0.69620253]
mean value: 0.6868512658227848
key: test_roc_auc
value: [0.72222222 0.72222222 0.66666667 0.38888889 0.55555556 0.55555556
0.71527778 0.58333333 0.77777778 0.47222222]
mean value: 0.6159722222222223
key: train_roc_auc
value: [0.76582278 0.7721519 0.76582278 0.76582278 0.77848101 0.7721519
0.77412975 0.75506329 0.76669304 0.77935127]
mean value: 0.7695490506329115
key: test_jcc
value: [0.61538462 0.54545455 0.53846154 0.26666667 0.2 0.27272727
0.58333333 0.36363636 0.55555556 0.30769231]
mean value: 0.4248912198912199
key: train_jcc
value: [0.58888889 0.59550562 0.58426966 0.59782609 0.61538462 0.60869565
0.6043956 0.58947368 0.58888889 0.61111111]
mean value: 0.5984439812908946
MCC on Blind test: 0.5
Accuracy on Blind test: 0.76
Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01182699 0.01088428 0.0105989 0.01070714 0.01067519 0.01060295
0.0107522 0.01132798 0.01073313 0.01074529]
mean value: 0.010885405540466308
key: score_time
value: [0.00978112 0.00912356 0.00916409 0.00913835 0.00921488 0.00921655
0.00910473 0.01053929 0.00921345 0.00933743]
mean value: 0.009383344650268554
key: test_mcc
value: [0.62017367 0.67082039 0.79772404 0.4472136 0.26726124 0.89442719
0.65277778 0.41666667 0.88741197 0.29012943]
mean value: 0.594460596832796
key: train_mcc
value: [0.77239946 0.73707642 0.77314359 0.75289162 0.81332571 0.76004189
0.79895529 0.75044075 0.74944601 0.77766186]
mean value: 0.768538260425022
key: test_accuracy
value: [0.77777778 0.83333333 0.88888889 0.72222222 0.61111111 0.94444444
0.82352941 0.70588235 0.94117647 0.64705882]
mean value: 0.7895424836601307
key: train_accuracy
value: [0.88607595 0.86708861 0.88607595 0.87341772 0.90506329 0.87974684
0.89937107 0.87421384 0.87421384 0.88679245]
mean value: 0.8832059549398933
key: test_fscore
value: [0.81818182 0.84210526 0.9 0.73684211 0.46153846 0.94736842
0.82352941 0.70588235 0.94736842 0.7 ]
mean value: 0.7882816254952478
key: train_fscore
value: [0.8875 0.87272727 0.88888889 0.88095238 0.90909091 0.88198758
0.90123457 0.87951807 0.87654321 0.89156627]
mean value: 0.8870009144426378
key: test_precision
value: [0.69230769 0.8 0.81818182 0.7 0.75 0.9
0.77777778 0.66666667 0.9 0.63636364]
mean value: 0.7641297591297591
key: train_precision
value: [0.87654321 0.8372093 0.86746988 0.83146067 0.87209302 0.86585366
0.8902439 0.84883721 0.85542169 0.85057471]
mean value: 0.8595707258801916
key: test_recall
value: [1. 0.88888889 1. 0.77777778 0.33333333 1.
0.875 0.75 1. 0.77777778]
mean value: 0.8402777777777778
key: train_recall
value: [0.89873418 0.91139241 0.91139241 0.93670886 0.94936709 0.89873418
0.9125 0.9125 0.89873418 0.93670886]
mean value: 0.9166772151898734
key: test_roc_auc
value: [0.77777778 0.83333333 0.88888889 0.72222222 0.61111111 0.94444444
0.82638889 0.70833333 0.9375 0.63888889]
mean value: 0.7888888888888889
key: train_roc_auc
value: [0.88607595 0.86708861 0.88607595 0.87341772 0.90506329 0.87974684
0.89928797 0.87397152 0.87436709 0.88710443]
mean value: 0.8832199367088608
key: test_jcc
value: [0.69230769 0.72727273 0.81818182 0.58333333 0.3 0.9
0.7 0.54545455 0.9 0.53846154]
mean value: 0.6705011655011655
key: train_jcc
value: [0.79775281 0.77419355 0.8 0.78723404 0.83333333 0.78888889
0.82022472 0.78494624 0.78021978 0.80434783]
mean value: 0.7971141184118274
MCC on Blind test: 0.65
Accuracy on Blind test: 0.82
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [0.82319236 0.70165181 0.69482183 0.65259123 0.64576364 0.75334024
0.64439893 0.64480519 0.76589203 0.63085938]
mean value: 0.6957316637039185
key: score_time
value: [0.01468158 0.02285886 0.0164628 0.01601267 0.01622725 0.01623726
0.01622391 0.01489711 0.01475215 0.01502585]
mean value: 0.016337943077087403
key: test_mcc
value: [0.67082039 0.55555556 0.4472136 0.4472136 0.3721042 0.67082039
0.41666667 0.18055556 0.29166667 0.18055556]
mean value: 0.4233172181267415
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.83333333 0.77777778 0.72222222 0.72222222 0.66666667 0.83333333
0.70588235 0.58823529 0.64705882 0.58823529]
mean value: 0.7084967320261438
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.84210526 0.77777778 0.73684211 0.73684211 0.57142857 0.84210526
0.70588235 0.58823529 0.66666667 0.58823529]
mean value: 0.7056120693891592
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.8 0.77777778 0.7 0.7 0.8 0.8
0.66666667 0.55555556 0.66666667 0.625 ]
mean value: 0.7091666666666667
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.88888889 0.77777778 0.77777778 0.77777778 0.44444444 0.88888889
0.75 0.625 0.66666667 0.55555556]
mean value: 0.7152777777777778
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.83333333 0.77777778 0.72222222 0.72222222 0.66666667 0.83333333
0.70833333 0.59027778 0.64583333 0.59027778]
mean value: 0.7090277777777778
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.72727273 0.63636364 0.58333333 0.58333333 0.4 0.72727273
0.54545455 0.41666667 0.5 0.41666667]
mean value: 0.5536363636363637
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.65
Accuracy on Blind test: 0.82
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.01664901 0.01520634 0.01341438 0.0132556 0.01205277 0.01345849
0.01324964 0.01256585 0.01373625 0.01405025]
mean value: 0.013763856887817384
key: score_time
value: [0.01184344 0.00920844 0.00879431 0.0087707 0.00871181 0.00874615
0.0091579 0.00878787 0.00946546 0.00946307]
mean value: 0.009294915199279784
key: test_mcc
value: [0.67082039 1. 1. 0.79772404 0.77777778 1.
0.78881064 0.88888889 0.76388889 0.78334945]
mean value: 0.8471260073570214
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.83333333 1. 1. 0.88888889 0.88888889 1.
0.88235294 0.94117647 0.88235294 0.88235294]
mean value: 0.9199346405228758
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.84210526 1. 1. 0.875 0.88888889 1.
0.88888889 0.94117647 0.88888889 0.9 ]
mean value: 0.9224948400412797
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.8 1. 1. 1. 0.88888889 1.
0.8 0.88888889 0.88888889 0.81818182]
mean value: 0.9084848484848485
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.88888889 1. 1. 0.77777778 0.88888889 1.
1. 1. 0.88888889 1. ]
mean value: 0.9444444444444444
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.83333333 1. 1. 0.88888889 0.88888889 1.
0.88888889 0.94444444 0.88194444 0.875 ]
mean value: 0.9201388888888888
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.72727273 1. 1. 0.77777778 0.8 1.
0.8 0.88888889 0.8 0.81818182]
mean value: 0.8612121212121212
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.92
Accuracy on Blind test: 0.97
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.10169578 0.10507154 0.0948379 0.09157467 0.09187484 0.09903312
0.09998369 0.0997653 0.10069346 0.0938766 ]
mean value: 0.09784069061279296
key: score_time
value: [0.01945329 0.0198369 0.01804137 0.01763129 0.01748037 0.01849914
0.01903343 0.0188868 0.01890397 0.01792884]
mean value: 0.018569540977478028
key: test_mcc
value: [0.62017367 0.77777778 0.77777778 0.34188173 0.53452248 0.67082039
0.78881064 0.16735967 0.88888889 0.29166667]
mean value: 0.5859679698606269
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.77777778 0.88888889 0.88888889 0.66666667 0.72222222 0.83333333
0.88235294 0.58823529 0.94117647 0.64705882]
mean value: 0.7836601307189542
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.81818182 0.88888889 0.88888889 0.7 0.61538462 0.84210526
0.88888889 0.46153846 0.94117647 0.66666667]
mean value: 0.7711719962184358
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.69230769 0.88888889 0.88888889 0.63636364 1. 0.8
0.8 0.6 1. 0.66666667]
mean value: 0.7973115773115773
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.88888889 0.88888889 0.77777778 0.44444444 0.88888889
1. 0.375 0.88888889 0.66666667]
mean value: 0.7819444444444444
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.77777778 0.88888889 0.88888889 0.66666667 0.72222222 0.83333333
0.88888889 0.57638889 0.94444444 0.64583333]
mean value: 0.7833333333333333
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.69230769 0.8 0.8 0.53846154 0.44444444 0.72727273
0.8 0.3 0.88888889 0.5 ]
mean value: 0.6491375291375291
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.69
Accuracy on Blind test: 0.85
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01018333 0.00994539 0.01004744 0.0099349 0.01006651 0.00992036
0.01011968 0.00942922 0.01031256 0.01003194]
mean value: 0.00999913215637207
key: score_time
value: [0.00951838 0.00945115 0.00948143 0.00938463 0.00939989 0.00944781
0.00944304 0.00938702 0.00942564 0.00945687]
mean value: 0.009439587593078613
key: test_mcc
value: [ 0.23570226 0.55555556 0.34188173 -0.12403473 0.23570226 0.2236068
-0.07042952 0.04351941 0.52777778 0.07042952]
mean value: 0.2039711060652974
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.61111111 0.77777778 0.66666667 0.44444444 0.61111111 0.61111111
0.47058824 0.52941176 0.76470588 0.52941176]
mean value: 0.6016339869281045
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.66666667 0.77777778 0.625 0.54545455 0.53333333 0.63157895
0.4 0.42857143 0.77777778 0.5 ]
mean value: 0.5886160476949951
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.58333333 0.77777778 0.71428571 0.46153846 0.66666667 0.6
0.42857143 0.5 0.77777778 0.57142857]
mean value: 0.6081379731379731
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.77777778 0.77777778 0.55555556 0.66666667 0.44444444 0.66666667
0.375 0.375 0.77777778 0.44444444]
mean value: 0.5861111111111111
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.61111111 0.77777778 0.66666667 0.44444444 0.61111111 0.61111111
0.46527778 0.52083333 0.76388889 0.53472222]
mean value: 0.6006944444444444
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.5 0.63636364 0.45454545 0.375 0.36363636 0.46153846
0.25 0.27272727 0.63636364 0.33333333]
mean value: 0.42835081585081586
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.63
Accuracy on Blind test: 0.85
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.24913812 1.27156734 1.20463991 1.19117212 1.18857145 1.18892694
1.18293142 1.16849566 1.18670654 1.27968574]
mean value: 1.211183524131775
key: score_time
value: [0.09751105 0.09638333 0.08782911 0.09188533 0.08863258 0.08907223
0.08769917 0.08837461 0.08824849 0.09611225]
mean value: 0.09117481708526612
key: test_mcc
value: [0.67082039 0.89442719 1. 0.70710678 0.70710678 1.
1. 0.29166667 1. 0.76388889]
mean value: 0.8035016702178504
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.83333333 0.94444444 1. 0.83333333 0.83333333 1.
1. 0.64705882 1. 0.88235294]
mean value: 0.8973856209150327
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.84210526 0.94736842 1. 0.8 0.8 1.
1. 0.625 1. 0.88888889]
mean value: 0.8903362573099416
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.8 0.9 1. 1. 1. 1.
1. 0.625 1. 0.88888889]
mean value: 0.9213888888888889
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.88888889 1. 1. 0.66666667 0.66666667 1.
1. 0.625 1. 0.88888889]
mean value: 0.8736111111111111
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.83333333 0.94444444 1. 0.83333333 0.83333333 1.
1. 0.64583333 1. 0.88194444]
mean value: 0.8972222222222223
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
key: test_jcc
value: [0.72727273 0.9 1. 0.66666667 0.66666667 1.
1. 0.45454545 1. 0.8 ]
mean value: 0.8215151515151515
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.92
Accuracy on Blind test: 0.97
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000...05', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.96250749 0.92383838 0.92812443 0.87012911 0.93210411 0.93901181
0.90484762 0.91266036 0.93102837 0.94302392]
mean value: 0.9247275590896606
key: score_time
value: [0.19898915 0.1324091 0.23657775 0.20810533 0.24690914 0.24799681
0.2273283 0.20355201 0.23310709 0.23432255]
mean value: 0.21692972183227538
key: test_mcc
value: [0.79772404 1. 1. 0.56980288 0.70710678 0.89442719
1. 0.29166667 1. 0.54935027]
mean value: 0.7810077821942321
key: train_mcc
value: [0.9621024 0.94967147 0.94967147 0.94967147 0.94967147 0.94967147
0.94997636 0.94968354 0.9499921 0.94968354]
mean value: 0.9509795300752732
key: test_accuracy
value: [0.88888889 1. 1. 0.77777778 0.83333333 0.94444444
1. 0.64705882 1. 0.76470588]
mean value: 0.8856209150326797
key: train_accuracy
value: [0.98101266 0.97468354 0.97468354 0.97468354 0.97468354 0.97468354
0.97484277 0.97484277 0.97484277 0.97484277]
mean value: 0.9753801448929226
key: test_fscore
value: [0.9 1. 1. 0.75 0.8 0.94117647
1. 0.625 1. 0.75 ]
mean value: 0.8766176470588235
key: train_fscore
value: [0.98113208 0.975 0.975 0.975 0.975 0.975
0.97530864 0.975 0.975 0.97468354]
mean value: 0.9756124261750804
key: test_precision
value: [0.81818182 1. 1. 0.85714286 1. 1.
1. 0.625 1. 0.85714286]
mean value: 0.9157467532467533
key: train_precision
value: [0.975 0.96296296 0.96296296 0.96296296 0.96296296 0.96296296
0.96341463 0.975 0.96296296 0.97468354]
mean value: 0.9665875956227916
key: test_recall
value: [1. 1. 1. 0.66666667 0.66666667 0.88888889
1. 0.625 1. 0.66666667]
mean value: 0.8513888888888889
key: train_recall
value: [0.98734177 0.98734177 0.98734177 0.98734177 0.98734177 0.98734177
0.9875 0.975 0.98734177 0.97468354]
mean value: 0.9848575949367089
key: test_roc_auc
value: [0.88888889 1. 1. 0.77777778 0.83333333 0.94444444
1. 0.64583333 1. 0.77083333]
mean value: 0.8861111111111111
key: train_roc_auc
value: [0.98101266 0.97468354 0.97468354 0.97468354 0.97468354 0.97468354
0.97476266 0.97484177 0.97492089 0.97484177]
mean value: 0.975379746835443
key: test_jcc
value: [0.81818182 1. 1. 0.6 0.66666667 0.88888889
1. 0.45454545 1. 0.6 ]
mean value: 0.8028282828282828
key: train_jcc
value: [0.96296296 0.95121951 0.95121951 0.95121951 0.95121951 0.95121951
0.95180723 0.95121951 0.95121951 0.95061728]
mean value: 0.9523924061195096
MCC on Blind test: 0.92
Accuracy on Blind test: 0.97
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02123356 0.00871611 0.00887299 0.00869632 0.00875688 0.00867677
0.00881886 0.00880957 0.00868821 0.00883961]
mean value: 0.010010886192321777
key: score_time
value: [0.01431727 0.0084784 0.00853419 0.00855255 0.00838017 0.00845575
0.00846887 0.00848198 0.0084846 0.00844526]
mean value: 0.009059906005859375
key: test_mcc
value: [0.11111111 0.2236068 0.4472136 0.56980288 0.26726124 0.33333333
0.16903085 0.16903085 0.52777778 0.18055556]
mean value: 0.2998723997129735
key: train_mcc
value: [0.62045203 0.58609427 0.62345811 0.58344823 0.54500286 0.55700665
0.53581684 0.5346519 0.61038989 0.59809901]
mean value: 0.5794419778966067
key: test_accuracy
value: [0.55555556 0.61111111 0.72222222 0.77777778 0.61111111 0.66666667
0.58823529 0.58823529 0.76470588 0.58823529]
mean value: 0.6473856209150327
key: train_accuracy
value: [0.81012658 0.79113924 0.81012658 0.79113924 0.7721519 0.77848101
0.7672956 0.7672956 0.80503145 0.79874214]
mean value: 0.7891529336836239
key: test_fscore
value: [0.55555556 0.63157895 0.73684211 0.75 0.46153846 0.66666667
0.53333333 0.53333333 0.77777778 0.58823529]
mean value: 0.6234861474954354
key: train_fscore
value: [0.80769231 0.77852349 0.8 0.79754601 0.77777778 0.77707006
0.76129032 0.7672956 0.8 0.79220779]
mean value: 0.7859403363639892
key: test_precision
value: [0.55555556 0.6 0.7 0.85714286 0.75 0.66666667
0.57142857 0.57142857 0.77777778 0.625 ]
mean value: 0.6675
key: train_precision
value: [0.81818182 0.82857143 0.84507042 0.77380952 0.75903614 0.78205128
0.78666667 0.7721519 0.81578947 0.81333333]
mean value: 0.7994661992145965
key: test_recall
value: [0.55555556 0.66666667 0.77777778 0.66666667 0.33333333 0.66666667
0.5 0.5 0.77777778 0.55555556]
mean value: 0.6
key: train_recall
value: [0.79746835 0.73417722 0.75949367 0.82278481 0.79746835 0.7721519
0.7375 0.7625 0.78481013 0.7721519 ]
mean value: 0.7740506329113924
key: test_roc_auc
value: [0.55555556 0.61111111 0.72222222 0.77777778 0.61111111 0.66666667
0.58333333 0.58333333 0.76388889 0.59027778]
mean value: 0.6465277777777778
key: train_roc_auc
value: [0.81012658 0.79113924 0.81012658 0.79113924 0.7721519 0.77848101
0.76748418 0.76732595 0.80490506 0.79857595]
mean value: 0.7891455696202532
key: test_jcc
value: [0.38461538 0.46153846 0.58333333 0.6 0.3 0.5
0.36363636 0.36363636 0.63636364 0.41666667]
mean value: 0.460979020979021
key: train_jcc
value: [0.67741935 0.63736264 0.66666667 0.66326531 0.63636364 0.63541667
0.61458333 0.62244898 0.66666667 0.65591398]
mean value: 0.6476107226107226
MCC on Blind test: 0.39
Accuracy on Blind test: 0.74
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.05445504 0.05187917 0.05913138 0.05450749 0.05523133 0.05698442
0.05438495 0.2290132 0.04490089 0.04462194]
mean value: 0.07051098346710205
key: score_time
value: [0.01012802 0.01056314 0.01021743 0.01016593 0.01083326 0.01125741
0.01048422 0.01154351 0.01079202 0.01051164]
mean value: 0.010649657249450684
key: test_mcc
value: [0.77777778 1. 1. 0.79772404 0.77777778 1.
0.88888889 0.88888889 1. 0.88741197]
mean value: 0.9018469336015741
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.88888889 1. 1. 0.88888889 0.88888889 1.
0.94117647 0.94117647 1. 0.94117647]
mean value: 0.9490196078431372
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.88888889 1. 1. 0.875 0.88888889 1.
0.94117647 0.94117647 1. 0.94736842]
mean value: 0.948249914000688
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.88888889 1. 1. 1. 0.88888889 1.
0.88888889 0.88888889 1. 0.9 ]
mean value: 0.9455555555555555
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.88888889 1. 1. 0.77777778 0.88888889 1.
1. 1. 1. 1. ]
mean value: 0.9555555555555555
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.88888889 1. 1. 0.88888889 0.88888889 1.
0.94444444 0.94444444 1. 0.9375 ]
mean value: 0.9493055555555555
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.8 1. 1. 0.77777778 0.8 1.
0.88888889 0.88888889 1. 0.9 ]
mean value: 0.9055555555555556
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.92
Accuracy on Blind test: 0.97
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.0296216 0.05306625 0.05170012 0.05277443 0.05263186 0.05192494
0.05169201 0.0526433 0.05250549 0.05197215]
mean value: 0.05005321502685547
key: score_time
value: [0.02377701 0.02253151 0.01830482 0.02136779 0.02420402 0.02055812
0.02063036 0.02410078 0.02347684 0.02271819]
mean value: 0.022166943550109862
key: test_mcc
value: [0.2236068 0.67082039 0.67082039 0.47140452 0.34188173 0.89442719
0.78881064 0.52777778 0.16903085 0.29166667]
mean value: 0.5050246958556477
key: train_mcc
value: [1. 0.9621024 0.97468354 0.98742088 0.98742088 0.97468354
0.96234177 1. 0.97484177 0.96234177]
mean value: 0.9785836569605925
key: test_accuracy
value: [0.61111111 0.83333333 0.83333333 0.72222222 0.66666667 0.94444444
0.88235294 0.76470588 0.58823529 0.64705882]
mean value: 0.7493464052287582
key: train_accuracy
value: [1. 0.98101266 0.98734177 0.99367089 0.99367089 0.98734177
0.98113208 1. 0.98742138 0.98113208]
mean value: 0.9892723509274739
key: test_fscore
value: [0.63157895 0.84210526 0.82352941 0.76190476 0.625 0.94117647
0.88888889 0.75 0.63157895 0.66666667]
mean value: 0.7562429357707996
key: train_fscore
value: [1. 0.98089172 0.98734177 0.99363057 0.99371069 0.98734177
0.98113208 1. 0.98734177 0.98113208]
mean value: 0.9892522452216623
key: test_precision
value: [0.6 0.8 0.875 0.66666667 0.71428571 1.
0.8 0.75 0.6 0.66666667]
mean value: 0.7472619047619048
key: train_precision
value: [1. 0.98717949 0.98734177 1. 0.9875 0.98734177
0.98734177 1. 0.98734177 0.975 ]
mean value: 0.9899046575787083
key: test_recall
value: [0.66666667 0.88888889 0.77777778 0.88888889 0.55555556 0.88888889
1. 0.75 0.66666667 0.66666667]
mean value: 0.775
key: train_recall
value: [1. 0.97468354 0.98734177 0.98734177 1. 0.98734177
0.975 1. 0.98734177 0.98734177]
mean value: 0.9886392405063291
key: test_roc_auc
value: [0.61111111 0.83333333 0.83333333 0.72222222 0.66666667 0.94444444
0.88888889 0.76388889 0.58333333 0.64583333]
mean value: 0.7493055555555556
key: train_roc_auc
value: [1. 0.98101266 0.98734177 0.99367089 0.99367089 0.98734177
0.98117089 1. 0.98742089 0.98117089]
mean value: 0.9892800632911393
key: test_jcc
value: [0.46153846 0.72727273 0.7 0.61538462 0.45454545 0.88888889
0.8 0.6 0.46153846 0.5 ]
mean value: 0.6209168609168609
key: train_jcc
value: [1. 0.9625 0.975 0.98734177 0.9875 0.975
0.96296296 1. 0.975 0.96296296]
mean value: 0.9788267698077825
MCC on Blind test: 0.64
Accuracy on Blind test: 0.88
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.02158642 0.01039886 0.00884795 0.00946593 0.00987315 0.00987744
0.01004529 0.00951934 0.00998187 0.01012397]
mean value: 0.010972023010253906
key: score_time
value: [0.00903487 0.00969934 0.00854087 0.00920844 0.009238 0.00936532
0.00933886 0.00936294 0.00932789 0.00940394]
mean value: 0.009252047538757325
key: test_mcc
value: [0.47140452 0.67082039 0.79772404 0.4472136 0.3721042 1.
0.52777778 0.52777778 0.52777778 0.16903085]
mean value: 0.5511630932805054
key: train_mcc
value: [0.62045203 0.65955018 0.65870297 0.67632637 0.62104977 0.63336824
0.58739809 0.61006489 0.63574489 0.66044304]
mean value: 0.6363100454104343
key: test_accuracy
value: [0.72222222 0.83333333 0.88888889 0.72222222 0.66666667 1.
0.76470588 0.76470588 0.76470588 0.58823529]
mean value: 0.7715686274509803
key: train_accuracy
value: [0.81012658 0.82911392 0.82911392 0.83544304 0.81012658 0.8164557
0.79245283 0.80503145 0.81761006 0.83018868]
mean value: 0.8175662765703368
key: test_fscore
value: [0.76190476 0.84210526 0.875 0.73684211 0.57142857 1.
0.75 0.75 0.77777778 0.63157895]
mean value: 0.7696637426900584
key: train_fscore
value: [0.80769231 0.83435583 0.83229814 0.8452381 0.81481481 0.81987578
0.78431373 0.80745342 0.81987578 0.83018868]
mean value: 0.8196106556291618
key: test_precision
value: [0.66666667 0.8 1. 0.7 0.8 1.
0.75 0.75 0.77777778 0.6 ]
mean value: 0.7844444444444445
key: train_precision
value: [0.81818182 0.80952381 0.81707317 0.79775281 0.79518072 0.80487805
0.82191781 0.80246914 0.80487805 0.825 ]
mean value: 0.8096855371900288
key: test_recall
value: [0.88888889 0.88888889 0.77777778 0.77777778 0.44444444 1.
0.75 0.75 0.77777778 0.66666667]
mean value: 0.7722222222222223
key: train_recall
value: [0.79746835 0.86075949 0.84810127 0.89873418 0.83544304 0.83544304
0.75 0.8125 0.83544304 0.83544304]
mean value: 0.8309335443037975
key: test_roc_auc
value: [0.72222222 0.83333333 0.88888889 0.72222222 0.66666667 1.
0.76388889 0.76388889 0.76388889 0.58333333]
mean value: 0.7708333333333334
key: train_roc_auc
value: [0.81012658 0.82911392 0.82911392 0.83544304 0.81012658 0.8164557
0.79272152 0.80498418 0.81772152 0.83022152]
mean value: 0.8176028481012658
key: test_jcc
value: [0.61538462 0.72727273 0.77777778 0.58333333 0.4 1.
0.6 0.6 0.63636364 0.46153846]
mean value: 0.6401670551670552
key: train_jcc
value: [0.67741935 0.71578947 0.71276596 0.73195876 0.6875 0.69473684
0.64516129 0.67708333 0.69473684 0.70967742]
mean value: 0.6946829276077606
MCC on Blind test: 0.69
Accuracy on Blind test: 0.85
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01234007 0.01520658 0.01421547 0.01622844 0.01516342 0.01555896
0.01505685 0.01540637 0.01545596 0.01503229]
mean value: 0.014966440200805665
key: score_time
value: [0.00947547 0.01168132 0.01160622 0.01176691 0.01161003 0.01183724
0.01163387 0.01220989 0.01203275 0.01214504]
mean value: 0.011599874496459961
key: test_mcc
value: [0.67082039 0.77777778 0.70710678 0.4472136 0.62017367 0.79772404
0.54935027 0.69631062 0.49099025 0.41666667]
mean value: 0.6174134064971705
key: train_mcc
value: [0.93678391 0.93678391 0.75178998 0.9243469 0.94967147 0.91322332
0.91202532 0.81121795 0.5472547 0.87527844]
mean value: 0.8558375901294395
key: test_accuracy
value: [0.83333333 0.88888889 0.83333333 0.72222222 0.77777778 0.88888889
0.76470588 0.82352941 0.70588235 0.70588235]
mean value: 0.7944444444444444
key: train_accuracy
value: [0.96835443 0.96835443 0.86708861 0.96202532 0.97468354 0.9556962
0.95597484 0.89937107 0.72955975 0.93710692]
mean value: 0.9218215110261921
key: test_fscore
value: [0.84210526 0.88888889 0.85714286 0.73684211 0.71428571 0.875
0.77777778 0.84210526 0.7826087 0.70588235]
mean value: 0.8022638918267536
key: train_fscore
value: [0.96815287 0.96815287 0.88 0.9625 0.97435897 0.95424837
0.95597484 0.90804598 0.78606965 0.93506494]
mean value: 0.9292568479441141
key: test_precision
value: [0.8 0.88888889 0.75 0.7 1. 1.
0.7 0.72727273 0.64285714 0.75 ]
mean value: 0.7959018759018759
key: train_precision
value: [0.97435897 0.97435897 0.80208333 0.95061728 0.98701299 0.98648649
0.96202532 0.84042553 0.64754098 0.96 ]
mean value: 0.908490987147852
key: test_recall
value: [0.88888889 0.88888889 1. 0.77777778 0.55555556 0.77777778
0.875 1. 1. 0.66666667]
mean value: 0.8430555555555556
key: train_recall
value: [0.96202532 0.96202532 0.97468354 0.97468354 0.96202532 0.92405063
0.95 0.9875 1. 0.91139241]
mean value: 0.9608386075949367
key: test_roc_auc
value: [0.83333333 0.88888889 0.83333333 0.72222222 0.77777778 0.88888889
0.77083333 0.83333333 0.6875 0.70833333]
mean value: 0.7944444444444445
key: train_roc_auc
value: [0.96835443 0.96835443 0.86708861 0.96202532 0.97468354 0.9556962
0.95601266 0.89881329 0.73125 0.9369462 ]
mean value: 0.9219224683544304
key: test_jcc
value: [0.72727273 0.8 0.75 0.58333333 0.55555556 0.77777778
0.63636364 0.72727273 0.64285714 0.54545455]
mean value: 0.6745887445887446
key: train_jcc
value: [0.9382716 0.9382716 0.78571429 0.92771084 0.95 0.9125
0.91566265 0.83157895 0.64754098 0.87804878]
mean value: 0.8725299701029515
MCC on Blind test: 0.79
Accuracy on Blind test: 0.91
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01438379 0.01469326 0.01489258 0.01415277 0.01420665 0.01426673
0.01395035 0.01545525 0.01461434 0.01494837]
mean value: 0.014556407928466797
key: score_time
value: [0.00999117 0.01173043 0.01184225 0.01177931 0.01166821 0.01177931
0.01173019 0.01165199 0.01175547 0.01169562]
mean value: 0.011562395095825195
key: test_mcc
value: [0.1490712 0.67082039 0.70710678 0.4472136 0.3721042 0.56980288
0.65277778 0.54935027 0.78334945 0.40849122]
mean value: 0.531008777277298
key: train_mcc
value: [0.56273143 0.9621024 0.89188259 0.93678391 0.85805812 0.75637877
0.81121795 0.8885037 0.9256747 0.90401404]
mean value: 0.8497347611583999
key: test_accuracy
value: [0.55555556 0.83333333 0.83333333 0.72222222 0.66666667 0.77777778
0.82352941 0.76470588 0.88235294 0.70588235]
mean value: 0.7565359477124183
key: train_accuracy
value: [0.74050633 0.98101266 0.94303797 0.96835443 0.92405063 0.86708861
0.89937107 0.94339623 0.96226415 0.94968553]
mean value: 0.9178767614043468
key: test_fscore
value: [0.33333333 0.84210526 0.85714286 0.73684211 0.57142857 0.75
0.82352941 0.77777778 0.9 0.73684211]
mean value: 0.7329001425131456
key: train_fscore
value: [0.64957265 0.98113208 0.94610778 0.96815287 0.92941176 0.84892086
0.90804598 0.94545455 0.96103896 0.95180723]
mean value: 0.9089644716153422
key: test_precision
value: [0.66666667 0.8 0.75 0.7 0.8 0.85714286
0.77777778 0.7 0.81818182 0.7 ]
mean value: 0.756976911976912
key: train_precision
value: [1. 0.975 0.89772727 0.97435897 0.86813187 0.98333333
0.84042553 0.91764706 0.98666667 0.90804598]
mean value: 0.9351336682968032
key: test_recall
value: [0.22222222 0.88888889 1. 0.77777778 0.44444444 0.66666667
0.875 0.875 1. 0.77777778]
mean value: 0.7527777777777778
key: train_recall
value: [0.48101266 0.98734177 1. 0.96202532 1. 0.74683544
0.9875 0.975 0.93670886 1. ]
mean value: 0.9076424050632912
key: test_roc_auc
value: [0.55555556 0.83333333 0.83333333 0.72222222 0.66666667 0.77777778
0.82638889 0.77083333 0.875 0.70138889]
mean value: 0.75625
key: train_roc_auc
value: [0.74050633 0.98101266 0.94303797 0.96835443 0.92405063 0.86708861
0.89881329 0.9431962 0.96210443 0.95 ]
mean value: 0.9178164556962025
key: test_jcc
value: [0.2 0.72727273 0.75 0.58333333 0.4 0.6
0.7 0.63636364 0.81818182 0.58333333]
mean value: 0.5998484848484849
key: train_jcc
value: [0.48101266 0.96296296 0.89772727 0.9382716 0.86813187 0.7375
0.83157895 0.89655172 0.925 0.90804598]
mean value: 0.844678301550607
MCC on Blind test: 0.74
Accuracy on Blind test: 0.88
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.11545205 0.10431743 0.10553479 0.10573816 0.10540199 0.10588789
0.10381937 0.10629654 0.10485244 0.10286975]
mean value: 0.10601704120635987
key: score_time
value: [0.01625061 0.01548195 0.01487517 0.0151825 0.01617122 0.01584601
0.01527643 0.0148437 0.01535702 0.01589417]
mean value: 0.015517878532409667
key: test_mcc
value: [0.77777778 1. 1. 0.79772404 0.77777778 0.89442719
1. 0.88888889 1. 0.88741197]
mean value: 0.9024007638126769
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.88888889 1. 1. 0.88888889 0.88888889 0.94444444
1. 0.94117647 1. 0.94117647]
mean value: 0.9493464052287581
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.88888889 1. 1. 0.875 0.88888889 0.94117647
1. 0.94117647 1. 0.94736842]
mean value: 0.948249914000688
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.88888889 1. 1. 1. 0.88888889 1.
1. 0.88888889 1. 0.9 ]
mean value: 0.9566666666666667
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.88888889 1. 1. 0.77777778 0.88888889 0.88888889
1. 1. 1. 1. ]
mean value: 0.9444444444444444
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.88888889 1. 1. 0.88888889 0.88888889 0.94444444
1. 0.94444444 1. 0.9375 ]
mean value: 0.9493055555555555
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.8 1. 1. 0.77777778 0.8 0.88888889
1. 0.88888889 1. 0.9 ]
mean value: 0.9055555555555556
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.92
Accuracy on Blind test: 0.97
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.03948283 0.04953337 0.03142118 0.03478074 0.03619123 0.03574586
0.03580761 0.04659438 0.03955555 0.03685379]
mean value: 0.03859665393829346
key: score_time
value: [0.02498698 0.02185464 0.01938224 0.02102304 0.0222981 0.0198257
0.02262115 0.02782845 0.02416492 0.02282 ]
mean value: 0.02268052101135254
key: test_mcc
value: [0.77777778 1. 1. 0.79772404 0.77777778 1.
0.78881064 0.88888889 1. 0.88741197]
mean value: 0.8918391084873468
key: train_mcc
value: [1. 1. 0.97499604 1. 0.98742088 0.98742088
0.96234177 0.9875 0.98749803 0.98749803]
mean value: 0.9874675649417293
key: test_accuracy
value: [0.88888889 1. 1. 0.88888889 0.88888889 1.
0.88235294 0.94117647 1. 0.94117647]
mean value: 0.9431372549019608
key: train_accuracy
value: [1. 1. 0.98734177 1. 0.99367089 0.99367089
0.98113208 0.99371069 0.99371069 0.99371069]
mean value: 0.9936947695247194
key: test_fscore
value: [0.88888889 1. 1. 0.875 0.88888889 1.
0.88888889 0.94117647 1. 0.94736842]
mean value: 0.9430211558307534
key: train_fscore
value: [1. 1. 0.98717949 1. 0.99363057 0.99363057
0.98113208 0.99371069 0.99363057 0.99363057]
mean value: 0.9936544547468715
key: test_precision
value: [0.88888889 1. 1. 1. 0.88888889 1.
0.8 0.88888889 1. 0.9 ]
mean value: 0.9366666666666666
key: train_precision
value: [1. 1. 1. 1. 1. 1.
0.98734177 1. 1. 1. ]
mean value: 0.9987341772151899
key: test_recall
value: [0.88888889 1. 1. 0.77777778 0.88888889 1.
1. 1. 1. 1. ]
mean value: 0.9555555555555555
key: train_recall
value: [1. 1. 0.97468354 1. 0.98734177 0.98734177
0.975 0.9875 0.98734177 0.98734177]
mean value: 0.9886550632911393
key: test_roc_auc
value: [0.88888889 1. 1. 0.88888889 0.88888889 1.
0.88888889 0.94444444 1. 0.9375 ]
mean value: 0.94375
key: train_roc_auc
value: [1. 1. 0.98734177 1. 0.99367089 0.99367089
0.98117089 0.99375 0.99367089 0.99367089]
mean value: 0.9936946202531646
key: test_jcc
value: [0.8 1. 1. 0.77777778 0.8 1.
0.8 0.88888889 1. 0.9 ]
mean value: 0.8966666666666667
key: train_jcc
value: [1. 1. 0.97468354 1. 0.98734177 0.98734177
0.96296296 0.9875 0.98734177 0.98734177]
mean value: 0.9874513595874356
MCC on Blind test: 0.92
Accuracy on Blind test: 0.97
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.03753996 0.0400219 0.04699039 0.04979587 0.02395082 0.02255321
0.02178669 0.04306293 0.05040836 0.05459094]
mean value: 0.03907010555267334
key: score_time
value: [0.02251029 0.01236272 0.02456164 0.02370811 0.01268172 0.01240444
0.01238751 0.02090907 0.02468371 0.01258755]
mean value: 0.017879676818847657
key: test_mcc
value: [0.33333333 0.11396058 0.34188173 0.11111111 0.1490712 0.11396058
0.09128709 0.04351941 0.60858062 0.18055556]
mean value: 0.20872612071548124
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.66666667 0.55555556 0.66666667 0.55555556 0.55555556 0.55555556
0.52941176 0.52941176 0.76470588 0.58823529]
mean value: 0.5967320261437908
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.66666667 0.5 0.7 0.55555556 0.33333333 0.5
0.6 0.42857143 0.71428571 0.58823529]
mean value: 0.5586647992530346
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.66666667 0.57142857 0.63636364 0.55555556 0.66666667 0.57142857
0.5 0.5 1. 0.625 ]
mean value: 0.6293109668109668
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 0.44444444 0.77777778 0.55555556 0.22222222 0.44444444
0.75 0.375 0.55555556 0.55555556]
mean value: 0.5347222222222222
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.66666667 0.55555556 0.66666667 0.55555556 0.55555556 0.55555556
0.54166667 0.52083333 0.77777778 0.59027778]
mean value: 0.5986111111111111
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.5 0.33333333 0.53846154 0.38461538 0.2 0.33333333
0.42857143 0.27272727 0.55555556 0.41666667]
mean value: 0.39632645132645133
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.5
Accuracy on Blind test: 0.76
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.3016758 0.29095507 0.28988028 0.28957891 0.28800511 0.29090023
0.29927564 0.30395865 0.2906909 0.29244685]
mean value: 0.29373674392700194
key: score_time
value: [0.00947571 0.00942588 0.00925851 0.01021242 0.00925803 0.00914812
0.00925756 0.00914478 0.00958729 0.00923419]
mean value: 0.009400248527526855
key: test_mcc
value: [0.77777778 1. 1. 0.79772404 0.77777778 1.
0.88888889 0.88888889 1. 0.88741197]
mean value: 0.9018469336015741
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.88888889 1. 1. 0.88888889 0.88888889 1.
0.94117647 0.94117647 1. 0.94117647]
mean value: 0.9490196078431372
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.88888889 1. 1. 0.875 0.88888889 1.
0.94117647 0.94117647 1. 0.94736842]
mean value: 0.948249914000688
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.88888889 1. 1. 1. 0.88888889 1.
0.88888889 0.88888889 1. 0.9 ]
mean value: 0.9455555555555555
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.88888889 1. 1. 0.77777778 0.88888889 1.
1. 1. 1. 1. ]
mean value: 0.9555555555555555
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.88888889 1. 1. 0.88888889 0.88888889 1.
0.94444444 0.94444444 1. 0.9375 ]
mean value: 0.9493055555555555
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.8 1. 1. 0.77777778 0.8 1.
0.88888889 0.88888889 1. 0.9 ]
mean value: 0.9055555555555556
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.92
Accuracy on Blind test: 0.97
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.01823831 0.01897502 0.01943731 0.0219481 0.01919198 0.01920938
0.01920414 0.02677155 0.01945543 0.02004194]
mean value: 0.02024731636047363
key: score_time
value: [0.01213932 0.01207805 0.01471949 0.02359366 0.01789927 0.01716661
0.01457524 0.01231289 0.01462674 0.01511145]
mean value: 0.01542227268218994
key: test_mcc
value: [-0.11396058 0.62017367 0.55555556 0. -0.4472136 -0.11111111
0.07042952 -0.05555556 0.24514517 -0.29166667]
mean value: 0.04717964133587751
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.44444444 0.77777778 0.77777778 0.5 0.27777778 0.44444444
0.52941176 0.47058824 0.58823529 0.35294118]
mean value: 0.5163398692810458
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.5 0.81818182 0.77777778 0.57142857 0.31578947 0.44444444
0.55555556 0.47058824 0.46153846 0.35294118]
mean value: 0.5268245514375546
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.45454545 0.69230769 0.77777778 0.5 0.3 0.44444444
0.5 0.44444444 0.75 0.375 ]
mean value: 0.5238519813519813
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.55555556 1. 0.77777778 0.66666667 0.33333333 0.44444444
0.625 0.5 0.33333333 0.33333333]
mean value: 0.5569444444444445
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.44444444 0.77777778 0.77777778 0.5 0.27777778 0.44444444
0.53472222 0.47222222 0.60416667 0.35416667]
mean value: 0.5187499999999999
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.33333333 0.69230769 0.63636364 0.4 0.1875 0.28571429
0.38461538 0.30769231 0.3 0.21428571]
mean value: 0.37418123543123544
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.04
Accuracy on Blind test: 0.5
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.03576422 0.02692246 0.01386642 0.01340842 0.01377153 0.04542184
0.03384018 0.03396988 0.03371286 0.0340662 ]
mean value: 0.02847440242767334
key: score_time
value: [0.02211285 0.01175857 0.01203871 0.01246452 0.01206517 0.01190996
0.02071691 0.02354932 0.0222075 0.02081013]
mean value: 0.01696336269378662
key: test_mcc
value: [0.67082039 0.77777778 0.67082039 0.55555556 0.62017367 0.77777778
0.65277778 0.54935027 1. 0.65277778]
mean value: 0.6927831391686119
key: train_mcc
value: [0.94967147 0.94967147 0.93678391 0.94967147 0.9621024 0.9621024
0.92482989 0.96234177 0.94997636 0.96233582]
mean value: 0.9509486971738486
key: test_accuracy
value: [0.83333333 0.88888889 0.83333333 0.77777778 0.77777778 0.88888889
0.82352941 0.76470588 1. 0.82352941]
mean value: 0.8411764705882353
key: train_accuracy
value: [0.97468354 0.97468354 0.96835443 0.97468354 0.98101266 0.98101266
0.96226415 0.98113208 0.97484277 0.98113208]
mean value: 0.9753801448929226
key: test_fscore
value: [0.84210526 0.88888889 0.84210526 0.77777778 0.71428571 0.88888889
0.82352941 0.77777778 1. 0.82352941]
mean value: 0.8378888397464249
key: train_fscore
value: [0.97435897 0.97435897 0.96815287 0.97435897 0.98113208 0.98089172
0.96202532 0.98113208 0.97435897 0.98089172]
mean value: 0.9751661670567474
key: test_precision
value: [0.8 0.88888889 0.8 0.77777778 1. 0.88888889
0.77777778 0.7 1. 0.875 ]
mean value: 0.8508333333333333
key: train_precision
value: [0.98701299 0.98701299 0.97435897 0.98701299 0.975 0.98717949
0.97435897 0.98734177 0.98701299 0.98717949]
mean value: 0.983347064328077
key: test_recall
value: [0.88888889 0.88888889 0.88888889 0.77777778 0.55555556 0.88888889
0.875 0.875 1. 0.77777778]
mean value: /home/tanu/git/LSHTM_analysis/scripts/ml/./embb_sl.py:168: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./embb_sl.py:171: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
0.8416666666666667
key: train_recall
value: [0.96202532 0.96202532 0.96202532 0.96202532 0.98734177 0.97468354
0.95 0.975 0.96202532 0.97468354]
mean value: 0.9671835443037975
key: test_roc_auc
value: [0.83333333 0.88888889 0.83333333 0.77777778 0.77777778 0.88888889
0.82638889 0.77083333 1. 0.82638889]
mean value: 0.8423611111111111
key: train_roc_auc
value: [0.97468354 0.97468354 0.96835443 0.97468354 0.98101266 0.98101266
0.96234177 0.98117089 0.97476266 0.98109177]
mean value: 0.975379746835443
key: test_jcc
value: [0.72727273 0.8 0.72727273 0.63636364 0.55555556 0.8
0.7 0.63636364 1. 0.7 ]
mean value: 0.7282828282828283
key: train_jcc
value: [0.95 0.95 0.9382716 0.95 0.96296296 0.9625
0.92682927 0.96296296 0.95 0.9625 ]
mean value: 0.951602679915688
MCC on Blind test: 0.79
Accuracy on Blind test: 0.91
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.25937772 0.18409133 0.15225029 0.16805243 0.20423412 0.23713088
0.21333742 0.21399665 0.22232294 0.31374693]
mean value: 0.21685407161712647
key: score_time
value: [0.01209164 0.0203526 0.01194811 0.01314783 0.0207057 0.01997375
0.02330065 0.01731992 0.02275801 0.02117276]
mean value: 0.01827709674835205
key: test_mcc
value: [0.55555556 0.77777778 0.67082039 0.67082039 0.62017367 0.77777778
0.65277778 0.54935027 1. 0.65277778]
mean value: 0.6927831391686119
key: train_mcc
value: [0.98742088 0.94967147 0.93678391 0.9621024 0.9621024 0.9621024
0.92482989 0.96234177 0.94997636 0.96233582]
mean value: 0.9559667312380578
key: test_accuracy
value: [0.77777778 0.88888889 0.83333333 0.83333333 0.77777778 0.88888889
0.82352941 0.76470588 1. 0.82352941]
mean value: 0.8411764705882353
key: train_accuracy
value: [0.99367089 0.97468354 0.96835443 0.98101266 0.98101266 0.98101266
0.96226415 0.98113208 0.97484277 0.98113208]
mean value: 0.9779117904625428
key: test_fscore
value: [0.77777778 0.88888889 0.84210526 0.84210526 0.71428571 0.88888889
0.82352941 0.77777778 1. 0.82352941]
mean value: 0.8378888397464249
key: train_fscore
value: [0.99363057 0.97435897 0.96815287 0.98089172 0.98113208 0.98089172
0.96202532 0.98113208 0.97435897 0.98089172]
mean value: 0.9777466014843156
key: test_precision
value: [0.77777778 0.88888889 0.8 0.8 1. 0.88888889
0.77777778 0.7 1. 0.875 ]
mean value: 0.8508333333333333
key: train_precision
value: [1. 0.98701299 0.97435897 0.98717949 0.975 0.98717949
0.97435897 0.98734177 0.98701299 0.98717949]
mean value: 0.9846624156434283
key: test_recall
value: [0.77777778 0.88888889 0.88888889 0.88888889 0.55555556 0.88888889
0.875 0.875 1. 0.77777778]
mean value: 0.8416666666666667
key: train_recall
value: [0.98734177 0.96202532 0.96202532 0.97468354 0.98734177 0.97468354
0.95 0.975 0.96202532 0.97468354]
mean value: 0.9709810126582279
key: test_roc_auc
value: [0.77777778 0.88888889 0.83333333 0.83333333 0.77777778 0.88888889
0.82638889 0.77083333 1. 0.82638889]
mean value: 0.8423611111111111
key: train_roc_auc
value: [0.99367089 0.97468354 0.96835443 0.98101266 0.98101266 0.98101266
0.96234177 0.98117089 0.97476266 0.98109177]
mean value: 0.9779113924050633
key: test_jcc
value: [0.63636364 0.8 0.72727273 0.72727273 0.55555556 0.8
0.7 0.63636364 1. 0.7 ]
mean value: 0.7282828282828283
key: train_jcc
value: [0.98734177 0.95 0.9382716 0.9625 0.96296296 0.9625
0.92682927 0.96296296 0.95 0.9625 ]
mean value: 0.9565868571308779
MCC on Blind test: 0.79
Accuracy on Blind test: 0.91
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.03828049 0.03973126 0.0394094 0.03932691 0.03923225 0.03911781
0.03785086 0.03869772 0.03864527 0.03835726]
mean value: 0.03886492252349853
key: score_time
value: [0.012254 0.01499844 0.01548243 0.01451349 0.01436687 0.0145216
0.01563859 0.01568961 0.01582623 0.02349043]
mean value: 0.01567816734313965
key: test_mcc
value: [0.90950859 0.78824078 0.94028478 0.96966868 0.87867338 0.88382395
0.94028478 0.81671746 0.75378788 0.87689394]
mean value: 0.8757884239501552
key: train_mcc
value: [0.9389166 0.93858842 0.93219993 0.93239416 0.9456811 0.92888267
0.9114359 0.91482668 0.92506472 0.94252441]
mean value: 0.9310514594809193
key: test_accuracy
value: [0.95454545 0.89393939 0.96923077 0.98461538 0.93846154 0.93846154
0.96923077 0.90769231 0.87692308 0.93846154]
mean value: 0.9371561771561772
key: train_accuracy
value: [0.96928328 0.96928328 0.96592845 0.96592845 0.97274276 0.96422487
0.95570698 0.95741056 0.96252129 0.97103918]
mean value: 0.9654069108267292
key: test_fscore
value: [0.95522388 0.89552239 0.96969697 0.98412698 0.93939394 0.94117647
0.96875 0.91176471 0.87878788 0.93939394]
mean value: 0.9383837156527016
key: train_fscore
value: [0.96969697 0.96938776 0.96644295 0.96655518 0.97306397 0.96482412
0.95578231 0.95741056 0.96258503 0.97142857]
mean value: 0.9657177435980547
key: test_precision
value: [0.94117647 0.88235294 0.94117647 1. 0.91176471 0.88888889
1. 0.88571429 0.87878788 0.93939394]
mean value: 0.9269255581020287
key: train_precision
value: [0.95681063 0.96610169 0.95364238 0.95065789 0.96333333 0.95049505
0.95254237 0.95578231 0.95932203 0.95695364]
mean value: 0.9565641349914513
key: test_recall
value: [0.96969697 0.90909091 1. 0.96875 0.96875 1.
0.93939394 0.93939394 0.87878788 0.93939394]
mean value: 0.9513257575757575
key: train_recall
value: [0.98293515 0.97269625 0.97959184 0.9829932 0.9829932 0.97959184
0.95904437 0.95904437 0.96587031 0.98634812]
mean value: 0.9751108634580112
key: test_roc_auc
value: [0.95454545 0.89393939 0.96969697 0.984375 0.93892045 0.93939394
0.96969697 0.90719697 0.87689394 0.93844697]
mean value: 0.9373106060606061
key: train_roc_auc
value: [0.96928328 0.96928328 0.96590513 0.96589933 0.97272527 0.96419865
0.95571266 0.95741334 0.96252699 0.97106522]
mean value: 0.9654013141092614
key: test_jcc
value: [0.91428571 0.81081081 0.94117647 0.96875 0.88571429 0.88888889
0.93939394 0.83783784 0.78378378 0.88571429]
mean value: 0.8856356017017781
key: train_jcc
value: [0.94117647 0.94059406 0.93506494 0.93527508 0.94754098 0.93203883
0.91530945 0.91830065 0.92786885 0.94444444]
mean value: 0.9337613761275577
MCC on Blind test: 0.79
Accuracy on Blind test: 0.91
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.87459993 1.2076571 0.90833497 1.01458383 1.08987594 1.33507919
1.01251268 0.96896434 0.90253735 1.01945567]
mean value: 1.0333600997924806
key: score_time
value: [0.01497388 0.01613474 0.0149045 0.0150032 0.01484799 0.01342773
0.01664591 0.0125823 0.01590395 0.01560664]
mean value: 0.015003085136413574
key: test_mcc
value: [0.90950859 0.88040627 0.88382395 0.90814394 0.84995597 0.96969697
1. 0.91144345 0.94017476 0.90814394]
mean value: 0.9161297844265968
key: train_mcc
value: [0.96928892 0.95906671 0.99320865 0.97615537 0.97276495 0.95571215
0.96934132 0.95232236 1. 0.97957999]
mean value: 0.9727440421839888
key: test_accuracy
value: [0.95454545 0.93939394 0.93846154 0.95384615 0.92307692 0.98461538
1. 0.95384615 0.96923077 0.95384615]
mean value: 0.9570862470862471
key: train_accuracy
value: [0.98464164 0.97952218 0.99659284 0.98807496 0.98637138 0.97785349
0.9846678 0.97614991 1. 0.98977853]
mean value: 0.9863652749271764
key: test_fscore
value: [0.95522388 0.94117647 0.94117647 0.95384615 0.92537313 0.98461538
1. 0.95652174 0.97058824 0.95384615]
mean value: 0.9582367622834089
key: train_fscore
value: [0.98461538 0.97959184 0.99661017 0.98811545 0.98644068 0.97792869
0.9846678 0.97619048 1. 0.98979592]
mean value: 0.9863956408365138
key: test_precision
value: [0.94117647 0.91428571 0.88888889 0.93939394 0.88571429 0.96969697
1. 0.91666667 0.94285714 0.96875 ]
mean value: 0.9367430078091843
key: train_precision
value: [0.98630137 0.97627119 0.99324324 0.98644068 0.98310811 0.97627119
0.9829932 0.97288136 1. 0.98644068]
mean value: 0.984395100323904
key: test_recall
value: [0.96969697 0.96969697 1. 0.96875 0.96875 1.
1. 1. 1. 0.93939394]
mean value: 0.9816287878787879
key: train_recall
value: [0.98293515 0.98293515 1. 0.98979592 0.98979592 0.97959184
0.98634812 0.97952218 1. 0.99317406]
mean value: 0.9884098349237306
key: test_roc_auc
value: [0.95454545 0.93939394 0.93939394 0.95407197 0.92376894 0.98484848
1. 0.953125 0.96875 0.95407197]
mean value: 0.9571969696969698
key: train_roc_auc
value: [0.98464164 0.97952218 0.99658703 0.98807202 0.98636554 0.97785053
0.98467066 0.97615565 1. 0.98978431]
mean value: 0.9863649555385294
key: test_jcc
value: [0.91428571 0.88888889 0.88888889 0.91176471 0.86111111 0.96969697
1. 0.91666667 0.94285714 0.91176471]
mean value: 0.9205924794160089
key: train_jcc
value: [0.96969697 0.96 0.99324324 0.97651007 0.97324415 0.95681063
0.96979866 0.95348837 1. 0.97979798]
mean value: 0.9732590068049858
MCC on Blind test: 0.85
Accuracy on Blind test: 0.94
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.0157187 0.01268959 0.01264095 0.01270771 0.01175928 0.01104641
0.01074839 0.01086068 0.01050949 0.01081109]
mean value: 0.01194922924041748
key: score_time
value: [0.01302481 0.01075292 0.01039052 0.00946784 0.00998569 0.00904059
0.00938535 0.009027 0.00891423 0.00903201]
mean value: 0.009902095794677735
key: test_mcc
value: [0.78824078 0.54772256 0.48376972 0.84644588 0.72322307 0.66161167
0.69383917 0.60037879 0.53838887 0.66193182]
mean value: 0.6545552328074842
key: train_mcc
value: [0.64506622 0.71337706 0.672924 0.69348941 0.6624506 0.68322362
0.69015689 0.70358246 0.66615196 0.6906491 ]
mean value: 0.6821071327367672
key: test_accuracy
value: [0.89393939 0.77272727 0.73846154 0.92307692 0.86153846 0.83076923
0.84615385 0.8 0.76923077 0.83076923]
mean value: 0.8266666666666667
key: train_accuracy
value: [0.8225256 0.85665529 0.83645656 0.84667802 0.82793867 0.84156729
0.84497445 0.85178876 0.8330494 0.84497445]
mean value: 0.840660848532772
key: test_fscore
value: [0.89230769 0.7826087 0.75362319 0.92063492 0.85714286 0.82539683
0.84375 0.8 0.7761194 0.83076923]
mean value: 0.8282352813294571
key: train_fscore
value: [0.82312925 0.85762712 0.83728814 0.84848485 0.81535649 0.84317032
0.846543 0.85178876 0.83161512 0.84808013]
mean value: 0.8403083176678291
key: test_precision
value: [0.90625 0.75 0.7027027 0.93548387 0.87096774 0.83870968
0.87096774 0.8125 0.76470588 0.84375 ]
mean value: 0.8296037617313708
key: train_precision
value: [0.82033898 0.85185185 0.83445946 0.84 0.88142292 0.8361204
0.83666667 0.85034014 0.83737024 0.83006536]
mean value: 0.8418636025013883
key: test_recall
value: [0.87878788 0.81818182 0.8125 0.90625 0.84375 0.8125
0.81818182 0.78787879 0.78787879 0.81818182]
mean value: 0.8284090909090909
key: train_recall
value: [0.82593857 0.86348123 0.84013605 0.85714286 0.7585034 0.85034014
0.85665529 0.85324232 0.82593857 0.8668942 ]
mean value: 0.8398272619628055
key: test_roc_auc
value: [0.89393939 0.77272727 0.73958333 0.92282197 0.86126894 0.83049242
0.84659091 0.80018939 0.76893939 0.83096591]
mean value: 0.826751893939394
key: train_roc_auc
value: [0.8225256 0.85665529 0.83645028 0.84666017 0.82805716 0.84155232
0.84499431 0.85179123 0.83303731 0.84501172]
mean value: 0.8406735390401894
key: test_jcc
value: [0.80555556 0.64285714 0.60465116 0.85294118 0.75 0.7027027
0.72972973 0.66666667 0.63414634 0.71052632]
mean value: 0.7099776794025972
key: train_jcc
value: [0.69942197 0.75074184 0.72011662 0.73684211 0.6882716 0.72886297
0.73391813 0.74183976 0.71176471 0.73623188]
mean value: 0.7248011588325265
MCC on Blind test: 0.47
Accuracy on Blind test: 0.79
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.0107584 0.01122403 0.01064491 0.01139951 0.01227427 0.01099348
0.01089954 0.01204586 0.01091719 0.01089287]
mean value: 0.011205005645751952
key: score_time
value: [0.00904584 0.00896311 0.00916839 0.00917315 0.00924873 0.00929451
0.00943422 0.00923228 0.00928068 0.00969172]
mean value: 0.009253263473510742
key: test_mcc
value: [0.54772256 0.45454545 0.72572613 0.70352647 0.60000027 0.41516606
0.42714107 0.41785545 0.38461695 0.73234704]
mean value: 0.5408647457818198
key: train_mcc
value: [0.54699656 0.59399689 0.57120468 0.58602719 0.58602719 0.56692716
0.58164247 0.59178072 0.61296324 0.60857864]
mean value: 0.5846144748987409
key: test_accuracy
value: [0.77272727 0.72727273 0.86153846 0.84615385 0.8 0.70769231
0.70769231 0.70769231 0.69230769 0.86153846]
mean value: 0.7684615384615385
key: train_accuracy
value: [0.77303754 0.79522184 0.78534923 0.79216354 0.79216354 0.78194208
0.79045997 0.7955707 0.80579216 0.80408859]
mean value: 0.7915789198272002
key: test_fscore
value: [0.76190476 0.72727273 0.85245902 0.82758621 0.79365079 0.6984127
0.6779661 0.6984127 0.70588235 0.85245902]
mean value: 0.7596006373973208
key: train_fscore
value: [0.76625659 0.7833935 0.78125 0.7844523 0.7844523 0.77060932
0.78458844 0.79020979 0.79858657 0.8 ]
mean value: 0.7843798808929663
key: test_precision
value: [0.8 0.72727273 0.89655172 0.92307692 0.80645161 0.70967742
0.76923077 0.73333333 0.68571429 0.92857143]
mean value: 0.7979880223595462
key: train_precision
value: [0.78985507 0.83141762 0.79787234 0.81617647 0.81617647 0.81439394
0.8057554 0.81003584 0.82783883 0.81560284]
mean value: 0.8125124820676404
key: test_recall
value: [0.72727273 0.72727273 0.8125 0.75 0.78125 0.6875
0.60606061 0.66666667 0.72727273 0.78787879]
mean value: 0.7273674242424243
key: train_recall
value: [0.7440273 0.74061433 0.76530612 0.75510204 0.75510204 0.73129252
0.76450512 0.77133106 0.77133106 0.78498294]
mean value: 0.7583594529962155
key: test_roc_auc
value: [0.77272727 0.72727273 0.86079545 0.84469697 0.79971591 0.70738636
0.7092803 0.70833333 0.69176136 0.86268939]
mean value: 0.7684659090909091
key: train_roc_auc
value: [0.77303754 0.79522184 0.78538344 0.79222679 0.79222679 0.78202851
0.79041583 0.79552947 0.80573356 0.80405609]
mean value: 0.7915859859302082
key: test_jcc
value: [0.61538462 0.57142857 0.74285714 0.70588235 0.65789474 0.53658537
0.51282051 0.53658537 0.54545455 0.74285714]
mean value: 0.616775035229313
key: train_jcc
value: [0.62108262 0.64391691 0.64102564 0.64534884 0.64534884 0.62682216
0.64553314 0.65317919 0.66470588 0.66666667]
mean value: 0.6453629888889284
MCC on Blind test: 0.11
Accuracy on Blind test: 0.65
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.01025414 0.01117158 0.0117538 0.01153493 0.00975585 0.0098505
0.01094294 0.01119208 0.01107335 0.01145792]
mean value: 0.010898709297180176
key: score_time
value: [0.0131917 0.0166955 0.01715803 0.01844025 0.01854348 0.01284981
0.01602077 0.0126543 0.01319885 0.01370668]
mean value: 0.015245938301086425
key: test_mcc
value: [0.48507125 0.39686563 0.74121539 0.66193182 0.66435774 0.61558566
0.50807424 0.54376443 0.57789674 0.66382036]
mean value: 0.58585832571338
key: train_mcc
value: [0.76139478 0.77064885 0.7650564 0.7766583 0.79148596 0.79647229
0.7797719 0.77411517 0.7678611 0.7935986 ]
mean value: 0.7777063364453793
key: test_accuracy
value: [0.74242424 0.6969697 0.86153846 0.83076923 0.81538462 0.8
0.75384615 0.76923077 0.78461538 0.83076923]
mean value: 0.7885547785547786
key: train_accuracy
value: [0.87713311 0.88054608 0.87393526 0.88245315 0.88926746 0.89267462
0.88415673 0.88245315 0.87734242 0.89097104]
mean value: 0.8830933013936776
key: test_fscore
value: [0.73846154 0.71428571 0.87323944 0.83076923 0.83783784 0.81690141
0.76470588 0.78873239 0.80555556 0.84057971]
mean value: 0.8011068708844366
key: train_fscore
value: [0.88498403 0.88924051 0.88615385 0.89201878 0.89859594 0.9010989
0.89308176 0.89064976 0.8875 0.89937107]
mean value: 0.8922694594792214
key: test_precision
value: [0.75 0.67567568 0.79487179 0.81818182 0.73809524 0.74358974
0.74285714 0.73684211 0.74358974 0.80555556]
mean value: 0.754925881767987
key: train_precision
value: [0.83183183 0.82890855 0.80898876 0.82608696 0.82997118 0.83673469
0.82798834 0.83136095 0.8184438 0.83381924]
mean value: 0.8274134313359605
key: test_recall
value: [0.72727273 0.75757576 0.96875 0.84375 0.96875 0.90625
0.78787879 0.84848485 0.87878788 0.87878788]
mean value: 0.8566287878787879
key: train_recall
value: [0.94539249 0.95904437 0.97959184 0.96938776 0.97959184 0.97619048
0.96928328 0.95904437 0.96928328 0.97610922]
mean value: 0.9682918901348935
key: test_roc_auc
value: [0.74242424 0.6969697 0.86316288 0.83096591 0.81770833 0.80160985
0.75331439 0.76799242 0.78314394 0.83001894]
mean value: 0.7887310606060606
key: train_roc_auc
value: [0.87713311 0.88054608 0.87375496 0.8823048 0.88911332 0.8925321
0.8843015 0.88258341 0.87749878 0.89111583]
mean value: 0.8830883889391934
key: test_jcc
value: [0.58536585 0.55555556 0.775 0.71052632 0.72093023 0.69047619
0.61904762 0.65116279 0.6744186 0.725 ]
mean value: 0.6707483162434352
key: train_jcc
value: [0.79369628 0.8005698 0.79558011 0.80508475 0.81586402 0.82
0.80681818 0.80285714 0.79775281 0.81714286]
mean value: 0.8055365945371218
MCC on Blind test: 0.31
Accuracy on Blind test: 0.74
Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.03306079 0.03242803 0.03047276 0.03302431 0.03214908 0.03250813
0.03191948 0.0324676 0.03009939 0.03266764]
mean value: 0.03207972049713135
key: score_time
value: [0.01489544 0.01417732 0.01466584 0.01532269 0.01460743 0.01509666
0.01482368 0.01405001 0.01482248 0.01481605]
mean value: 0.01472775936126709
key: test_mcc
value: [0.84887469 0.73029674 0.76761091 0.91144345 0.66477003 0.72348485
0.72348485 0.60037879 0.63153153 0.78483448]
mean value: 0.7386710313079483
key: train_mcc
value: [0.80206647 0.8191174 0.83662524 0.83305279 0.85016022 0.82624039
0.83986719 0.82971486 0.80583708 0.85740514]
mean value: 0.8300086779731306
key: test_accuracy
value: [0.92424242 0.86363636 0.87692308 0.95384615 0.83076923 0.86153846
0.86153846 0.8 0.81538462 0.89230769]
mean value: 0.868018648018648
key: train_accuracy
value: [0.90102389 0.90955631 0.91822828 0.9165247 0.92504259 0.91311755
0.91993186 0.91482112 0.90289608 0.92844974]
mean value: 0.9149592129820747
key: test_fscore
value: [0.92537313 0.86956522 0.88571429 0.95081967 0.8358209 0.86153846
0.86153846 0.8 0.82352941 0.89552239]
mean value: 0.8709421927988814
key: train_fscore
value: [0.90136054 0.90940171 0.91919192 0.91680815 0.92567568 0.91311755
0.91965812 0.91408935 0.90322581 0.9295302 ]
mean value: 0.9152059019272198
key: test_precision
value: [0.91176471 0.83333333 0.81578947 1. 0.8 0.84848485
0.875 0.8125 0.8 0.88235294]
mean value: 0.8579225302561216
key: train_precision
value: [0.89830508 0.9109589 0.91 0.91525424 0.91946309 0.91467577
0.92123288 0.92041522 0.89864865 0.91419142]
mean value: 0.9123145250726284
key: test_recall
value: [0.93939394 0.90909091 0.96875 0.90625 0.875 0.875
0.84848485 0.78787879 0.84848485 0.90909091]
mean value: 0.8867424242424242
key: train_recall
value: [0.90443686 0.90784983 0.92857143 0.91836735 0.93197279 0.91156463
0.91808874 0.90784983 0.90784983 0.94539249]
mean value: 0.9181943767267999
key: test_roc_auc
value: [0.92424242 0.86363636 0.87831439 0.953125 0.83143939 0.86174242
0.86174242 0.80018939 0.81486742 0.89204545]
mean value: 0.8681344696969697
key: train_roc_auc
value: [0.90102389 0.90955631 0.91821063 0.91652156 0.92503076 0.9131202
0.91992872 0.91480927 0.90290451 0.92847856]
mean value: 0.9149584407141697
key: test_jcc
value: [0.86111111 0.76923077 0.79487179 0.90625 0.71794872 0.75675676
0.75675676 0.66666667 0.7 0.81081081]
mean value: 0.7740403384153385
key: train_jcc
value: [0.82043344 0.8338558 0.85046729 0.84639498 0.86163522 0.84012539
0.85126582 0.84177215 0.82352941 0.86833856]
mean value: 0.843781806636849
MCC on Blind test: 0.85
Accuracy on Blind test: 0.94
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [2.36617112 2.41984844 2.40280962 2.34540963 2.35769773 2.4068315
2.38700414 2.36304808 2.38966513 2.34983063]
mean value: 2.3788316011428834
key: score_time
value: [0.01269841 0.01472521 0.01361251 0.01399899 0.01547694 0.01341414
0.01428437 0.01328421 0.01366544 0.02118802]
mean value: 0.014634823799133301
key: test_mcc
value: [0.91287093 0.91287093 0.94028478 1. 0.88382395 0.91168461
1. 0.85599665 0.88340557 0.94017476]
mean value: 0.9241112183583176
key: train_mcc
value: [0.99659284 0.99659284 0.9965986 0.9965986 0.9965986 0.9965986
0.99659864 0.99659864 1. 0.99659864]
mean value: 0.996937598865393
key: test_accuracy
value: [0.95454545 0.95454545 0.96923077 1. 0.93846154 0.95384615
1. 0.92307692 0.93846154 0.96923077]
mean value: 0.9601398601398602
key: train_accuracy
value: [0.99829352 0.99829352 0.99829642 0.99829642 0.99829642 0.99829642
0.99829642 0.99829642 1. 0.99829642]
mean value: 0.9984661988127286
key: test_fscore
value: [0.95652174 0.95652174 0.96969697 1. 0.94117647 0.95522388
1. 0.92957746 0.94285714 0.97058824]
mean value: 0.9622163642083083
key: train_fscore
value: [0.99829642 0.99829642 0.99830221 0.99830221 0.99830221 0.99830221
0.99829642 0.99829642 1. 0.99829642]
mean value: 0.9984690940959036
key: test_precision
value: [0.91666667 0.91666667 0.94117647 1. 0.88888889 0.91428571
1. 0.86842105 0.89189189 0.94285714]
mean value: 0.9280854494476786
key: train_precision
value: [0.99659864 0.99659864 0.99661017 0.99661017 0.99661017 0.99661017
0.99659864 0.99659864 1. 0.99659864]
mean value: 0.9969433875245013
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.95454545 0.95454545 0.96969697 1. 0.93939394 0.95454545
1. 0.921875 0.9375 0.96875 ]
mean value: 0.9600852272727273
key: train_roc_auc
value: [0.99829352 0.99829352 0.99829352 0.99829352 0.99829352 0.99829352
0.99829932 0.99829932 1. 0.99829932]
mean value: 0.9984659051333844
key: test_jcc
value: [0.91666667 0.91666667 0.94117647 1. 0.88888889 0.91428571
1. 0.86842105 0.89189189 0.94285714]
mean value: 0.9280854494476786
key: train_jcc
value: [0.99659864 0.99659864 0.99661017 0.99661017 0.99661017 0.99661017
0.99659864 0.99659864 1. 0.99659864]
mean value: 0.9969433875245013
MCC on Blind test: 0.75
Accuracy on Blind test: 0.91
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.03670931 0.02427554 0.02596927 0.0255847 0.02775073 0.02558708
0.02727795 0.0245955 0.02489996 0.02757144]
mean value: 0.027022147178649904
key: score_time
value: [0.01371646 0.00959873 0.00935125 0.00904441 0.00926328 0.00973654
0.00950623 0.00951982 0.00918865 0.00926661]
mean value: 0.00981919765472412
key: test_mcc
value: [0.94112395 0.9701425 0.96969697 0.96969697 0.96969697 0.94028478
1. 0.96966868 0.91144345 0.94017476]
mean value: 0.9581929030038463
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96969697 0.98484848 0.98461538 0.98461538 0.98461538 0.96923077
1. 0.98461538 0.95384615 0.96923077]
mean value: 0.9785314685314686
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.97058824 0.98507463 0.98461538 0.98461538 0.98461538 0.96969697
1. 0.98507463 0.95652174 0.97058824]
mean value: 0.9791390586993137
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.94285714 0.97058824 0.96969697 0.96969697 0.96969697 0.94117647
1. 0.97058824 0.91666667 0.94285714]
mean value: 0.9593824802648332
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96969697 0.98484848 0.98484848 0.98484848 0.98484848 0.96969697
1. 0.984375 0.953125 0.96875 ]
mean value: 0.9785037878787879
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.94285714 0.97058824 0.96969697 0.96969697 0.96969697 0.94117647
1. 0.97058824 0.91666667 0.94285714]
mean value: 0.9593824802648332
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.92
Accuracy on Blind test: 0.97
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.12954378 0.128371 0.13171649 0.13734126 0.13280702 0.13321447
0.12788749 0.12463093 0.12696791 0.12705946]
mean value: 0.12995398044586182
key: score_time
value: [0.01783633 0.01957083 0.01873755 0.02003932 0.01885629 0.01862621
0.01799321 0.01798415 0.01792145 0.01805067]
mean value: 0.018561601638793945
key: test_mcc
value: [0.94112395 1. 0.94028478 1. 0.90814394 0.96969697
0.96966868 0.96966868 0.91144345 1. ]
mean value: 0.9610030456147299
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96969697 1. 0.96923077 1. 0.95384615 0.98461538
0.98461538 0.98461538 0.95384615 1. ]
mean value: 0.98004662004662
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.97058824 1. 0.96969697 1. 0.95384615 0.98461538
0.98507463 0.98507463 0.95652174 1. ]
mean value: 0.9805417736314403
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.94285714 1. 0.94117647 1. 0.93939394 0.96969697
0.97058824 0.97058824 0.91666667 1. ]
mean value: 0.965096765979119
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 0.96875 1. 1. 1. 1.
1. ]
mean value: 0.996875
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96969697 1. 0.96969697 1. 0.95407197 0.98484848
0.984375 0.984375 0.953125 1. ]
mean value: 0.9800189393939394
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.94285714 1. 0.94117647 1. 0.91176471 0.96969697
0.97058824 0.97058824 0.91666667 1. ]
mean value: 0.9623338426279603
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.72
Accuracy on Blind test: 0.91
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01057839 0.01049995 0.01057529 0.01068735 0.01063061 0.01047444
0.01177788 0.01060271 0.0109849 0.01069641]
mean value: 0.010750794410705566
key: score_time
value: [0.00883055 0.00891185 0.00885177 0.00892806 0.00886106 0.00881624
0.00884795 0.00892973 0.00899482 0.0088551 ]
mean value: 0.008882713317871094
key: test_mcc
value: [0.80622577 0.78086881 0.68030134 0.91168461 0.68964536 0.88382395
0.80282704 0.88340557 0.75148662 0.91144345]
mean value: 0.8101712532464146
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.89393939 0.87878788 0.81538462 0.95384615 0.83076923 0.93846154
0.89230769 0.93846154 0.86153846 0.95384615]
mean value: 0.8957342657342657
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.90410959 0.89189189 0.84210526 0.95522388 0.84931507 0.94117647
0.90410959 0.94285714 0.88 0.95652174]
mean value: 0.9067310634797957
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.825 0.80487805 0.72727273 0.91428571 0.75609756 0.88888889
0.825 0.89189189 0.78571429 0.91666667]
mean value: 0.8335695784476272
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 0.96875 1. 1. 1. 1.
1. ]
mean value: 0.996875
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.89393939 0.87878788 0.81818182 0.95454545 0.83285985 0.93939394
0.890625 0.9375 0.859375 0.953125 ]
mean value: 0.8958333333333334
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.825 0.80487805 0.72727273 0.91428571 0.73809524 0.88888889
0.825 0.89189189 0.78571429 0.91666667]
mean value: 0.8317693461595901
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.46
Accuracy on Blind test: 0.82
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.83020997 1.8339448 1.83662057 1.85287166 1.85693479 1.83377242
1.83173656 1.83148408 1.90730047 1.97430134]
mean value: 1.8589176654815673
key: score_time
value: [0.09384489 0.09449744 0.09404278 0.09543896 0.09338045 0.09436655
0.09370232 0.09369016 0.10243464 0.10232711]
mean value: 0.09577252864837646
key: test_mcc
value: [0.94112395 0.9701425 0.96969697 1. 1. 0.96969697
0.96966868 0.96966868 0.91144345 0.96966868]
mean value: 0.9671109886051606
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96969697 0.98484848 0.98461538 1. 1. 0.98461538
0.98461538 0.98461538 0.95384615 0.98461538]
mean value: 0.9831468531468532
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.97058824 0.98507463 0.98461538 1. 1. 0.98461538
0.98507463 0.98507463 0.95652174 0.98507463]
mean value: 0.9836639251118008
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.94285714 0.97058824 0.96969697 1. 1. 0.96969697
0.97058824 0.97058824 0.91666667 0.97058824]
mean value: 0.9681270690094219
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96969697 0.98484848 0.98484848 1. 1. 0.98484848
0.984375 0.984375 0.953125 0.984375 ]
mean value: 0.9830492424242424
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.94285714 0.97058824 0.96969697 1. 1. 0.96969697
0.97058824 0.97058824 0.91666667 0.97058824]
mean value: 0.9681270690094219
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.82
Accuracy on Blind test: 0.94
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000...05', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.96820188 1.01171207 1.01190472 1.01802468 1.00538397 0.99444675
1.08758998 1.04639006 1.02571559 1.07481289]
mean value: 1.024418258666992
key: score_time
value: [0.24577761 0.13931179 0.27650094 0.25720572 0.26955438 0.25722146
0.28553748 0.26408482 0.23240495 0.24349713]
mean value: 0.24710962772369385
key: test_mcc
value: [0.94112395 0.9701425 0.96969697 0.96969697 0.96969697 0.96969697
1. 0.96966868 0.84644588 0.87867338]
mean value: 0.948484226747499
key: train_mcc
value: [0.9830783 0.9763879 0.97976106 0.97642665 0.97642665 0.98310636
0.97642854 0.97976246 0.98301582 0.96601886]
mean value: 0.9780412605604475
key: test_accuracy
value: [0.96969697 0.98484848 0.98461538 0.98461538 0.98461538 0.98461538
1. 0.98461538 0.92307692 0.93846154]
mean value: 0.973916083916084
key: train_accuracy
value: [0.99146758 0.98805461 0.98977853 0.98807496 0.98807496 0.99148211
0.98807496 0.98977853 0.99148211 0.98296422]
mean value: 0.9889232576123169
key: test_fscore
value: [0.97058824 0.98507463 0.98461538 0.98461538 0.98461538 0.98461538
1. 0.98507463 0.92537313 0.9375 ]
mean value: 0.9742072161815358
key: train_fscore
value: [0.99153976 0.98819562 0.98989899 0.98823529 0.98823529 0.9915683
0.98819562 0.98986486 0.99151104 0.98305085]
mean value: 0.9890295617048415
key: test_precision
value: [0.94285714 0.97058824 0.96969697 0.96969697 0.96969697 0.96969697
1. 0.97058824 0.91176471 0.96774194]
mean value: 0.964232813359948
key: train_precision
value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
[0.98322148 0.97666667 0.98 0.97674419 0.97674419 0.98327759
0.97666667 0.97993311 0.98648649 0.97643098]
mean value: 0.9796171347195024
key: test_recall
value: [1. 1. 1. 1. 1. 1.
1. 1. 0.93939394 0.90909091]
mean value: 0.9848484848484849
key: train_recall
value: [1. 1. 1. 1. 1. 1.
1. 1. 0.99658703 0.98976109]
mean value: 0.9986348122866894
key: test_roc_auc
value: [0.96969697 0.98484848 0.98484848 0.98484848 0.98484848 0.98484848
1. 0.984375 0.92282197 0.93892045]
mean value: 0.9740056818181818
key: train_roc_auc
value: [0.99146758 0.98805461 0.98976109 0.98805461 0.98805461 0.99146758
0.98809524 0.98979592 0.99149079 0.98297578]
mean value: 0.988921780316222
key: test_jcc
value: [0.94285714 0.97058824 0.96969697 0.96969697 0.96969697 0.96969697
1. 0.97058824 0.86111111 0.88235294]
mean value: 0.9506285544520838
key: train_jcc
value: [0.98322148 0.97666667 0.98 0.97674419 0.97674419 0.98327759
0.97666667 0.97993311 0.98316498 0.96666667]
mean value: 0.978308553410921
MCC on Blind test: 0.92
Accuracy on Blind test: 0.97
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02450657 0.01054621 0.0114634 0.01047778 0.01044083 0.01177239
0.01055622 0.01051903 0.01049495 0.01219797]
mean value: 0.012297534942626953
key: score_time
value: [0.00973797 0.00886369 0.00927877 0.00881004 0.00888014 0.0099175
0.00885415 0.00883937 0.00888014 0.0097239 ]
mean value: 0.009178566932678222
key: test_mcc
value: [0.54772256 0.45454545 0.72572613 0.70352647 0.60000027 0.41516606
0.42714107 0.41785545 0.38461695 0.73234704]
mean value: 0.5408647457818198
key: train_mcc
value: [0.54699656 0.59399689 0.57120468 0.58602719 0.58602719 0.56692716
0.58164247 0.59178072 0.61296324 0.60857864]
mean value: 0.5846144748987409
key: test_accuracy
value: [0.77272727 0.72727273 0.86153846 0.84615385 0.8 0.70769231
0.70769231 0.70769231 0.69230769 0.86153846]
mean value: 0.7684615384615385
key: train_accuracy
value: [0.77303754 0.79522184 0.78534923 0.79216354 0.79216354 0.78194208
0.79045997 0.7955707 0.80579216 0.80408859]
mean value: 0.7915789198272002
key: test_fscore
value: [0.76190476 0.72727273 0.85245902 0.82758621 0.79365079 0.6984127
0.6779661 0.6984127 0.70588235 0.85245902]
mean value: 0.7596006373973208
key: train_fscore
value: [0.76625659 0.7833935 0.78125 0.7844523 0.7844523 0.77060932
0.78458844 0.79020979 0.79858657 0.8 ]
mean value: 0.7843798808929663
key: test_precision
value: [0.8 0.72727273 0.89655172 0.92307692 0.80645161 0.70967742
0.76923077 0.73333333 0.68571429 0.92857143]
mean value: 0.7979880223595462
key: train_precision
value: [0.78985507 0.83141762 0.79787234 0.81617647 0.81617647 0.81439394
0.8057554 0.81003584 0.82783883 0.81560284]
mean value: 0.8125124820676404
key: test_recall
value: [0.72727273 0.72727273 0.8125 0.75 0.78125 0.6875
0.60606061 0.66666667 0.72727273 0.78787879]
mean value: 0.7273674242424243
key: train_recall
value: [0.7440273 0.74061433 0.76530612 0.75510204 0.75510204 0.73129252
0.76450512 0.77133106 0.77133106 0.78498294]
mean value: 0.7583594529962155
key: test_roc_auc
value: [0.77272727 0.72727273 0.86079545 0.84469697 0.79971591 0.70738636
0.7092803 0.70833333 0.69176136 0.86268939]
mean value: 0.7684659090909091
key: train_roc_auc
value: [0.77303754 0.79522184 0.78538344 0.79222679 0.79222679 0.78202851
0.79041583 0.79552947 0.80573356 0.80405609]
mean value: 0.7915859859302082
key: test_jcc
value: [0.61538462 0.57142857 0.74285714 0.70588235 0.65789474 0.53658537
0.51282051 0.53658537 0.54545455 0.74285714]
mean value: 0.616775035229313
key: train_jcc
value: [0.62108262 0.64391691 0.64102564 0.64534884 0.64534884 0.62682216
0.64553314 0.65317919 0.66470588 0.66666667]
mean value: 0.6453629888889284
MCC on Blind test: 0.11
Accuracy on Blind test: 0.65
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.10872388 0.07221985 0.08031511 0.07965851 0.07820702 0.0832839
0.0768342 0.07378078 0.08042312 0.08786035]
mean value: 0.08213067054748535
key: score_time
value: [0.0114131 0.01115036 0.01119137 0.01129603 0.0114994 0.01149344
0.01123643 0.01188612 0.01131845 0.01106858]
mean value: 0.011355328559875488
key: test_mcc
value: [0.94112395 0.9701425 0.96969697 0.96969697 1. 0.96969697
0.96966868 0.96966868 0.94017476 0.96966868]
mean value: 0.9669538159281116
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96969697 0.98484848 0.98461538 0.98461538 1. 0.98461538
0.98461538 0.98461538 0.96923077 0.98461538]
mean value: 0.9831468531468532
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.97058824 0.98507463 0.98461538 0.98461538 1. 0.98461538
0.98507463 0.98507463 0.97058824 0.98507463]
mean value: 0.9835321131897076
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.94285714 0.97058824 0.96969697 0.96969697 1. 0.96969697
0.97058824 0.97058824 0.94285714 0.97058824]
mean value: 0.9677158135981665
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96969697 0.98484848 0.98484848 0.98484848 1. 0.98484848
0.984375 0.984375 0.96875 0.984375 ]
mean value: 0.983096590909091
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.94285714 0.97058824 0.96969697 0.96969697 1. 0.96969697
0.97058824 0.97058824 0.94285714 0.97058824]
mean value: 0.9677158135981665
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.92
Accuracy on Blind test: 0.97
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.04492283 0.06440902 0.07062221 0.09072375 0.05423832 0.0812223
0.11205435 0.06770706 0.04995537 0.08655524]
mean value: 0.07224104404449463
key: score_time
value: [0.01941609 0.01222134 0.01941872 0.01229501 0.01221132 0.01934242
0.03134274 0.01228213 0.01945925 0.0196538 ]
mean value: 0.0177642822265625
key: test_mcc
value: [0.90950859 0.8196886 0.83005736 0.93844697 0.82191818 0.94028478
0.93844697 0.91144345 0.96966868 0.90814394]
mean value: 0.8987607527266495
key: train_mcc
value: [0.95221843 0.9488552 0.95232125 0.95238704 0.95913582 0.95232125
0.95238925 0.95232236 0.95571266 0.95584685]
mean value: 0.9533510091791351
key: test_accuracy
value: [0.95454545 0.90909091 0.90769231 0.96923077 0.90769231 0.96923077
0.96923077 0.95384615 0.98461538 0.95384615]
mean value: 0.9479020979020979
key: train_accuracy
value: [0.97610922 0.97440273 0.97614991 0.97614991 0.97955707 0.97614991
0.97614991 0.97614991 0.97785349 0.97785349]
mean value: 0.9766525574012593
key: test_fscore
value: [0.95522388 0.91176471 0.91428571 0.96875 0.91176471 0.96969697
0.96969697 0.95652174 0.98507463 0.95384615]
mean value: 0.9496625465883635
key: train_fscore
value: [0.97610922 0.97453311 0.97627119 0.97635135 0.97966102 0.97627119
0.97627119 0.97619048 0.97785349 0.97800338]
mean value: 0.9767515602219685
key: test_precision
value: [0.94117647 0.88571429 0.84210526 0.96875 0.86111111 0.94117647
0.96969697 0.91666667 0.97058824 0.96875 ]
mean value: 0.9265735472817516
key: train_precision
value: [0.97610922 0.96959459 0.97297297 0.96979866 0.97635135 0.97297297
0.96969697 0.97288136 0.97619048 0.96979866]
mean value: 0.9726367224164848
key: test_recall
value: [0.96969697 0.93939394 1. 0.96875 0.96875 1.
0.96969697 1. 1. 0.93939394]
mean value: 0.9755681818181818
key: train_recall
value: [0.97610922 0.97952218 0.97959184 0.9829932 0.9829932 0.97959184
0.98293515 0.97952218 0.97952218 0.98634812]
mean value: 0.9809129112395811
key: test_roc_auc
value: [0.95454545 0.90909091 0.90909091 0.96922348 0.90861742 0.96969697
0.96922348 0.953125 0.984375 0.95407197]
mean value: 0.9481060606060606
key: train_roc_auc
value: [0.97610922 0.97440273 0.97614404 0.97613824 0.97955121 0.97614404
0.97616145 0.97615565 0.97785633 0.97786794]
mean value: 0.9766530844419679
key: test_jcc
value: [0.91428571 0.83783784 0.84210526 0.93939394 0.83783784 0.94117647
0.94117647 0.91666667 0.97058824 0.91176471]
mean value: 0.9052833141532832
key: train_jcc
value: [0.95333333 0.95033113 0.95364238 0.95379538 0.96013289 0.95364238
0.95364238 0.95348837 0.95666667 0.95695364]
mean value: 0.9545628562526227
MCC on Blind test: 0.82
Accuracy on Blind test: 0.94
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01583385 0.01057649 0.01054573 0.01007676 0.0101521 0.0101788
0.01006889 0.01036167 0.01008558 0.01018763]
mean value: 0.010806751251220704
key: score_time
value: [0.01199889 0.00923753 0.00887418 0.00877666 0.00880337 0.00871682
0.00870228 0.00878859 0.00886726 0.00878167]
mean value: 0.009154725074768066
key: test_mcc
value: [0.5768179 0.55815631 0.56211492 0.72572613 0.4983497 0.60037879
0.51782513 0.3844697 0.50807424 0.69326017]
mean value: 0.5625172996514779
key: train_mcc
value: [0.57434639 0.60444771 0.59275601 0.61358451 0.61163099 0.61928315
0.59939415 0.61305297 0.59584164 0.61988238]
mean value: 0.6044219886260209
key: test_accuracy
value: [0.78787879 0.77272727 0.76923077 0.86153846 0.73846154 0.8
0.75384615 0.69230769 0.75384615 0.84615385]
mean value: 0.7775990675990676
key: train_accuracy
value: [0.78668942 0.80204778 0.7955707 0.80579216 0.80579216 0.80919932
0.79897785 0.80579216 0.79727428 0.80919932]
mean value: 0.8016335157072172
key: test_fscore
value: [0.79411765 0.79452055 0.79452055 0.85245902 0.76712329 0.8
0.73333333 0.6969697 0.76470588 0.85294118]
mean value: 0.785069113614047
key: train_fscore
value: [0.79270315 0.80536913 0.80327869 0.81372549 0.80743243 0.81456954
0.80528053 0.81188119 0.80330579 0.81518152]
mean value: 0.8072727445453226
key: test_precision
value: [0.77142857 0.725 0.70731707 0.89655172 0.68292683 0.78787879
0.81481481 0.6969697 0.74285714 0.82857143]
mean value: 0.7654316069097398
key: train_precision
value: [0.77096774 0.79207921 0.77531646 0.78301887 0.80201342 0.79354839
0.77955272 0.78594249 0.77884615 0.78913738]
mean value: 0.7850422825098152
key: test_recall
value: [0.81818182 0.87878788 0.90625 0.8125 0.875 0.8125
0.66666667 0.6969697 0.78787879 0.87878788]
mean value: 0.8133522727272727
key: train_recall
value: [0.81569966 0.81911263 0.83333333 0.84693878 0.81292517 0.83673469
0.83276451 0.83959044 0.82935154 0.84300341]
mean value: 0.8309454157089458
key: test_roc_auc
value: [0.78787879 0.77272727 0.77130682 0.86079545 0.7405303 0.80018939
0.75520833 0.69223485 0.75331439 0.84564394]
mean value: 0.7779829545454546
key: train_roc_auc
value: [0.78668942 0.80204778 0.79550626 0.80572195 0.80577999 0.80915233
0.79903531 0.80584964 0.79732883 0.80925681]
mean value: 0.8016368322072857
key: test_jcc
value: [0.65853659 0.65909091 0.65909091 0.74285714 0.62222222 0.66666667
0.57894737 0.53488372 0.61904762 0.74358974]
mean value: 0.6484932887282351
key: train_jcc
value: [0.65659341 0.6741573 0.67123288 0.68595041 0.67705382 0.68715084
0.67403315 0.68333333 0.67127072 0.68802228]
mean value: 0.6768798147110306
MCC on Blind test: 0.74
Accuracy on Blind test: 0.88
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.02534986 0.02770162 0.0270679 0.02558327 0.02563214 0.0259738
0.03067756 0.02564144 0.03274512 0.02606511]
mean value: 0.027243781089782714
key: score_time
value: [0.01025176 0.01149607 0.01207352 0.01202536 0.01203775 0.01206684
0.01200747 0.01207829 0.0120697 0.01209188]
mean value: 0.011819863319396972
key: test_mcc
value: [0.90950859 0.84887469 0.88382395 0.90814394 0.84995597 0.94028478
0.94028478 0.82191818 0.91144345 0.91168461]
mean value: 0.8925922950912009
key: train_mcc
value: [0.96596033 0.94559077 0.93552737 0.96252656 0.95913582 0.95920405
0.92935637 0.91635122 0.95920405 0.95913582]
mean value: 0.9491992348323932
key: test_accuracy
value: [0.95454545 0.92424242 0.93846154 0.95384615 0.92307692 0.96923077
0.96923077 0.90769231 0.95384615 0.95384615]
mean value: 0.9448018648018648
key: train_accuracy
value: [0.98293515 0.97269625 0.96763203 0.98126065 0.97955707 0.97955707
0.96422487 0.95741056 0.97955707 0.97955707]
mean value: 0.9744387787733079
key: test_fscore
value: [0.95384615 0.92307692 0.94117647 0.95384615 0.92537313 0.96969697
0.96875 0.90322581 0.95652174 0.95238095]
mean value: 0.9447894303345794
key: train_fscore
value: [0.98281787 0.97241379 0.96806723 0.98132428 0.97966102 0.97945205
0.96335079 0.95606327 0.97966102 0.97945205]
mean value: 0.9742263365568498
key: test_precision
value: [0.96875 0.9375 0.88888889 0.93939394 0.88571429 0.94117647
1. 0.96551724 0.91666667 1. ]
mean value: 0.9443607492631326
key: train_precision
value: [0.98961938 0.9825784 0.95681063 0.97966102 0.97635135 0.9862069
0.98571429 0.98550725 0.97306397 0.98281787]
mean value: 0.9798331045027515
key: test_recall
value: [0.93939394 0.90909091 1. 0.96875 0.96875 1.
0.93939394 0.84848485 1. 0.90909091]
mean value: 0.9482954545454545
key: train_recall
value: [0.97610922 0.96245734 0.97959184 0.9829932 0.9829932 0.97278912
0.94197952 0.92832765 0.98634812 0.97610922]
mean value: 0.9689698404959253
key: test_roc_auc
value: [0.95454545 0.92424242 0.93939394 0.95407197 0.92376894 0.96969697
0.96969697 0.90861742 0.953125 0.95454545]
mean value: 0.9451704545454546
key: train_roc_auc
value: [0.98293515 0.97269625 0.96761162 0.98125769 0.97955121 0.97956862
0.96418704 0.9573611 0.97956862 0.97955121]
mean value: 0.9744288500383088
key: test_jcc
value: [0.91176471 0.85714286 0.88888889 0.91176471 0.86111111 0.94117647
0.93939394 0.82352941 0.91666667 0.90909091]
mean value: 0.8960529666412019
key: train_jcc
value: [0.96621622 0.94630872 0.93811075 0.96333333 0.96013289 0.95973154
0.92929293 0.91582492 0.96013289 0.95973154]
mean value: 0.9498815736664497
MCC on Blind test: 0.92
Accuracy on Blind test: 0.97
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01519728 0.02194762 0.02023983 0.02044773 0.02418399 0.02258992
0.02171874 0.02219558 0.01945853 0.02303457]
mean value: 0.02110137939453125
key: score_time
value: [0.01065183 0.01201725 0.01206088 0.01222491 0.01207542 0.01208949
0.01208282 0.01207709 0.01206946 0.0120399 ]
mean value: 0.011938905715942383
key: test_mcc
value: [0.90950859 0.90950859 0.56270396 0.84644588 0.93844697 0.91168461
0.63287203 0.90814394 0.84644588 0.91168461]
mean value: 0.8377445066997006
key: train_mcc
value: [0.91913857 0.9217648 0.62108455 0.94212842 0.96265981 0.97274268
0.72617124 0.90990542 0.95238925 0.96592835]
mean value: 0.889391309620473
key: test_accuracy
value: [0.95454545 0.95454545 0.73846154 0.92307692 0.96923077 0.95384615
0.78461538 0.95384615 0.92307692 0.95384615]
mean value: 0.9109090909090909
key: train_accuracy
value: [0.95904437 0.96075085 0.77853492 0.97103918 0.98126065 0.98637138
0.85008518 0.95400341 0.97614991 0.98296422]
mean value: 0.940020408044607
key: test_fscore
value: [0.95522388 0.95522388 0.79012346 0.92063492 0.96875 0.95522388
0.73076923 0.95384615 0.92537313 0.95238095]
mean value: 0.9107549490540785
key: train_fscore
value: [0.96 0.96121417 0.8189415 0.97094017 0.98145025 0.98639456
0.82677165 0.95238095 0.97627119 0.98293515]
mean value: 0.9417299597102607
key: test_precision
value: [0.94117647 0.94117647 0.65306122 0.93548387 0.96875 0.91428571
1. 0.96875 0.91176471 1. ]
mean value: 0.9234448456802076
key: train_precision
value: [0.93811075 0.95 0.69339623 0.97594502 0.97324415 0.98639456
0.97674419 0.98540146 0.96969697 0.98293515]
mean value: 0.9431868466944326
key: test_recall
value: [0.96969697 0.96969697 1. 0.90625 0.96875 1.
0.57575758 0.93939394 0.93939394 0.90909091]
mean value: 0.9178030303030303
key: train_recall
value: [0.98293515 0.97269625 1. 0.96598639 0.98979592 0.98639456
0.71672355 0.92150171 0.98293515 0.98293515]
mean value: 0.9501903833205637
key: test_roc_auc
value: [0.95454545 0.95454545 0.74242424 0.92282197 0.96922348 0.95454545
0.78787879 0.95407197 0.92282197 0.95454545]
mean value: 0.9117424242424242
key: train_roc_auc
value: [0.95904437 0.96075085 0.778157 0.9710478 0.98124608 0.98637134
0.84985837 0.95394813 0.97616145 0.98296418]
mean value: 0.939954958092452
key: test_jcc
value: [0.91428571 0.91428571 0.65306122 0.85294118 0.93939394 0.91428571
0.57575758 0.91176471 0.86111111 0.90909091]
mean value: 0.8445977785053416
key: train_jcc
value: [0.92307692 0.92532468 0.69339623 0.94352159 0.96357616 0.97315436
0.70469799 0.90909091 0.95364238 0.96644295]
mean value: 0.8955924173651768
MCC on Blind test: 0.79
Accuracy on Blind test: 0.91
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.19408345 0.18791676 0.19393516 0.1892693 0.18968463 0.1914618
0.19001937 0.18625164 0.19071054 0.19217348]
mean value: 0.1905506134033203
key: score_time
value: [0.01648283 0.01540661 0.01589441 0.01590419 0.01571655 0.01537204
0.01550341 0.01552963 0.0156188 0.01553798]
mean value: 0.01569664478302002
key: test_mcc
value: [0.9701425 0.9701425 0.96969697 0.96969697 1. 0.96969697
1. 0.96966868 0.94017476 0.96966868]
mean value: 0.9728888029265946
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98484848 0.98484848 0.98461538 0.98461538 1. 0.98461538
1. 0.98461538 0.96923077 0.98461538]
mean value: 0.9862004662004662
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98507463 0.98507463 0.98461538 0.98461538 1. 0.98461538
1. 0.98507463 0.97058824 0.98507463]
mean value: 0.9864732896602958
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.97058824 0.97058824 0.96969697 0.96969697 1. 0.96969697
1. 0.97058824 0.94285714 0.97058824]
mean value: 0.9734300993124523
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98484848 0.98484848 0.98484848 0.98484848 1. 0.98484848
1. 0.984375 0.96875 0.984375 ]
mean value: 0.9861742424242425
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.97058824 0.97058824 0.96969697 0.96969697 1. 0.96969697
1. 0.97058824 0.94285714 0.97058824]
mean value: 0.9734300993124523
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.82
Accuracy on Blind test: 0.94
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.07466435 0.08926511 0.08730602 0.09664154 0.0819788 0.07781601
0.09088039 0.08559895 0.08477879 0.07983494]
mean value: 0.08487648963928222
key: score_time
value: [0.0218389 0.02159381 0.01945972 0.03571534 0.01717114 0.03562188
0.03001285 0.03977084 0.03206062 0.0384841 ]
mean value: 0.0291729211807251
key: test_mcc
value: [0.90950859 1. 0.96969697 0.96969697 1. 0.96969697
1. 0.96966868 0.91144345 0.94017476]
mean value: 0.9639886393074825
key: train_mcc
value: [1. 1. 0.99320865 0.9965986 1. 0.9965986
0.99659864 0.99320881 0.99659864 0.98983039]
mean value: 0.9962642330617676
key: test_accuracy
value: [0.95454545 1. 0.98461538 0.98461538 1. 0.98461538
1. 0.98461538 0.95384615 0.96923077]
mean value: 0.9816083916083916
key: train_accuracy
value: [1. 1. 0.99659284 0.99829642 1. 0.99829642
0.99829642 0.99659284 0.99829642 0.99488927]
mean value: 0.9981260647359455
key: test_fscore
value: [0.95522388 1. 0.98461538 0.98461538 1. 0.98461538
1. 0.98507463 0.95652174 0.97058824]
mean value: 0.9821254635733393
key: train_fscore
value: [1. 1. 0.99661017 0.99830221 1. 0.99830221
0.99829642 0.99659864 0.99829642 0.99490662]
mean value: 0.9981312689575405
key: test_precision
value: [0.94117647 1. 0.96969697 0.96969697 1. 0.96969697
1. 0.97058824 0.91666667 0.94285714]
mean value: 0.9680379424497072
key: train_precision
value: [1. 1. 0.99324324 0.99661017 1. 0.99661017
0.99659864 0.99322034 0.99659864 0.98986486]
mean value: 0.9962746064985775
key: test_recall
value: [0.96969697 1. 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
0.996969696969697
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.95454545 1. 0.98484848 0.98484848 1. 0.98484848
1. 0.984375 0.953125 0.96875 ]
mean value: 0.9815340909090909
key: train_roc_auc
value: [1. 1. 0.99658703 0.99829352 1. 0.99829352
0.99829932 0.99659864 0.99829932 0.99489796]
mean value: 0.9981269299528686
key: test_jcc
value: [0.91428571 1. 0.96969697 0.96969697 1. 0.96969697
1. 0.97058824 0.91666667 0.94285714]
mean value: 0.9653488668194551
key: train_jcc
value: [1. 1. 0.99324324 0.99661017 1. 0.99661017
0.99659864 0.99322034 0.99659864 0.98986486]
mean value: 0.9962746064985775
MCC on Blind test: 0.82
Accuracy on Blind test: 0.94
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.26663494 0.18799305 0.22161603 0.20603728 0.25028539 0.19107294
0.25472808 0.27957559 0.28712082 0.25773311]
mean value: 0.2402797222137451
key: score_time
value: [0.01788664 0.03220916 0.01656365 0.02937508 0.02889848 0.01650047
0.02786303 0.02866268 0.03101921 0.02904153]
mean value: 0.025801992416381835
key: test_mcc
value: [0.88531564 0.78086881 0.85663571 0.96969697 0.84995597 0.85663571
0.91144345 0.88340557 0.77695466 0.88340557]
mean value: 0.8654318051752643
key: train_mcc
value: [0.9830783 0.98981298 0.98646265 0.98983004 0.98983004 0.99320865
0.98983039 0.98646327 0.98310733 0.98646327]
mean value: 0.9878086916994873
key: test_accuracy
value: [0.93939394 0.87878788 0.92307692 0.98461538 0.92307692 0.92307692
0.95384615 0.93846154 0.87692308 0.93846154]
mean value: 0.927972027972028
key: train_accuracy
value: [0.99146758 0.99488055 0.99318569 0.99488927 0.99488927 0.99659284
0.99488927 0.99318569 0.99148211 0.99318569]
mean value: 0.9938647952509143
key: test_fscore
value: [0.94285714 0.89189189 0.92753623 0.98461538 0.92537313 0.92753623
0.95652174 0.94285714 0.89189189 0.94285714]
mean value: 0.9333937934197506
key: train_fscore
value: [0.99153976 0.99490662 0.99324324 0.99492386 0.99492386 0.99661017
0.99490662 0.99322034 0.99153976 0.99322034]
mean value: 0.9939034575448025
key: test_precision
value: [0.89189189 0.80487805 0.86486486 0.96969697 0.88571429 0.86486486
0.91666667 0.89189189 0.80487805 0.89189189]
mean value: 0.8787239425044303
key: train_precision
value: [0.98322148 0.98986486 0.98657718 0.98989899 0.98989899 0.99324324
0.98986486 0.98653199 0.98322148 0.98653199]
mean value: 0.9878855060063114
key: test_recall
value: [1. 1. 1. 1. 0.96875 1. 1. 1. 1.
1. ]
mean value: 0.996875
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.93939394 0.87878788 0.92424242 0.98484848 0.92376894 0.92424242
0.953125 0.9375 0.875 0.9375 ]
mean value: 0.9278409090909091
key: train_roc_auc
value: [0.99146758 0.99488055 0.99317406 0.99488055 0.99488055 0.99658703
0.99489796 0.99319728 0.9914966 0.99319728]
mean value: 0.9938659422813494
key: test_jcc
value: [0.89189189 0.80487805 0.86486486 0.96969697 0.86111111 0.86486486
0.91666667 0.89189189 0.80487805 0.89189189]
mean value: 0.8762636250441129
key: train_jcc
value: [0.98322148 0.98986486 0.98657718 0.98989899 0.98989899 0.99324324
0.98986486 0.98653199 0.98322148 0.98653199]
mean value: 0.9878855060063114
MCC on Blind test: 0.64
Accuracy on Blind test: 0.88
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.76841474 0.74795699 0.74724698 0.76810884 0.7429924 0.75336194
0.76922297 0.75542974 0.76049519 0.74811101]
mean value: 0.7561340808868409
key: score_time
value: [0.00952291 0.00938296 0.00974011 0.00956559 0.0093658 0.00958729
0.01015449 0.00941133 0.00982475 0.00931835]
mean value: 0.009587359428405762
key: test_mcc
value: [0.94112395 0.9701425 0.96969697 0.96969697 1. 0.96969697
0.96966868 0.96966868 0.91144345 0.94017476]
mean value: 0.9611312929494409
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96969697 0.98484848 0.98461538 0.98461538 1. 0.98461538
0.98461538 0.98461538 0.95384615 0.96923077]
mean value: 0.9800699300699302
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.97058824 0.98507463 0.98461538 0.98461538 1. 0.98461538
0.98507463 0.98507463 0.95652174 0.97058824]
mean value: 0.9806768244161839
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.94285714 0.97058824 0.96969697 0.96969697 1. 0.96969697
0.97058824 0.97058824 0.91666667 0.94285714]
mean value: 0.9623236567354214
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96969697 0.98484848 0.98484848 0.98484848 1. 0.98484848
0.984375 0.984375 0.953125 0.96875 ]
mean value: 0.9799715909090909
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.94285714 0.97058824 0.96969697 0.96969697 1. 0.96969697
0.97058824 0.97058824 0.91666667 0.94285714]
mean value: 0.9623236567354214
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.92
Accuracy on Blind test: 0.97
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.0343082 0.03298187 0.03278327 0.03844833 0.03307199 0.03226948
0.03353143 0.03394914 0.03774524 0.04404974]
mean value: 0.035313868522644044
key: score_time
value: [0.01221323 0.01314235 0.01495123 0.01498508 0.01509428 0.0152576
0.03477526 0.01318979 0.01854706 0.02858615]
mean value: 0.01807420253753662
key: test_mcc
value: [0.9701425 0.90950859 1. 1. 0.90814394 0.96969697
0.96966868 0.96966868 1. 1. ]
mean value: 0.9696829367214851
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98484848 0.95454545 1. 1. 0.95384615 0.98461538
0.98461538 0.98461538 1. 1. ]
mean value: 0.9847086247086247
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98461538 0.95384615 1. 1. 0.95384615 0.98461538
0.98507463 0.98507463 1. 1. ]
mean value: 0.9847072330654421
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.96875 1. 1. 0.93939394 0.96969697
0.97058824 0.97058824 1. 1. ]
mean value: 0.9819017379679145
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96969697 0.93939394 1. 1. 0.96875 1.
1. 1. 1. 1. ]
mean value: 0.9877840909090909
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98484848 0.95454545 1. 1. 0.95407197 0.98484848
0.984375 0.984375 1. 1. ]
mean value: 0.9847064393939394
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96969697 0.91176471 1. 1. 0.91176471 0.96969697
0.97058824 0.97058824 1. 1. ]
mean value: 0.9704099821746881
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.0
Accuracy on Blind test: 0.79
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02863717 0.04037857 0.04070926 0.03674984 0.04031229 0.04035211
0.0404079 0.04057741 0.04059529 0.04037046]
mean value: 0.03890902996063232
key: score_time
value: [0.01908898 0.01911449 0.0154767 0.01687717 0.0188179 0.01907849
0.01894832 0.02247143 0.01914191 0.01913929]
mean value: 0.01881546974182129
key: test_mcc
value: [0.90950859 0.87878788 0.88382395 0.93844697 0.84995597 0.94028478
0.96969697 0.96966868 0.84644588 0.90814394]
mean value: 0.9094763613775467
key: train_mcc
value: [0.94881099 0.9488552 0.94208333 0.94212641 0.94889774 0.93869211
0.94550795 0.942084 0.95575756 0.95584685]
mean value: 0.9468662121748642
key: test_accuracy
value: [0.95454545 0.93939394 0.93846154 0.96923077 0.92307692 0.96923077
0.98461538 0.98461538 0.92307692 0.95384615]
mean value: 0.954009324009324
key: train_accuracy
value: [0.97440273 0.97440273 0.97103918 0.97103918 0.97444634 0.9693356
0.97274276 0.97103918 0.97785349 0.97785349]
mean value: 0.9734154694140972
key: test_fscore
value: [0.95522388 0.93939394 0.94117647 0.96875 0.92537313 0.96969697
0.98461538 0.98507463 0.92537313 0.95384615]
mean value: 0.9548523694260086
key: train_fscore
value: [0.97444634 0.97453311 0.97113752 0.97123519 0.97453311 0.96949153
0.97278912 0.97103918 0.97792869 0.97800338]
mean value: 0.9735137167185135
key: test_precision
value: [0.94117647 0.93939394 0.88888889 0.96875 0.88571429 0.94117647
1. 0.97058824 0.91176471 0.96875 ]
mean value: 0.9416202996350055
key: train_precision
value: [0.97278912 0.96959459 0.96949153 0.96632997 0.97288136 0.96621622
0.96949153 0.96938776 0.97297297 0.96979866]
mean value: 0.9698953685359831
key: test_recall
value: [0.96969697 0.93939394 1. 0.96875 0.96875 1.
0.96969697 1. 0.93939394 0.93939394]
mean value: 0.9695075757575757
key: train_recall/home/tanu/git/LSHTM_analysis/scripts/ml/./embb_sl.py:188: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./embb_sl.py:191: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
value: [0.97610922 0.97952218 0.97278912 0.97619048 0.97619048 0.97278912
0.97610922 0.97269625 0.98293515 0.98634812]
mean value: 0.977167932019224
key: test_roc_auc
value: [0.95454545 0.93939394 0.93939394 0.96922348 0.92376894 0.96969697
0.98484848 0.984375 0.92282197 0.95407197]
mean value: 0.9542140151515152
key: train_roc_auc
value: [0.97440273 0.97440273 0.9710362 0.97103039 0.97444336 0.96932971
0.97274849 0.971042 0.97786213 0.97786794]
mean value: 0.9734165679923846
key: test_jcc
value: [0.91428571 0.88571429 0.88888889 0.93939394 0.86111111 0.94117647
0.96969697 0.97058824 0.86111111 0.91176471]
mean value: 0.9143731431966726
key: train_jcc
value: [0.95016611 0.95033113 0.94389439 0.94407895 0.95033113 0.94078947
0.94701987 0.94370861 0.95681063 0.95695364]
mean value: 0.9484083925538549
MCC on Blind test: 0.92
Accuracy on Blind test: 0.97
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.28756905 0.30973029 0.30490494 0.31140661 0.35169911 0.40486741
0.30568242 0.30639338 0.30918193 0.30819392]
mean value: 0.31996290683746337
key: score_time
value: [0.01921129 0.01902771 0.01945329 0.01908779 0.02094746 0.01899981
0.01919055 0.01918483 0.01925635 0.01911736]
mean value: 0.019347643852233885
key: test_mcc
value: [0.90950859 0.87878788 0.88382395 0.93844697 0.84995597 0.94028478
0.96969697 0.94017476 0.84644588 0.90814394]
mean value: 0.9065269687521299
key: train_mcc
value: [0.94881099 0.9488552 0.94208333 0.94212641 0.94889774 0.93869211
0.94550795 0.95232236 0.95575756 0.95584685]
mean value: 0.9478900476185305
key: test_accuracy
value: [0.95454545 0.93939394 0.93846154 0.96923077 0.92307692 0.96923077
0.98461538 0.96923077 0.92307692 0.95384615]
mean value: 0.9524708624708625
key: train_accuracy
value: [0.97440273 0.97440273 0.97103918 0.97103918 0.97444634 0.9693356
0.97274276 0.97614991 0.97785349 0.97785349]
mean value: 0.9739265426679302
key: test_fscore
value: [0.95522388 0.93939394 0.94117647 0.96875 0.92537313 0.96969697
0.98461538 0.97058824 0.92537313 0.95384615]
mean value: 0.9534037302688532
key: train_fscore
value: [0.97444634 0.97453311 0.97113752 0.97123519 0.97453311 0.96949153
0.97278912 0.97619048 0.97792869 0.97800338]
mean value: 0.9740288461092816
key: test_precision
value: [0.94117647 0.93939394 0.88888889 0.96875 0.88571429 0.94117647
1. 0.94285714 0.91176471 0.96875 ]
mean value: 0.938847190391308
key: train_precision
value: [0.97278912 0.96959459 0.96949153 0.96632997 0.97288136 0.96621622
0.96949153 0.97288136 0.97297297 0.96979866]
mean value: 0.9702447286189994
key: test_recall
value: [0.96969697 0.93939394 1. 0.96875 0.96875 1.
0.96969697 1. 0.93939394 0.93939394]
mean value: 0.9695075757575757
key: train_recall
value: [0.97610922 0.97952218 0.97278912 0.97619048 0.97619048 0.97278912
0.97610922 0.97952218 0.98293515 0.98634812]
mean value: 0.9778505258758794
key: test_roc_auc
value: [0.95454545 0.93939394 0.93939394 0.96922348 0.92376894 0.96969697
0.98484848 0.96875 0.92282197 0.95407197]
mean value: 0.9526515151515151
key: train_roc_auc
value: [0.97440273 0.97440273 0.9710362 0.97103039 0.97444336 0.96932971
0.97274849 0.97615565 0.97786213 0.97786794]
mean value: 0.9739279329479232
key: test_jcc
value: [0.91428571 0.88571429 0.88888889 0.93939394 0.86111111 0.94117647
0.96969697 0.94285714 0.86111111 0.91176471]
mean value: 0.9116000339529752
key: train_jcc
value: [0.95016611 0.95033113 0.94389439 0.94407895 0.95033113 0.94078947
0.94701987 0.95348837 0.95681063 0.95695364]
mean value: 0.9493863688360049
MCC on Blind test: 0.92
Accuracy on Blind test: 0.97