LSHTM_analysis/scripts/ml/log_rpob_sl.txt

/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data_sl.py:549: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
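The SettingWithCopyWarning above means `mask_check` was sorted in place while pandas still flagged it as a possible view of another DataFrame. Here is a minimal sketch of the usual fix. It assumes `mask_check` comes from filtering a larger frame. The code around ml_data_sl.py:549 is not shown, so `df`, `mask` and the toy values are hypothetical. The fix is to take an explicit `.copy()` before the in-place sort.

```python
import pandas as pd

# Hypothetical toy frame standing in for the script's source DataFrame
df = pd.DataFrame({
    "mutation": ["A1B", "C2D", "E3F", "G4H"],
    "ligand_distance": [12.5, 3.1, 7.8, 1.2],
})

# Hypothetical filter standing in for whatever produced mask_check
mask = df["ligand_distance"] < 10

# .copy() makes mask_check an independent DataFrame, not a possible view of df
mask_check = df.loc[mask].copy()
mask_check.sort_values(by=["ligand_distance"], ascending=True, inplace=True)

print(mask_check["ligand_distance"].tolist())  # [1.2, 3.1, 7.8]
```

Reassigning instead, `mask_check = mask_check.sort_values(by=['ligand_distance'])`, avoids the in-place call and should also silence the warning.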
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
from pandas import MultiIndex, Int64Index
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
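This ConvergenceWarning recurs for every fit in the log. It means lbfgs reached LogisticRegression's default iteration cap (max_iter=100) before converging. Raising max_iter, for example to LogisticRegression(max_iter=1000, random_state=42), or trying another solver may address the cause directly.

If the warning is accepted as is, the repeats can be collapsed instead. The sketch below uses only the standard library. `ConvergenceWarning` is a local stand-in for scikit-learn's class, and `fit_fold` is a hypothetical fit. The "once" filter action shows a given message and category only the first time. It applies only to the current process, so fits run in separate worker processes would not see it.

```python
import warnings


class ConvergenceWarning(UserWarning):
    """Local stand-in for sklearn.exceptions.ConvergenceWarning (not imported here)."""


def fit_fold(fold):
    # Hypothetical stand-in for one cross-validation fit that hits the iteration cap
    warnings.warn("lbfgs failed to converge (status=1)", ConvergenceWarning)
    return fold


with warnings.catch_warnings(record=True) as caught:
    # "once": record a given (message, category) pair only the first time it occurs
    warnings.simplefilter("once", ConvergenceWarning)
    for fold in range(10):
        fit_fold(fold)

print(f"{len(caught)} warning(s) recorded for 10 fits")  # 1 warning(s) recorded for 10 fits
```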
1.22.4
1.4.1
aaindex_df contains non-numerical data
Total no. of non-numerical columns: 2
Selecting numerical data only
PASS: successfully selected numerical columns only for aaindex_df
Now checking for NA in the remaining aaindex_cols
Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127
Revised df ncols: 123
Checking NA in revised df...
PASS: cols with NA successfully dropped from aaindex_df
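The cleaning steps above first keep only the numerical columns, then drop any column containing NA. They can be sketched without pandas as follows. The column names and values are hypothetical toy data. Treating None and NaN as missing is an assumption about what the script counts as NA.

```python
import math


def is_missing(value):
    # Treat None and float NaN as missing (assumption about what counts as NA)
    return value is None or (isinstance(value, float) and math.isnan(value))


def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def clean_columns(table):
    """Keep numeric columns, then drop any that contain a missing value.

    `table` is a hypothetical dict mapping column name -> list of values.
    """
    numeric = {name: vals for name, vals in table.items()
               if all(is_missing(v) or is_number(v) for v in vals)}
    return {name: vals for name, vals in numeric.items()
            if not any(is_missing(v) for v in vals)}


toy = {
    "mutation": ["A1B", "C2D", "E3F"],    # non-numerical, removed in step 1
    "index_1": [0.5, 1.2, -0.3],
    "index_2": [0.1, float("nan"), 0.7],  # contains NA, removed in step 2
    "index_3": [3, 4, 5],
}
clean = clean_columns(toy)
print(list(clean))  # ['index_1', 'index_3']
```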
Proceeding with combining aa_df with other features_df
PASS: ncols match
Expected ncols: 123
Got: 123
Total no. of columns in clean aa_df: 123
Proceeding to merge, expected nrows in merged_df: 1133
PASS: my_features_df and aa_df successfully combined
nrows: 1133
ncols: 274
count of NULL values before imputation
or_mychisq 339
log10_or_mychisq 339
dtype: int64
count of NULL values AFTER imputation
mutationinformation 0
or_rawI 0
logorI 0
dtype: int64
PASS: OR values imputed, data ready for ML
Total no. of features for aaindex: 123
No. of numerical features: 169
No. of categorical features: 7
PASS: x_features has no target variable
No. of columns for x_features: 176
-------------------------------------------------------------
Successfully split data according to scaling law: 1/np.sqrt(x_ncols)
Train data size: (515, 176)
Test data size: 0.07537783614444091 (42, 176)
y_train numbers: Counter({0: 261, 1: 254})
y_train ratio: 1.0275590551181102
y_test_numbers: Counter({0: 21, 1: 21})
y_test ratio: 1.0
-------------------------------------------------------------
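The split sizes and class ratio printed above follow from simple arithmetic. The sketch assumes the float test fraction 1/sqrt(176) is converted to a row count by rounding up. To our understanding, this is what scikit-learn's train_test_split does with a float test_size. It also assumes the printed ratio is count(0)/count(1).

```python
import math
from collections import Counter

n_samples, n_features = 557, 176

# Scaling-law test fraction printed in the log: 1/sqrt(number of feature columns)
test_fraction = 1 / math.sqrt(n_features)

# Assumption: the float fraction becomes a row count by rounding up,
# and the training set takes the remaining rows
n_test = math.ceil(test_fraction * n_samples)
n_train = n_samples - n_test
print(round(test_fraction, 6), n_train, n_test)  # 0.075378 515 42

# Class ratio as count(label 0) / count(label 1), rebuilt from the printed counts
y_train = [0] * 261 + [1] * 254
counts = Counter(y_train)
print(counts[0] / counts[1])  # 1.0275590551181102
```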
Simple Random OverSampling
Counter({0: 261, 1: 261})
(522, 176)
Simple Random UnderSampling
Counter({0: 254, 1: 254})
(508, 176)
Simple Combined Over and UnderSampling
Counter({0: 261, 1: 261})
(522, 176)
SMOTE_NC OverSampling
Counter({0: 261, 1: 261})
(522, 176)
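The first two resampling results can be reproduced with naive random sampling. Oversampling duplicates random minority rows until both classes reach 261. Undersampling drops random majority rows until both reach 254.

The script presumably uses imbalanced-learn's samplers. `random_resample` below is a hypothetical stdlib stand-in. It does not cover SMOTE-NC, which synthesises new minority rows from nearest neighbours rather than duplicating existing ones.

```python
import random
from collections import Counter


def random_resample(labels, mode, seed=42):
    """Return row indices after naive random over- or undersampling.

    mode="over": duplicate random minority rows up to the majority count.
    mode="under": drop random majority rows down to the minority count.
    """
    if mode not in ("over", "under"):
        raise ValueError("mode must be 'over' or 'under'")
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    sizes = [len(rows) for rows in by_class.values()]
    target = max(sizes) if mode == "over" else min(sizes)
    chosen = []
    for rows in by_class.values():
        if len(rows) < target:
            # Sample with replacement to add duplicate minority rows
            chosen += rows + rng.choices(rows, k=target - len(rows))
        else:
            # Sample without replacement to keep exactly `target` rows
            chosen += rng.sample(rows, target)
    return chosen


y_train = [0] * 261 + [1] * 254
over = random_resample(y_train, "over")
under = random_resample(y_train, "under")
print(Counter(y_train[i] for i in over), len(over))    # 261 of each class, 522 rows
print(Counter(y_train[i] for i in under), len(under))  # 254 of each class, 508 rows
```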
#####################################################################
Running ML analysis: scaling law split
Gene name: rpoB
Drug name: rifampicin
Output directory: /home/tanu/git/Data/rifampicin/output/ml/tts_sl/
Sanity checks:
ML source data size: (557, 176)
Total input features: (515, 176)
Target feature numbers: Counter({0: 261, 1: 254})
Target features ratio: 1.0275590551181102
#####################################################################
================================================================
Structural features (n): 37
These are:
Common stability features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================
AAindex features (n): 123
================================================================
Evolutionary features (n): 3
These are:
['consurf_score', 'snap2_score', 'provean_score']
================================================================
Genomic features (n): 6
These are:
['maf', 'logorI']
['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================
Categorical features (n): 7
These are:
['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================
Pass: No. of features match
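The "No. of features match" check reduces to the arithmetic below, using the group sizes printed above. The structural total is 11 stability + 23 FoldX + 3 other columns. That the script compares the total against the 176 columns of x_features is an assumption.

```python
# Group sizes as printed in the log
structural = {"stability": 11, "foldx": 23, "other_struc": 3}
groups = {
    "structural": sum(structural.values()),  # 37
    "aaindex": 123,
    "evolutionary": 3,
    "genomic": 2 + 4,                        # ['maf', 'logorI'] + 4 lineage columns
    "categorical": 7,
}

# Every group except the categorical one is numerical
n_numerical = sum(v for k, v in groups.items() if k != "categorical")
n_total = sum(groups.values())
print(groups["structural"], n_numerical, n_total)  # 37 169 176
```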
#####################################################################
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
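The model list contains ('Naive Bayes', BernoulliNB()) twice. That model is therefore presumably evaluated twice, and any results keyed by name may collide. Here is a quick stdlib check on the printed names (estimators omitted; spellings kept as printed):

```python
from collections import Counter

# Model names in the order printed above
model_names = [
    "Logistic Regression", "Logistic RegressionCV", "Gaussian NB", "Naive Bayes",
    "K-Nearest Neighbors", "SVM", "MLP", "Decision Tree", "Extra Trees", "Extra Tree",
    "Random Forest", "Random Forest2", "Naive Bayes", "XGBoost", "LDA", "Multinomial",
    "Passive Aggresive", "Stochastic GDescent", "AdaBoost Classifier",
    "Bagging Classifier", "Gaussian Process", "Gradient Boosting", "QDA",
    "Ridge Classifier", "Ridge ClassifierCV",
]

# Names that appear more than once
duplicates = [name for name, n in Counter(model_names).items() if n > 1]
print(len(model_names), len(set(model_names)), duplicates)  # 25 24 ['Naive Bayes']
```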
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
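The 'prep' step of this pipeline rescales the 169 numerical columns with MinMaxScaler and one-hot encodes the 7 categorical columns. remainder='passthrough' has nothing left to pass through, because 169 + 7 = 176.

Below is a pure-Python sketch of the two transforms on hypothetical toy columns, for illustration only. Inside the pipeline, the minimum, maximum and category list are learned on each training fold and reused on the held-out fold. With default settings, OneHotEncoder raises an error for a category it did not see during fit.

```python
def min_max_scale(values):
    """Rescale one numeric column to [0, 1], MinMaxScaler's default range."""
    lo, hi = min(values), max(values)
    span = hi - lo
    # A constant column has zero range; map it to 0.0 instead of dividing by zero
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def one_hot(values):
    """Expand one categorical column into 0/1 indicator columns (sorted categories)."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]


ligand_distance = [2.0, 10.0, 6.0]     # hypothetical toy values
ss_class = ["helix", "loop", "helix"]  # hypothetical toy values
print(min_max_scale(ligand_distance))  # [0.0, 1.0, 0.5]
print(one_hot(ss_class))               # (['helix', 'loop'], [[1, 0], [0, 1], [1, 0]])
```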
key: fit_time
value: [0.03457308 0.05314136 0.03903151 0.0377934 0.03705144 0.03332138
0.03554845 0.03472877 0.0331893 0.03619957]
mean value: 0.037457823753356934
key: score_time
value: [0.01273632 0.01220393 0.01418066 0.01225758 0.01433253 0.01218915
0.0123353 0.01226211 0.01226807 0.01440454]
mean value: 0.012917017936706543
key: test_mcc
value: [0.76888889 0.61538462 0.84866842 0.84866842 0.77151675 0.88289781
0.76733527 0.88289781 0.80990051 0.69568237]
mean value: 0.7891840882445919
key: train_mcc
value: [0.87494868 0.86615908 0.86178968 0.86190423 0.85751876 0.84920893
0.86645175 0.86645175 0.85783034 0.87499419]
mean value: 0.8637257413526429
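MCC is computed from all four confusion-matrix cells. Unlike accuracy, it therefore stays low unless both classes are predicted well.

The log does not report confusion matrices. The hypothetical one below (TP=22, TN=24, FP=3, FN=3, i.e. 52 rows with 46 correct) is simply one matrix consistent with the first fold's test_mcc of 0.76888889 and test_accuracy of 0.88461538.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0 when any row or column of the matrix is empty
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical fold of 52 rows: 25 positives, 27 negatives, 6 errors
print(mcc(tp=22, tn=24, fp=3, fn=3))  # 519/675, about 0.76889

# Per-fold test_mcc values as logged; their mean reproduces the logged mean value
fold_mcc = [0.76888889, 0.61538462, 0.84866842, 0.84866842, 0.77151675,
            0.88289781, 0.76733527, 0.88289781, 0.80990051, 0.69568237]
mean_mcc = sum(fold_mcc) / len(fold_mcc)
print(mean_mcc)  # about 0.78918
```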
key: test_accuracy
value: [0.88461538 0.80769231 0.92307692 0.92307692 0.88461538 0.94117647
0.88235294 0.94117647 0.90196078 0.84313725]
mean value: 0.8932880844645551
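The ten test_accuracy values are exact fractions of the fold sizes. Splitting 515 training rows into 10 folds gives five folds of 52 and five of 51; for example, 0.88461538 = 46/52 and 0.94117647 = 48/51.

The sketch assumes the larger folds come first, which matches the logged values. It shows that the logged mean value is the unweighted mean of the per-fold accuracies, not the pooled accuracy.

```python
# Split 515 training rows into 10 folds; the first 515 % 10 folds get one extra row
n_rows, n_folds = 515, 10
fold_sizes = [n_rows // n_folds + (1 if i < n_rows % n_folds else 0)
              for i in range(n_folds)]

# Correctly classified rows per fold, recovered as accuracy * fold size
test_accuracy = [0.88461538, 0.80769231, 0.92307692, 0.92307692, 0.88461538,
                 0.94117647, 0.88235294, 0.94117647, 0.90196078, 0.84313725]
correct = [round(acc * size) for acc, size in zip(test_accuracy, fold_sizes)]

# Unweighted mean of per-fold accuracies vs. accuracy pooled over all rows
mean_of_folds = sum(c / s for c, s in zip(correct, fold_sizes)) / n_folds
pooled = sum(correct) / n_rows

print(fold_sizes)  # [52, 52, 52, 52, 52, 51, 51, 51, 51, 51]
print(correct)     # [46, 42, 48, 48, 46, 48, 45, 48, 46, 43]
print(mean_of_folds, pooled)
```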
key: train_accuracy
value:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[0.93736501 0.93304536 0.93088553 0.93088553 0.9287257 0.92456897
0.93318966 0.93318966 0.92887931 0.9375 ]
mean value: 0.9318234713636703
key: test_fscore
value: [0.88 0.80769231 0.92 0.92 0.88888889 0.93877551
0.88461538 0.93877551 0.90566038 0.85185185]
mean value: 0.8936259830815086
key: train_fscore
value: [0.93736501 0.93246187 0.930131 0.93043478 0.92810458 0.92407809
0.93275488 0.93275488 0.92841649 0.93681917]
mean value: 0.931332075708447
key: test_precision
value: [0.88 0.80769231 0.95833333 0.95833333 0.85714286 0.95833333
0.85185185 0.95833333 0.85714286 0.79310345]
mean value: 0.888026665543907
key: train_precision
value: [0.92735043 0.92640693 0.92608696 0.92241379 0.92207792 0.91810345
0.92672414 0.92672414 0.92241379 0.93478261]
mean value: 0.9253084151397495
key: test_recall
value: [0.88 0.80769231 0.88461538 0.88461538 0.92307692 0.92
0.92 0.92 0.96 0.92 ]
mean value: 0.902
key: train_recall
value: [0.94759825 0.93859649 0.93421053 0.93859649 0.93421053 0.930131
0.93886463 0.93886463 0.93449782 0.93886463]
mean value: 0.9374434995786409
key: test_roc_auc
value: [0.88444444 0.80769231 0.92307692 0.92307692 0.88461538 0.94076923
0.88307692 0.94076923 0.90307692 0.84461538]
mean value: 0.8935213675213675
key: train_roc_auc
value: [0.93747434 0.93312803 0.93093505 0.93100037 0.92880739 0.92463997
0.9332621 0.9332621 0.92895104 0.93751742]
mean value: 0.9318977817951397
key: test_jcc
value: [0.78571429 0.67741935 0.85185185 0.85185185 0.8 0.88461538
0.79310345 0.88461538 0.82758621 0.74193548]
mean value: 0.809869325253085
key: train_jcc
value: [0.88211382 0.87346939 0.86938776 0.8699187 0.86585366 0.85887097
0.87398374 0.87398374 0.86639676 0.88114754]
mean value: 0.8715126071252873
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
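The lbfgs ConvergenceWarning repeated earlier in this run appears to come from fitting the logistic regression models. It suggests two fixes: scale the data or raise max_iter. The pipeline here already applies MinMaxScaler to the numeric columns. However, any column handled by remainder='passthrough' reaches the model unscaled, so raising max_iter may be the more relevant change.

The sketch below shows both fixes together. It makes several assumptions. It uses synthetic make_classification data instead of the real feature matrix. It swaps in StandardScaler for illustration. The helper fit_and_count_warnings is hypothetical and not part of the original script.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def fit_and_count_warnings(max_iter):
    """Hypothetical helper: fit a scaled logistic regression, count ConvergenceWarnings."""
    # Synthetic stand-in data; the real feature matrix is not reproduced here.
    X, y = make_classification(n_samples=300, n_features=20, random_state=42)
    # Inflate one feature to mimic an unscaled column reaching the model.
    X[:, 0] *= 1000
    model = make_pipeline(
        StandardScaler(),  # fix 1: put all features on a comparable scale
        LogisticRegression(max_iter=max_iter, random_state=42),  # fix 2: more iterations
    )
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always", ConvergenceWarning)
        model.fit(X, y)
    n_conv = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
    return model, n_conv


model, n_warnings = fit_and_count_warnings(max_iter=1000)
print("ConvergenceWarnings raised:", n_warnings)
print("lbfgs iterations used:", model[-1].n_iter_)
```

If n_iter_ stays well below max_iter, the solver stopped on its tolerance rather than on the iteration cap.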
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
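The Pipeline repr above describes a two-step layout:

1. A ColumnTransformer that min-max scales the 169 numeric columns, one-hot encodes the categorical ones and passes any other column through unchanged.
2. The classifier.

The sketch below builds the same structure on a tiny synthetic DataFrame. Only two numeric and two categorical column names are borrowed from the log. Their values, the target and the build_pipeline helper are invented for illustration. The OneHotEncoder is left at its defaults, as in the log. With those defaults, a category seen only at prediction time would raise an error.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder


def build_pipeline(num_cols, cat_cols, model):
    """Hypothetical helper: scale numeric columns, one-hot encode categorical ones."""
    prep = ColumnTransformer(
        transformers=[
            ("num", MinMaxScaler(), num_cols),
            ("cat", OneHotEncoder(), cat_cols),
        ],
        remainder="passthrough",  # any other column reaches the model unchanged
    )
    return Pipeline(steps=[("prep", prep), ("model", model)])


# Synthetic stand-in data: column names borrowed from the log, values invented.
rng = np.random.default_rng(42)
n = 120
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, n),
    "deepddg": rng.normal(0, 1.5, n),
    "ss_class": rng.choice(["H", "E", "C"], n),
    "active_site": rng.choice([0, 1], n),
})
# Invented target: "close to the ligand", with about 5% of labels flipped as noise.
base = (X["ligand_distance"] < 15).astype(int).to_numpy()
flip = rng.random(n) < 0.05
y = np.where(flip, 1 - base, base)

pipe = build_pipeline(["ligand_distance", "deepddg"], ["ss_class", "active_site"],
                      LogisticRegressionCV(random_state=42))
pipe.fit(X, y)
print("training accuracy:", pipe.score(X, y))
print("encoded width:", pipe.named_steps["prep"].transform(X).shape[1])
```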
key: fit_time
value: [0.84475875 0.97533607 0.87528491 1.0226748 0.90170574 0.9565835
0.97258997 0.88586736 1.06623507 0.90682554]
mean value: 0.9407861709594727
key: score_time
value: [0.01480126 0.01481223 0.01491356 0.01508093 0.02434945 0.01536441
0.01489663 0.01477122 0.01504946 0.0123744 ]
mean value: 0.01564135551452637
key: test_mcc
value: [0.80829038 0.65433031 0.84866842 0.88527041 0.73568294 0.88289781
0.80461538 0.92153846 0.8459178 0.65224812]
mean value: 0.803946003970833
key: train_mcc
value: [0.90510935 0.91374613 0.89664633 0.90072034 0.90072034 0.90085939
0.90549103 0.89669076 0.8968689 0.83622884]
mean value: 0.8953081428164044
key: test_accuracy
value: [0.90384615 0.82692308 0.92307692 0.94230769 0.86538462 0.94117647
0.90196078 0.96078431 0.92156863 0.82352941]
mean value: 0.9010558069381599
key: train_accuracy
value: [0.9524838 0.95680346 0.94816415 0.95032397 0.95032397 0.95043103
0.95258621 0.94827586 0.94827586 0.91810345]
mean value: 0.9475771765844939
key: test_fscore
value: [0.90196078 0.82352941 0.92 0.94117647 0.87272727 0.93877551
0.90196078 0.96 0.92307692 0.83018868]
mean value: 0.9013395836233953
key: train_fscore
value: [0.95238095 0.95652174 0.94805195 0.94989107 0.94989107 0.94989107
0.95258621 0.94805195 0.94827586 0.9173913 ]
mean value: 0.9472933163543006
key: test_precision
value: [0.88461538 0.84 0.95833333 0.96 0.82758621 0.95833333
0.88461538 0.96 0.88888889 0.78571429]
mean value: 0.8948086817397162
key: train_precision
value: [0.94420601 0.94827586 0.93589744 0.94372294 0.94372294 0.94782609
0.94042553 0.93991416 0.93617021 0.91341991]
mean value: 0.9393581102143395
key: test_recall
value: [0.92 0.80769231 0.88461538 0.92307692 0.92307692 0.92
0.92 0.96 0.96 0.88 ]
mean value: 0.9098461538461539
key: train_recall
value: [0.96069869 0.96491228 0.96052632 0.95614035 0.95614035 0.95196507
0.9650655 0.95633188 0.96069869 0.92139738]
mean value: 0.9553876503485789
key: test_roc_auc
value: [0.90444444 0.82692308 0.92307692 0.94230769 0.86538462 0.94076923
0.90230769 0.96076923 0.92230769 0.82461538]
mean value: 0.9012905982905982
key: train_roc_auc
value: [0.95257157 0.95692423 0.94834826 0.9504106 0.9504106 0.95045062
0.95274552 0.9483787 0.94843445 0.9181455 ]
mean value: 0.9476820048433202
key: test_jcc
value: [0.82142857 0.7 0.85185185 0.88888889 0.77419355 0.88461538
0.82142857 0.92307692 0.85714286 0.70967742]
mean value: 0.8232304016174984
key: train_jcc
value: [0.90909091 0.91666667 0.90123457 0.90456432 0.90456432 0.90456432
0.90946502 0.90123457 0.90163934 0.84738956]
mean value: 0.9000413580689495
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
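Each "key / value / mean value" block lists 10 per-fold scores. This is consistent with scikit-learn's cross_validate run with a dictionary of named scorers and return_train_score=True. Each model section then ends with a single blind-test MCC and accuracy.

The sketch below reproduces that layout under the following assumptions:

- The scorer names (mcc, fscore, jcc, ...) are read off the printed keys.
- The fold scheme is StratifiedKFold.
- roc_auc is scored on hard predictions. The logged test_roc_auc values sit close to the accuracies, which fits that reading, but this is an inference.
- The blind test is one held-out split scored after refitting.

The data are synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score, make_scorer,
                             matthews_corrcoef, precision_score, recall_score,
                             roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_validate, train_test_split

# Scorer names chosen so cross_validate emits keys like test_mcc, train_jcc, ...
SCORING = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": make_scorer(accuracy_score),
    "fscore": make_scorer(f1_score),
    "precision": make_scorer(precision_score),
    "recall": make_scorer(recall_score),
    "roc_auc": make_scorer(roc_auc_score),  # assumption: AUC on hard labels
    "jcc": make_scorer(jaccard_score),
}


def report_cv(model, X, y, n_splits=10, seed=42):
    """Run stratified k-fold CV and print each metric in the log's key/value style."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_validate(model, X, y, cv=cv, scoring=SCORING,
                            return_train_score=True)
    for key, values in scores.items():
        print("key:", key)
        print("value:", values)
        print("mean value:", values.mean())
    return scores


# Synthetic stand-in data, split into a training set and a held-out "blind" set.
X, y = make_classification(n_samples=600, n_features=20, random_state=42)
X_train, X_blind, y_train, y_blind = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model = LogisticRegression(max_iter=1000, random_state=42)
scores = report_cv(model, X_train, y_train)

# Blind test: refit on all training data, then score the held-out split once.
y_pred = model.fit(X_train, y_train).predict(X_blind)
print("MCC on Blind test:", round(matthews_corrcoef(y_blind, y_pred), 2))
print("Accuracy on Blind test:", round(accuracy_score(y_blind, y_pred), 2))
```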
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01280427 0.01010084 0.00992489 0.00991201 0.01013994 0.01014113
0.01022482 0.01023531 0.01004434 0.01003003]
mean value: 0.010355758666992187
key: score_time
value: [0.00947094 0.00892138 0.00892901 0.0089221 0.00908446 0.00910974
0.00904751 0.00909019 0.00910616 0.00897598]
mean value: 0.009065747261047363
key: test_mcc
value: [0.54156684 0.57735027 0.74466871 0.70064905 0.66628253 0.65064936
0.68779719 0.57342193 0.72615385 0.72984534]
mean value: 0.6598385073077018
key: train_mcc
value: [0.69176702 0.6927847 0.69160663 0.66143964 0.70344863 0.70415149
0.69511551 0.67751955 0.67041841 0.69062182]
mean value: 0.6878873412216182
key: test_accuracy
value: [0.76923077 0.78846154 0.86538462 0.84615385 0.82692308 0.82352941
0.84313725 0.78431373 0.8627451 0.8627451 ]
mean value: 0.827262443438914
key: train_accuracy
value: [0.84449244 0.84449244 0.84449244 0.82937365 0.85097192 0.8512931
0.84698276 0.8362069 0.83405172 0.84482759]
mean value: 0.8427184963133983
key: test_fscore
value: [0.73913043 0.78431373 0.85106383 0.83333333 0.80851064 0.80851064
0.83333333 0.79245283 0.8627451 0.85106383]
mean value: 0.8164457691337579
key: train_fscore
value: [0.83486239 0.83255814 0.83410138 0.81755196 0.8428246 0.84353741
0.83972912 0.82242991 0.82379863 0.83783784]
mean value: 0.83292313777467
key: test_precision
value: [0.80952381 0.8 0.95238095 0.90909091 0.9047619 0.86363636
0.86956522 0.75 0.84615385 0.90909091]
mean value: 0.8614203912029998
key: train_precision
value: [0.87922705 0.88613861 0.87864078 0.86341463 0.87677725 0.87735849
0.86915888 0.88442211 0.86538462 0.86511628]
mean value: 0.8745638703109545
key: test_recall
value: [0.68 0.76923077 0.76923077 0.76923077 0.73076923 0.76
0.8 0.84 0.88 0.8 ]
mean value: 0.7798461538461539
key: train_recall
value: [0.79475983 0.78508772 0.79385965 0.77631579 0.81140351 0.81222707
0.81222707 0.76855895 0.7860262 0.81222707]
mean value: 0.7952692867540029
key: test_roc_auc
value: [0.76592593 0.78846154 0.86538462 0.84615385 0.82692308 0.82230769
0.84230769 0.78538462 0.86307692 0.86153846]
mean value: 0.8267464387464387
key: train_roc_auc
value: [0.84396111 0.84360769 0.84373834 0.82858343 0.85038261 0.85079439
0.84653907 0.83534331 0.83343863 0.84441141]
mean value: 0.8420799970776743
key: test_jcc
value: [0.5862069 0.64516129 0.74074074 0.71428571 0.67857143 0.67857143
0.71428571 0.65625 0.75862069 0.74074074]
mean value: 0.6913434643725245
key: train_jcc
value: [0.71653543 0.71314741 0.71541502 0.69140625 0.72834646 0.72941176
0.72373541 0.6984127 0.70038911 0.72093023]
mean value: 0.7137729779180588
MCC on Blind test: 0.63
Accuracy on Blind test: 0.81
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01047707 0.01029992 0.01025939 0.01041651 0.01023507 0.0102303
0.0112865 0.01102686 0.01036859 0.01120663]
mean value: 0.010580682754516601
key: score_time
value: [0.00895643 0.00900006 0.00895834 0.00900507 0.00899935 0.00976944
0.00984073 0.00931478 0.00945687 0.00984907]
mean value: 0.009315013885498047
key: test_mcc
value: [0.57831366 0.4233902 0.69230769 0.71151247 0.73568294 0.72573276
0.80461538 0.88307692 0.60769231 0.64715023]
mean value: 0.6809474562124411
key: train_mcc
value: [0.69835966 0.77538376 0.72791401 0.72780737 0.77538491 0.75470857
0.75496039 0.72841838 0.75426257 0.74133606]
mean value: 0.7438535657906272
key: test_accuracy
value: [0.78846154 0.71153846 0.84615385 0.84615385 0.86538462 0.8627451
0.90196078 0.94117647 0.80392157 0.82352941]
mean value: 0.8391025641025641
key: train_accuracy
value: [0.8488121 0.88768898 0.86393089 0.86393089 0.88768898 0.87715517
0.87715517 0.86422414 0.87715517 0.87068966]
mean value: 0.8718431146197959
key: test_fscore
value: [0.76595745 0.70588235 0.84615385 0.82608696 0.87272727 0.85714286
0.90196078 0.94117647 0.8 0.81632653]
mean value: 0.8333414517809609
key: train_fscore
value: [0.84304933 0.88495575 0.8627451 0.86092715 0.88646288 0.87741935
0.87794433 0.86153846 0.87527352 0.86899563]
mean value: 0.8699311510042489
key: test_precision
value: [0.81818182 0.72 0.84615385 0.95 0.82758621 0.875
0.88461538 0.92307692 0.8 0.83333333]
mean value: 0.8477947512257857
key: train_precision
value: [0.86635945 0.89285714 0.85714286 0.86666667 0.8826087 0.86440678
0.86134454 0.86725664 0.87719298 0.86899563]
mean value: 0.8704831379611647
key: test_recall
value: [0.72 0.69230769 0.84615385 0.73076923 0.92307692 0.84
0.92 0.96 0.8 0.8 ]
mean value: 0.8232307692307692
key: train_recall
value: [0.8209607 0.87719298 0.86842105 0.85526316 0.89035088 0.89082969
0.89519651 0.8558952 0.87336245 0.86899563]
mean value: 0.8696468244847928
key: test_roc_auc
value: [0.78592593 0.71153846 0.84615385 0.84615385 0.86538462 0.86230769
0.90230769 0.94153846 0.80384615 0.82307692]
mean value: 0.8388233618233618
key: train_roc_auc
value: [0.84851454 0.88753266 0.86399776 0.86380179 0.88772863 0.87732974
0.87738549 0.86411781 0.87710675 0.87066803]
mean value: 0.8718183204075174
key: test_jcc
value: [0.62068966 0.54545455 0.73333333 0.7037037 0.77419355 0.75
0.82142857 0.88888889 0.66666667 0.68965517]
mean value: 0.7194014085449013
key: train_jcc
value: [0.72868217 0.79365079 0.75862069 0.75581395 0.79607843 0.7816092
0.78244275 0.75675676 0.77821012 0.76833977]
mean value: 0.7700204624031467
MCC on Blind test: 0.76
Accuracy on Blind test: 0.88
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.01085258 0.01142216 0.01102972 0.010952 0.0106256 0.01055479
0.01059937 0.0103457 0.01040864 0.01033592]
mean value: 0.010712647438049316
key: score_time
value: [0.07376838 0.01394439 0.01408267 0.01353168 0.01317382 0.01507711
0.01556063 0.01267171 0.01504016 0.01285124]
mean value: 0.019970178604125977
key: test_mcc
value: [0.61551019 0.38575837 0.38575837 0.5 0.53846154 0.5372904
0.65064936 0.72573276 0.60769231 0.45474301]
mean value: 0.5401596315962462
key: train_mcc
value: [0.70231538 0.72376727 0.69801004 0.71054252 0.71511629 0.68110244
0.67701807 0.68110244 0.71576891 0.69138045]
mean value: 0.6996123825721399
key: test_accuracy
value: [0.80769231 0.69230769 0.69230769 0.73076923 0.76923077 0.76470588
0.82352941 0.8627451 0.80392157 0.7254902 ]
mean value: 0.7672699849170437
key: train_accuracy
value: [0.85097192 0.86177106 0.8488121 0.85529158 0.8574514 0.84051724
0.83836207 0.84051724 0.85775862 0.84482759]
mean value: 0.8496280814776197
key: test_fscore
value: [0.79166667 0.68 0.7037037 0.66666667 0.76923077 0.77777778
0.80851064 0.85714286 0.8 0.69565217]
mean value: 0.7550351253399358
key: train_fscore
value: [0.84632517 0.85714286 0.84304933 0.85339168 0.85267857 0.83628319
0.83296214 0.83628319 0.85333333 0.83636364]
mean value: 0.8447813087328101
key: test_precision
value: [0.82608696 0.70833333 0.67857143 0.875 0.76923077 0.72413793
0.86363636 0.875 0.8 0.76190476]
mean value: 0.7881901544232879
key: train_precision
value: [0.86363636 0.87272727 0.86238532 0.85152838 0.86818182 0.84753363
0.85 0.84753363 0.86877828 0.87203791]
mean value: 0.8604342619734768
key: test_recall
value: [0.76 0.65384615 0.73076923 0.53846154 0.76923077 0.84
0.76 0.84 0.8 0.64 ]
mean value: 0.7332307692307692
key: train_recall
value: [0.82969432 0.84210526 0.8245614 0.85526316 0.8377193 0.82532751
0.81659389 0.82532751 0.83842795 0.80349345]
mean value: 0.8298513751627978
key: test_roc_auc
value: [0.80592593 0.69230769 0.69230769 0.73076923 0.76923077 0.76615385
0.82230769 0.86230769 0.80384615 0.72384615]
mean value: 0.7669002849002848
key: train_roc_auc
value: [0.8507446 0.86147816 0.84845091 0.85529115 0.85715752 0.84032333
0.83808418 0.84032333 0.85751185 0.84429992]
mean value: 0.8493664950009298
key: test_jcc
value: [0.65517241 0.51515152 0.54285714 0.5 0.625 0.63636364
0.67857143 0.75 0.66666667 0.53333333]
mean value: 0.6103116136736826
key: train_jcc
value: [0.73359073 0.75 0.72868217 0.74427481 0.74319066 0.71863118
0.71374046 0.71863118 0.74418605 0.71875 ]
mean value: 0.7313677236713617
MCC on Blind test: 0.33
Accuracy on Blind test: 0.67
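The per-fold test metrics in each block can be reproduced from a single 2×2 confusion matrix. The sketch below does this for fold 1 of the K-Nearest Neighbors run above. The counts are an assumption: they are back-calculated from the logged fold-1 values (52 test samples, since accuracy 0.80769231 = 42/52; recall 0.76 = 19/25; precision 0.82608696 = 19/23) and are not printed anywhere in the log. The logged `test_roc_auc` equals balanced accuracy, (TPR + TNR) / 2. That is consistent with AUC being scored on hard predicted labels rather than on probabilities. This is an inference from the numbers, not something the log states.

```python
import math

# ASSUMPTION: fold-1 counts for K-Nearest Neighbors, back-calculated from the
# logged precision/recall/accuracy; the log itself does not print them.
TP, FN, FP, TN = 19, 6, 4, 23


def fold_metrics(tp, fn, fp, tn):
    """Compute the metrics reported per fold in this log from 2x2 counts."""
    recall = tp / (tp + fn)          # true positive rate (sensitivity)
    specificity = tn / (tn + fp)     # true negative rate
    precision = tp / (tp + fp)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "mcc": (tp * tn - fp * fn) / mcc_den,
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "fscore": 2 * tp / (2 * tp + fp + fn),
        "precision": precision,
        "recall": recall,
        # ROC AUC of hard 0/1 predictions reduces to balanced accuracy
        "roc_auc": (recall + specificity) / 2,
        # Jaccard index of the positive class: TP / (TP + FP + FN)
        "jcc": tp / (tp + fp + fn),
    }


for name, value in fold_metrics(TP, FN, FP, TN).items():
    print(f"{name}: {value:.8f}")
```

Each printed value can be compared against the first entry of the matching `test_*` row in the K-Nearest Neighbors block.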
Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.02325892 0.02017713 0.02105737 0.02158213 0.02060938 0.02063775
0.0231297 0.02031779 0.02411246 0.02205038]
mean value: 0.0216933012008667
key: score_time
value: [0.0124054 0.0114367 0.01170635 0.01159406 0.01113749 0.01143193
0.01157951 0.01144934 0.01246667 0.0119288 ]
mean value: 0.011713624000549316
key: test_mcc
value: [0.80829038 0.65433031 0.84866842 0.84866842 0.80829038 0.84544958
0.76733527 0.92153846 0.76733527 0.68875274]
mean value: 0.7958659239597995
key: train_mcc
value: [0.79696947 0.81423213 0.7927817 0.7927817 0.79695053 0.79323288
0.79739862 0.78885906 0.80169098 0.81031311]
mean value: 0.7985210167824943
key: test_accuracy
value: [0.90384615 0.82692308 0.92307692 0.92307692 0.90384615 0.92156863
0.88235294 0.96078431 0.88235294 0.84313725]
mean value: 0.8970965309200604
key: train_accuracy
value: [0.89848812 0.90712743 0.89632829 0.89632829 0.89848812 0.89655172
0.8987069 0.89439655 0.90086207 0.90517241]
mean value: 0.8992449914351679
key: test_fscore
value: [0.90196078 0.83018868 0.92 0.92 0.90566038 0.91666667
0.88461538 0.96 0.88461538 0.84615385]
mean value: 0.8969861122968781
key: train_fscore
value: [0.89760349 0.9059081 0.89565217 0.89565217 0.89715536 0.8961039
0.89760349 0.89370933 0.89956332 0.90393013]
mean value: 0.8982881450268425
key: test_precision
value: [0.88461538 0.81481481 0.95833333 0.95833333 0.88888889 0.95652174
0.85185185 0.96 0.85185185 0.81481481]
mean value: 0.8940026012634709
key: train_precision
value: [0.89565217 0.90393013 0.88793103 0.88793103 0.89519651 0.88841202
0.89565217 0.88793103 0.89956332 0.90393013]
mean value: 0.894612955577799
key: test_recall
value: [0.92 0.84615385 0.88461538 0.88461538 0.92307692 0.88
0.92 0.96 0.92 0.88 ]
mean value: 0.9018461538461539
key: train_recall
value: [0.89956332 0.90789474 0.90350877 0.90350877 0.89912281 0.90393013
0.89956332 0.89956332 0.89956332 0.90393013]
mean value: 0.9020148624837202
key: test_roc_auc
value: [0.90444444 0.82692308 0.92307692 0.92307692 0.90384615 0.92076923
0.88307692 0.96076923 0.88307692 0.84384615]
mean value: 0.8972905982905983
key: train_roc_auc
value: [0.89849961 0.90713886 0.89643524 0.89643524 0.89849757 0.89664592
0.89871783 0.89446251 0.90084549 0.90515655]
mean value: 0.8992834814328039
key: test_jcc
value: [0.82142857 0.70967742 0.85185185 0.85185185 0.82758621 0.84615385
0.79310345 0.92307692 0.79310345 0.73333333]
mean value: 0.8151166900499492
key: train_jcc
value: [0.81422925 0.828 0.81102362 0.81102362 0.81349206 0.81176471
0.81422925 0.80784314 0.81746032 0.8247012 ]
mean value: 0.8153767161426962
MCC on Blind test: 0.72
Accuracy on Blind test: 0.86
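The log is long but regular. Each block opens with `Model_name:`, lists `key:` / `value:` / `mean value:` triples for the cross-validation scores, and closes with the `MCC on Blind test` and `Accuracy on Blind test` lines for that model. The helper below is a hypothetical sketch for collapsing the log into one summary per model. `summarise_log` is an illustrative name, not part of the original scripts. It assumes the layout just described and ignores the multi-line `value:` arrays and any interleaved warning lines.

```python
import re
from collections import defaultdict


def summarise_log(lines):
    """Hypothetical helper: map model name -> CV means and blind-test scores.

    Assumes each block starts with 'Model_name:', each 'key:' line precedes
    its 'mean value:' line, and the 'MCC/Accuracy on Blind test' lines
    close the block they follow.
    """
    summary = defaultdict(dict)
    model = None
    key = None
    for raw in lines:
        line = raw.strip()
        if m := re.match(r"Model_name: (.+)", line):
            model = m.group(1)
        elif m := re.match(r"key: (\w+)", line):
            key = m.group(1)
        elif (m := re.match(r"mean value: (\S+)", line)) and model and key:
            summary[model]["cv_" + key] = float(m.group(1))
        elif (m := re.match(r"(MCC|Accuracy) on Blind test: (\S+)", line)) and model:
            summary[model]["blind_" + m.group(1).lower()] = float(m.group(2))
    return dict(summary)


sample = """Model_name: SVM
key: test_mcc
value: [0.8 0.7]
mean value: 0.75
MCC on Blind test: 0.72
Accuracy on Blind test: 0.86"""
print(summarise_log(sample.splitlines()))
# {'SVM': {'cv_test_mcc': 0.75, 'blind_mcc': 0.72, 'blind_accuracy': 0.86}}
```

To use it on the full file, pass `open(path).read().splitlines()` in place of `sample.splitlines()`.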
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [2.01321244 1.85925555 2.15802002 2.02537608 2.0681262 2.02303672
2.0381639 1.67555785 1.83844972 2.44499278]
mean value: 2.01441912651062
key: score_time
value: [0.01252031 0.01444769 0.01506567 0.01446342 0.01447082 0.01439071
0.01457739 0.01256704 0.01462984 0.0286572 ]
mean value: 0.015579009056091308
key: test_mcc
value: [0.69185185 0.69436507 0.65433031 0.81312325 0.81312325 0.88307692
0.84544958 0.88289781 0.76733527 0.68875274]
mean value: 0.7734306059974984
key: train_mcc
value: [0.99568893 0.99568893 1. 0.99568893 1. 0.98714723
1. 0.97003963 1. 0.99137787]
mean value: 0.9935631530556671
key: test_accuracy
value: [0.84615385 0.84615385 0.82692308 0.90384615 0.90384615 0.94117647
0.92156863 0.94117647 0.88235294 0.84313725]
mean value: 0.8856334841628959
key: train_accuracy
value: [0.99784017 0.99784017 1. 0.99784017 1. 0.99353448
1. 0.98491379 1. 0.99568966]
mean value: 0.9967658449393014
key: test_fscore
value: [0.84 0.84 0.82352941 0.89795918 0.90909091 0.94117647
0.91666667 0.93877551 0.88461538 0.84615385]
mean value: 0.8837967382757299
key: train_fscore
value: [0.99781182 0.99781182 1. 0.99781182 1. 0.99340659
1. 0.98454746 1. 0.99563319]
mean value: 0.9967022691125853
key: test_precision
value: [0.84 0.875 0.84 0.95652174 0.86206897 0.92307692
0.95652174 0.95833333 0.85185185 0.81481481]
mean value: 0.8878189366855034
key: train_precision
value: [1. 0.99563319 1. 0.99563319 1. 1.
1. 0.99553571 1. 0.99563319]
mean value: 0.9982435277604491
key: test_recall
value: [0.84 0.80769231 0.80769231 0.84615385 0.96153846 0.96
0.88 0.92 0.92 0.88 ]
mean value: 0.8823076923076923
key: train_recall
value: [0.99563319 1. 1. 1. 1. 0.98689956
1. 0.97379913 1. 0.99563319]
mean value: 0.9951965065502183
key: test_roc_auc
value: [0.84592593 0.84615385 0.82692308 0.90384615 0.90384615 0.94153846
0.92076923 0.94076923 0.88307692 0.84384615]
mean value: 0.8856695156695157
key: train_roc_auc
value: [0.99781659 0.99787234 1. 0.99787234 1. 0.99344978
1. 0.9847719 1. 0.99568893]
mean value: 0.9967471894453219
key: test_jcc
value: [0.72413793 0.72413793 0.7 0.81481481 0.83333333 0.88888889
0.84615385 0.88461538 0.79310345 0.73333333]
mean value: 0.7942518911484429
key: train_jcc
value: [0.99563319 0.99563319 1. 0.99563319 1. 0.98689956
1. 0.96956522 1. 0.99130435]
mean value: 0.9934668691854945
MCC on Blind test: 0.75
Accuracy on Blind test: 0.86
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.02841353 0.02183843 0.02099395 0.01935482 0.01939464 0.0214653
0.02150893 0.02194858 0.02061439 0.02167058]
mean value: 0.021720314025878908
key: score_time
value: [0.01223016 0.00936103 0.00875926 0.00877237 0.00869417 0.0087049
0.00894022 0.00874114 0.00898623 0.00875068]
mean value: 0.009194016456604004
key: test_mcc
value: [0.80829038 0.81312325 0.92307692 0.88527041 0.89056356 0.96153846
0.80431528 0.80461538 0.76461538 0.96148034]
mean value: 0.8616889371976575
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.90384615 0.90384615 0.96153846 0.94230769 0.94230769 0.98039216
0.90196078 0.90196078 0.88235294 0.98039216]
mean value: 0.9300904977375566
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.90196078 0.89795918 0.96153846 0.94339623 0.94545455 0.98039216
0.89795918 0.90196078 0.88 0.97959184]
mean value: 0.929021316297993
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.88461538 0.95652174 0.96153846 0.92592593 0.89655172 0.96153846
0.91666667 0.88461538 0.88 1. ]
mean value: 0.9267973748168651
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.92 0.84615385 0.96153846 0.96153846 1. 1.
0.88 0.92 0.88 0.96 ]
mean value: 0.932923076923077
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.90444444 0.90384615 0.96153846 0.94230769 0.94230769 0.98076923
0.90153846 0.90230769 0.88230769 0.98 ]
mean value: 0.9301367521367522
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.82142857 0.81481481 0.92592593 0.89285714 0.89655172 0.96153846
0.81481481 0.82142857 0.78571429 0.96 ]
mean value: 0.8695074312660519
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.87
Accuracy on Blind test: 0.93
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.11908174 0.11891198 0.12085533 0.12205338 0.12058616 0.12348747
0.12098384 0.12022972 0.11956501 0.12257099]
mean value: 0.12083256244659424
key: score_time
value: [0.01758242 0.01875472 0.01774883 0.01888084 0.01758361 0.0192914
0.01767397 0.0176754 0.01803231 0.0176661 ]
mean value: 0.01808896064758301
key: test_mcc
value: [0.7364532 0.69230769 0.88527041 0.77849894 0.88527041 0.84544958
0.80461538 0.88289781 0.72573276 0.65224812]
mean value: 0.7888744323177563
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.86538462 0.84615385 0.94230769 0.88461538 0.94230769 0.92156863
0.90196078 0.94117647 0.8627451 0.82352941]
mean value: 0.8931749622926093
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.86792453 0.84615385 0.94117647 0.875 0.94339623 0.91666667
0.90196078 0.93877551 0.85714286 0.83018868]
mean value: 0.8918385569031677
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.82142857 0.84615385 0.96 0.95454545 0.92592593 0.95652174
0.88461538 0.95833333 0.875 0.78571429]
mean value: 0.8968238540847236
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.92 0.84615385 0.92307692 0.80769231 0.96153846 0.88
0.92 0.92 0.84 0.88 ]
mean value: 0.8898461538461538
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.86740741 0.84615385 0.94230769 0.88461538 0.94230769 0.92076923
0.90230769 0.94076923 0.86230769 0.82461538]
mean value: 0.8933561253561253
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.76666667 0.73333333 0.88888889 0.77777778 0.89285714 0.84615385
0.82142857 0.88461538 0.75 0.70967742]
mean value: 0.807139903107645
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01038766 0.01013541 0.01010156 0.01030636 0.01012969 0.01005673
0.0101459 0.0112083 0.01133013 0.01134515]
mean value: 0.01051468849182129
key: score_time
value: [0.00891232 0.00864291 0.00884056 0.00908256 0.008775 0.00878358
0.0087533 0.00932074 0.00876355 0.00923038]
mean value: 0.008910489082336426
key: test_mcc
value: [0.54074074 0.27104108 0.58080232 0.40422604 0.66628253 0.33282012
0.5685677 0.64769231 0.49076923 0.61017022]
mean value: 0.5113112288991031
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.76923077 0.63461538 0.78846154 0.69230769 0.82692308 0.66666667
0.78431373 0.82352941 0.74509804 0.80392157]
mean value: 0.7535067873303167
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.76923077 0.65454545 0.7755102 0.63636364 0.84210526 0.65306122
0.7755102 0.82352941 0.74509804 0.80769231]
mean value: 0.7482646514623515
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.74074074 0.62068966 0.82608696 0.77777778 0.77419355 0.66666667
0.79166667 0.80769231 0.73076923 0.77777778]
mean value: 0.7514061328172418
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.8 0.69230769 0.73076923 0.53846154 0.92307692 0.64
0.76 0.84 0.76 0.84 ]
mean value: 0.7524615384615385
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.77037037 0.63461538 0.78846154 0.69230769 0.82692308 0.66615385
0.78384615 0.82384615 0.74538462 0.80461538]
mean value: 0.7536524216524216
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.625 0.48648649 0.63333333 0.46666667 0.72727273 0.48484848
0.63333333 0.7 0.59375 0.67741935]
mean value: 0.6028110386779741
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.57
Accuracy on Blind test: 0.79
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.78404307 1.84690738 1.85415792 1.84416199 1.86087155 1.84431767
1.88158655 1.84635663 1.76327705 1.80362248]
mean value: 1.8329302310943603
key: score_time
value: [0.09506845 0.09978175 0.10103059 0.09958267 0.0994401 0.10090494
0.10089231 0.09912086 0.09177494 0.10069799]
mean value: 0.09882946014404297
key: test_mcc
value: [0.84888889 0.84866842 0.96225045 0.92307692 0.9258201 1.
0.92427578 1. 0.88307692 0.88307692]
mean value: 0.9199134409597195
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.92307692 0.92307692 0.98076923 0.96153846 0.96153846 1.
0.96078431 1. 0.94117647 0.94117647]
mean value: 0.9593137254901961
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.92307692 0.92 0.98039216 0.96153846 0.96296296 1.
0.95833333 1. 0.94117647 0.94117647]
mean value: 0.9588656778950897
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.88888889 0.95833333 1. 0.96153846 0.92857143 1.
1. 1. 0.92307692 0.92307692]
mean value: 0.9583485958485959
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96 0.88461538 0.96153846 0.96153846 1. 1.
0.92 1. 0.96 0.96 ]
mean value: 0.9607692307692308
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.92444444 0.92307692 0.98076923 0.96153846 0.96153846 1.
0.96 1. 0.94153846 0.94153846]
mean value: 0.9594444444444444
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.85714286 0.85185185 0.96153846 0.92592593 0.92857143 1.
0.92 1. 0.88888889 0.88888889]
mean value: 0.9222808302808303
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.81
Accuracy on Blind test: 0.9
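The per-fold test scores above are internally consistent, and one fold can be checked by hand. A minimal sketch, assuming fold 1 of the Random Forest run holds 52 test rows: the logged recall (0.96 = 24/25) and precision (0.88888889 = 24/27) pin down a confusion matrix of TP=24, FN=1, FP=3, TN=24. These counts are a reconstruction from the numbers, not something the script prints. Recomputing each metric from them reproduces the logged fold-1 values.

```python
import math

# Fold 1 of the "Random Forest" run. These counts are reconstructed from the
# logged recall and precision; the script itself does not print them.
TP, FN, FP, TN = 24, 1, 3, 24

recall = TP / (TP + FN)            # sensitivity, 24/25
specificity = TN / (TN + FP)       # true-negative rate, 24/27

computed = {
    "mcc": (TP * TN - FP * FN)
    / math.sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)),
    "accuracy": (TP + TN) / (TP + FN + FP + TN),
    "fscore": 2 * TP / (2 * TP + FP + FN),
    "precision": TP / (TP + FP),
    "recall": recall,
    # For hard 0/1 predictions the ROC curve has a single interior point,
    # so its area reduces to balanced accuracy.
    "roc_auc": (recall + specificity) / 2,
    "jcc": TP / (TP + FP + FN),    # Jaccard index of the positive class
}

# Fold-1 values copied from the test_* lines of the Random Forest block.
logged = {
    "mcc": 0.84888889,
    "accuracy": 0.92307692,
    "fscore": 0.92307692,
    "precision": 0.88888889,
    "recall": 0.96,
    "roc_auc": 0.92444444,
    "jcc": 0.85714286,
}

for name, value in computed.items():
    print(f"test_{name}: computed {value:.8f}, logged {logged[name]:.8f}")
```

The logged test_roc_auc matching balanced accuracy is consistent with the AUC having been scored on predicted class labels rather than on predicted probabilities. That is an inference from the numbers, not something this log states.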
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...05', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [1.84273124 0.98533678 1.06106544 0.9535141 1.02706265 0.97413754
0.97359943 0.98847985 1.03109789 0.98040366]
mean value: 1.0817428588867188
key: score_time
value: [0.2416172 0.22115898 0.2508235 0.22502398 0.17808771 0.23984313
0.2546699 0.28216171 0.22378087 0.21789837]
mean value: 0.23350653648376465
key: test_mcc
value: [0.813662 0.76923077 0.96225045 0.92307692 0.88527041 1.
0.92427578 1. 0.88307692 0.88307692]
mean value: 0.9043920181288048
key: train_mcc
value: [0.95683011 0.95247872 0.95679358 0.96112065 0.95682367 0.94826721
0.95258977 0.9482967 0.95258977 0.95692011]
mean value: 0.9542710305666754
key: test_accuracy
value: [0.90384615 0.88461538 0.98076923 0.96153846 0.94230769 1.
0.96078431 1. 0.94117647 0.94117647]
mean value: 0.9516214177978883
key: train_accuracy
value: [0.97840173 0.9762419 0.97840173 0.98056156 0.97840173 0.97413793
0.9762931 0.97413793 0.9762931 0.97844828]
mean value: 0.9771318984136441
key: test_fscore
value: [0.90566038 0.88461538 0.98039216 0.96153846 0.94339623 1.
0.95833333 1. 0.94117647 0.94117647]
mean value: 0.951628888129998
key: train_fscore
value: [0.97807018 0.97582418 0.97807018 0.98021978 0.97797357 0.97379913
0.97603486 0.97368421 0.97603486 0.97807018]
mean value: 0.9767781104581152
key: test_precision
value: [0.85714286 0.88461538 1. 0.96153846 0.92592593 1.
1. 1. 0.92307692 0.92307692]
mean value: 0.9475376475376476
key: train_precision
value: [0.98237885 0.97797357 0.97807018 0.98237885 0.98230088 0.97379913
0.97391304 0.97797357 0.97391304 0.98237885]
mean value: 0.9785079974428954
key: test_recall
value: [0.96 0.88461538 0.96153846 0.96153846 0.96153846 1.
0.92 1. 0.96 0.96 ]
mean value: 0.9569230769230769
key: train_recall
value: [0.97379913 0.97368421 0.97807018 0.97807018 0.97368421 0.97379913
0.97816594 0.96943231 0.97816594 0.97379913]
mean value: 0.9750670343982226
key: test_roc_auc
value: [0.90592593 0.88461538 0.98076923 0.96153846 0.94230769 1.
0.96 1. 0.94153846 0.94153846]
mean value: 0.9518233618233618
key: train_roc_auc
value: [0.97835255 0.97620381 0.97839679 0.98052445 0.97833147 0.97413361
0.97631701 0.97407786 0.97631701 0.97838893]
mean value: 0.9771043482593041
key: test_jcc
value: [0.82758621 0.79310345 0.96153846 0.92592593 0.89285714 1.
0.92 1. 0.88888889 0.88888889]
mean value: 0.9098788963271722
key: train_jcc
value: [0.95708155 0.9527897 0.95708155 0.9612069 0.95689655 0.94893617
0.95319149 0.94871795 0.95319149 0.95708155]
mean value: 0.954617488069393
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02028584 0.01054978 0.01048946 0.01108623 0.01054072 0.01161528
0.0117743 0.01121378 0.01106954 0.01049757]
mean value: 0.011912250518798828
key: score_time
value: [0.01542568 0.00919604 0.00922108 0.00998044 0.00957513 0.00999999
0.00992775 0.00970435 0.00996804 0.00929213]
mean value: 0.010229063034057618
key: test_mcc
value: [0.57831366 0.4233902 0.69230769 0.71151247 0.73568294 0.72573276
0.80461538 0.88307692 0.60769231 0.64715023]
mean value: 0.6809474562124411
key: train_mcc
value: [0.69835966 0.77538376 0.72791401 0.72780737 0.77538491 0.75470857
0.75496039 0.72841838 0.75426257 0.74133606]
mean value: 0.7438535657906272
key: test_accuracy
value: [0.78846154 0.71153846 0.84615385 0.84615385 0.86538462 0.8627451
0.90196078 0.94117647 0.80392157 0.82352941]
mean value: 0.8391025641025641
key: train_accuracy
value: [0.8488121 0.88768898 0.86393089 0.86393089 0.88768898 0.87715517
0.87715517 0.86422414 0.87715517 0.87068966]
mean value: 0.8718431146197959
key: test_fscore
value: [0.76595745 0.70588235 0.84615385 0.82608696 0.87272727 0.85714286
0.90196078 0.94117647 0.8 0.81632653]
mean value: 0.8333414517809609
key: train_fscore
value: [0.84304933 0.88495575 0.8627451 0.86092715 0.88646288 0.87741935
0.87794433 0.86153846 0.87527352 0.86899563]
mean value: 0.8699311510042489
key: test_precision
value: [0.81818182 0.72 0.84615385 0.95 0.82758621 0.875
0.88461538 0.92307692 0.8 0.83333333]
mean value: 0.8477947512257857
key: train_precision
value: [0.86635945 0.89285714 0.85714286 0.86666667 0.8826087 0.86440678
0.86134454 0.86725664 0.87719298 0.86899563]
mean value: 0.8704831379611647
key: test_recall
value: [0.72 0.69230769 0.84615385 0.73076923 0.92307692 0.84
0.92 0.96 0.8 0.8 ]
mean value: 0.8232307692307692
key: train_recall
value: [0.8209607 0.87719298 0.86842105 0.85526316 0.89035088 0.89082969
0.89519651 0.8558952 0.87336245 0.86899563]
mean value: 0.8696468244847928
key: test_roc_auc
value: [0.78592593 0.71153846 0.84615385 0.84615385 0.86538462 0.86230769
0.90230769 0.94153846 0.80384615 0.82307692]
mean value: 0.8388233618233618
key: train_roc_auc
value: [0.84851454 0.88753266 0.86399776 0.86380179 0.88772863 0.87732974
0.87738549 0.86411781 0.87710675 0.87066803]
mean value: 0.8718183204075174
key: test_jcc
value: [0.62068966 0.54545455 0.73333333 0.7037037 0.77419355 0.75
0.82142857 0.88888889 0.66666667 0.68965517]
mean value: 0.7194014085449013
key: train_jcc
value: [0.72868217 0.79365079 0.75862069 0.75581395 0.79607843 0.7816092
0.78244275 0.75675676 0.77821012 0.76833977]
mean value: 0.7700204624031467
MCC on Blind test: 0.76
Accuracy on Blind test: 0.88
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.15760708 0.06621742 0.07752442 0.07321143 0.0758338 0.08350158
0.07533073 0.07601404 0.06880569 0.07532454]
mean value: 0.08293707370758056
key: score_time
value: [0.0113287 0.01082397 0.01109338 0.01086092 0.01102328 0.01106405
0.0109849 0.01106334 0.01134181 0.01326489]
mean value: 0.011284923553466797
key: test_mcc
value: [0.89087081 0.84866842 0.96225045 0.92307692 0.9258201 1.
0.92153846 1. 0.88307692 0.92427578]
mean value: 0.9279577865544593
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94230769 0.92307692 0.98076923 0.96153846 0.96153846 1.
0.96078431 1. 0.94117647 0.96078431]
mean value: 0.9631975867269985
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94339623 0.92 0.98039216 0.96153846 0.96296296 1.
0.96 1. 0.94117647 0.95833333]
mean value: 0.9627799611700832
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.89285714 0.95833333 1. 0.96153846 0.92857143 1.
0.96 1. 0.92307692 1. ]
mean value: 0.9624377289377289
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.88461538 0.96153846 0.96153846 1. 1.
0.96 1. 0.96 0.92 ]
mean value: 0.9647692307692308
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.94444444 0.92307692 0.98076923 0.96153846 0.96153846 1.
0.96076923 1. 0.94153846 0.96 ]
mean value: 0.9633675213675214
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.89285714 0.85185185 0.96153846 0.92592593 0.92857143 1.
0.92307692 1. 0.88888889 0.92 ]
mean value: 0.9292710622710623
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.86
Accuracy on Blind test: 0.93
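Each "mean value" line is an unweighted average of the ten per-fold scores. A short sketch, assuming the accuracy fractions in the XGBoost block correspond to five test folds of 52 rows and five of 51 (515 rows in all; the fold sizes are inferred from the fractions, not printed by the script), shows that this average differs slightly from accuracy pooled over every row.

```python
import statistics

# "XGBoost" test_accuracy per fold, rewritten as correct/total. The fold
# sizes are inferred from the logged fractions, not printed by the script.
correct = [49, 48, 51, 50, 50, 51, 49, 51, 48, 49]
totals = [52] * 5 + [51] * 5

fold_acc = [c / n for c, n in zip(correct, totals)]
mean_of_folds = statistics.mean(fold_acc)   # what "mean value" reports
pooled = sum(correct) / sum(totals)         # accuracy over all rows at once

print(f"mean of folds: {mean_of_folds:.10f}")
print(f"pooled:        {pooled:.10f}")
```

Because every fold counts equally in the unweighted mean, the five 51-row folds, which score higher in this run, carry slightly more weight than they would in a pooled count; the gap here is under 1e-4.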
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.05204272 0.08309412 0.08293486 0.09231877 0.06768751 0.06185079
0.0759449 0.04235697 0.07767081 0.07520199]
mean value: 0.07111034393310547
key: score_time
value: [0.01879072 0.03160286 0.01222348 0.02677679 0.01260066 0.01890087
0.01229119 0.01862192 0.02106357 0.01877522]
mean value: 0.019164729118347167
key: test_mcc
value: [0.77185185 0.61538462 0.84866842 0.77151675 0.84866842 0.88289781
0.64769231 0.80904133 0.73107432 0.6610182 ]
mean value: 0.7587814027095846
key: train_mcc
value: [0.91800556 0.90108249 0.91392793 0.91358716 0.91358716 0.90520077
0.90129433 0.9009374 0.91411317 0.91393374]
mean value: 0.9095669708390538
key: test_accuracy
value: [0.88461538 0.80769231 0.92307692 0.88461538 0.92307692 0.94117647
0.82352941 0.90196078 0.8627451 0.82352941]
mean value: 0.8776018099547511
key: train_accuracy
value: [0.95896328 0.95032397 0.95680346 0.95680346 0.95680346 0.95258621
0.95043103 0.95043103 0.95689655 0.95689655]
mean value: 0.954693900350041
key: test_fscore
value: [0.88461538 0.80769231 0.92 0.88 0.92592593 0.93877551
0.82352941 0.89361702 0.86792453 0.83636364]
mean value: 0.8778443726144525
key: train_fscore
value: [0.95878525 0.95032397 0.95670996 0.95614035 0.95614035 0.95217391
0.95053763 0.95010846 0.95689655 0.95670996]
mean value: 0.954452639776014
key: test_precision
value: [0.85185185 0.80769231 0.95833333 0.91666667 0.89285714 0.95833333
0.80769231 0.95454545 0.82142857 0.76666667]
mean value: 0.8736067636067636
key: train_precision
value: [0.95258621 0.93617021 0.94444444 0.95614035 0.95614035 0.94805195
0.93644068 0.94396552 0.94468085 0.94849785]
mean value: 0.9467118414261851
key: test_recall
value: [0.92 0.80769231 0.88461538 0.84615385 0.96153846 0.92
0.84 0.84 0.92 0.92 ]
mean value: 0.886
key: train_recall
value: [0.9650655 0.96491228 0.96929825 0.95614035 0.95614035 0.95633188
0.9650655 0.95633188 0.96943231 0.9650655 ]
mean value: 0.962378380448939
key: test_roc_auc
value: [0.88592593 0.80769231 0.92307692 0.88461538 0.92307692 0.94076923
0.82384615 0.90076923 0.86384615 0.82538462]
mean value: 0.8779002849002849
key: train_roc_auc
value: [0.95902848 0.95054125 0.95698955 0.95679358 0.95679358 0.95263402
0.95061786 0.95050636 0.95705658 0.95700084]
mean value: 0.9547962096825529
key: test_jcc
value: [0.79310345 0.67741935 0.85185185 0.78571429 0.86206897 0.88461538
0.7 0.80769231 0.76666667 0.71875 ]
mean value: 0.784788226517231
key: train_jcc
value: [0.92083333 0.90534979 0.91701245 0.91596639 0.91596639 0.90871369
0.9057377 0.90495868 0.91735537 0.91701245]
mean value: 0.9128906244397688
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
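The pipeline printed above pairs a MinMaxScaler on the 169 numeric columns with a OneHotEncoder on the seven categorical columns, and passes any other columns through unchanged. The sketch below shows one way to build an equivalent pipeline. The dtype-based column split, the toy data and clip=True are assumptions and do not come from ml_data_sl.py. The clip=True option matters for MultinomialNB, which rejects negative inputs. Without it, a scaler fitted on the training folds can map smaller held-out values below 0.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder


def build_pipeline(X: pd.DataFrame, model) -> Pipeline:
    # Assumption: numeric vs categorical columns are split by dtype
    num_cols = list(X.select_dtypes(include="number").columns)
    cat_cols = list(X.select_dtypes(exclude="number").columns)
    prep = ColumnTransformer(
        transformers=[
            # clip=True keeps transformed held-out values inside [0, 1]
            ("num", MinMaxScaler(clip=True), num_cols),
            ("cat", OneHotEncoder(), cat_cols),
        ],
        remainder="passthrough",
    )
    return Pipeline(steps=[("prep", prep), ("model", model)])


# Hypothetical toy data reusing three column names from the log
rng = np.random.default_rng(42)
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, 60),
    "ddg_foldx": rng.normal(0, 2, 60),
    "ss_class": rng.choice(["H", "E", "L"], 60),
})
y = (X["ligand_distance"] < 20).astype(int)

pipe = build_pipeline(X, MultinomialNB()).fit(X, y)
preds = pipe.predict(X)
```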
key: fit_time
value: [0.01436949 0.01111484 0.01016617 0.01088428 0.00985456 0.00989771
0.01070356 0.01117134 0.01106 0.01109123]
mean value: 0.011031317710876464
key: score_time
value: [0.01223636 0.00920248 0.00911593 0.00875282 0.00875449 0.00877905
0.00953007 0.00965476 0.00958037 0.00875211]
mean value: 0.00943584442138672
key: test_mcc
value: [0.65330526 0.54006172 0.84866842 0.70064905 0.80829038 0.72573276
0.72573276 0.72615385 0.68875274 0.608971 ]
mean value: 0.7026317935253169
key: train_mcc
value: [0.67238923 0.72423761 0.71496629 0.70646532 0.72361387 0.74138866
0.69411122 0.70713779 0.70314599 0.71561406]
mean value: 0.7103070036060428
key: test_accuracy
value: [0.82692308 0.76923077 0.92307692 0.84615385 0.90384615 0.8627451
0.8627451 0.8627451 0.84313725 0.80392157]
mean value: 0.8504524886877828
key: train_accuracy
value: [0.83585313 0.86177106 0.8574514 0.85313175 0.86177106 0.87068966
0.84698276 0.85344828 0.8512931 0.85775862]
mean value: 0.8550150815520965
key: test_fscore
value: [0.81632653 0.76 0.92 0.83333333 0.90196078 0.85714286
0.85714286 0.8627451 0.84615385 0.79166667]
mean value: 0.8446471973404747
key: train_fscore
value: [0.82959641 0.85585586 0.85333333 0.84821429 0.85777778 0.86784141
0.84257206 0.84888889 0.84563758 0.8539823 ]
mean value: 0.8503699910679656
key: test_precision
value: [0.83333333 0.79166667 0.95833333 0.90909091 0.92 0.875
0.875 0.84615385 0.81481481 0.82608696]
mean value: 0.8649479859914643
key: train_precision
value: [0.85253456 0.87962963 0.86486486 0.86363636 0.86936937 0.87555556
0.85585586 0.86425339 0.86697248 0.86547085]
mean value: 0.8658142923870936
key: test_recall
value: [0.8 0.73076923 0.88461538 0.76923077 0.88461538 0.84
0.84 0.88 0.88 0.76 ]
mean value: 0.8269230769230769
key: train_recall
value: [0.80786026 0.83333333 0.84210526 0.83333333 0.84649123 0.86026201
0.82969432 0.83406114 0.82532751 0.84279476]
mean value: 0.8355263157894737
key: test_roc_auc
value: [0.82592593 0.76923077 0.92307692 0.84615385 0.90384615 0.86230769
0.86230769 0.86307692 0.84384615 0.80307692]
mean value: 0.8502849002849002
key: train_roc_auc
value: [0.83555406 0.86134752 0.85722284 0.85283688 0.86154349 0.87055654
0.84676206 0.85320078 0.85096163 0.85756759]
mean value: 0.8547553382911727
key: test_jcc
value: [0.68965517 0.61290323 0.85185185 0.71428571 0.82142857 0.75
0.75 0.75862069 0.73333333 0.65517241]
mean value: 0.7337250972567991
key: train_jcc
value: [0.70881226 0.7480315 0.74418605 0.73643411 0.75097276 0.76653696
0.72796935 0.73745174 0.73255814 0.74517375]
mean value: 0.7398126610083979
MCC on Blind test: 0.76
Accuracy on Blind test: 0.88
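Each key / value / mean value group above matches the dictionary returned by scikit-learn's cross_validate with return_train_score=True. That dictionary lists fit_time and score_time first, followed by a test_ and train_ entry for each named scorer, in the order the scorers were defined. The sketch below reproduces that layout on synthetic data. The scorer names match the log, but the scorer definitions, the 10-fold StratifiedKFold split and the LogisticRegression model are assumptions and are not code from the script.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_validate

# Scorer names chosen to match the keys in the log (definitions are assumed)
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": make_scorer(roc_auc_score),  # scores predict() labels, not probabilities
    "jcc": make_scorer(jaccard_score),
}

X, y = make_classification(n_samples=515, n_features=20, random_state=42)
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
cv_out = cross_validate(LogisticRegression(max_iter=1000), X, y,
                        cv=skf, scoring=scoring, return_train_score=True)

# Print in the same key / value / mean value layout as the log
for key, value in cv_out.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```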
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01407576 0.01649785 0.02283406 0.02057886 0.01880026 0.02140307
0.02072906 0.02038002 0.02303767 0.01901793]
mean value: 0.01973545551300049
key: score_time
value: [0.01000118 0.01126051 0.01198721 0.01775503 0.01397157 0.01187444
0.01192546 0.01181602 0.01192856 0.0117836 ]
mean value: 0.012430357933044433
key: test_mcc
value: [0.67524617 0.65433031 0.81312325 0.88527041 0.73131034 0.76662339
0.76733527 0.73878883 0.74071542 0.65224812]
mean value: 0.7424991517164395
key: train_mcc
value: [0.76102063 0.87912177 0.90936066 0.85585682 0.85829967 0.89258812
0.86828293 0.81782174 0.8647866 0.8793363 ]
mean value: 0.8586475244728068
key: test_accuracy
value: [0.82692308 0.82692308 0.90384615 0.94230769 0.86538462 0.88235294
0.88235294 0.8627451 0.8627451 0.82352941]
mean value: 0.8679110105580694
key: train_accuracy
value: [0.86825054 0.93952484 0.95464363 0.92656587 0.92656587 0.94612069
0.93318966 0.90517241 0.93103448 0.93965517]
mean value: 0.9270723169732629
key: test_fscore
value: [0.79069767 0.82352941 0.89795918 0.94339623 0.8627451 0.875
0.88461538 0.84444444 0.87272727 0.83018868]
mean value: 0.8625303375343475
key: train_fscore
value: [0.84711779 0.9380531 0.95424837 0.92827004 0.92093023 0.94456763
0.93446089 0.89671362 0.93277311 0.93913043]
mean value: 0.923626520709015
key: test_precision
value: [0.94444444 0.84 0.95652174 0.92592593 0.88 0.91304348
0.85185185 0.95 0.8 0.78571429]
mean value: 0.8847501725327812
key: train_precision
value: [0.99411765 0.94642857 0.94805195 0.89430894 0.98019802 0.95945946
0.9057377 0.96954315 0.89878543 0.93506494]
mean value: 0.9431695801182518
key: test_recall
value: [0.68 0.80769231 0.84615385 0.96153846 0.84615385 0.84
0.92 0.76 0.96 0.88 ]
mean value: 0.8501538461538461
key: train_recall
value: [0.73799127 0.92982456 0.96052632 0.96491228 0.86842105 0.930131
0.9650655 0.83406114 0.96943231 0.94323144]
mean value: 0.9103596874281774
key: test_roc_auc
value: [0.82148148 0.82692308 0.90384615 0.94230769 0.86538462 0.88153846
0.88307692 0.86076923 0.86461538 0.82461538]
mean value: 0.8674558404558405
key: train_roc_auc
value: [0.86685888 0.93938037 0.95473124 0.92713699 0.92569989 0.94591657
0.93359658 0.90426461 0.93152467 0.93970083]
mean value: 0.9268810621174348
key: test_jcc
value: [0.65384615 0.7 0.81481481 0.89285714 0.75862069 0.77777778
0.79310345 0.73076923 0.77419355 0.70967742]
mean value: 0.760566022573809
key: train_jcc
value: [0.73478261 0.88333333 0.9125 0.86614173 0.85344828 0.89495798
0.87698413 0.81276596 0.87401575 0.8852459 ]
mean value: 0.8594175667469572
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
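The two Blind test lines report one held-out evaluation, not an average over folds. The sketch below assumes a stratified hold-out split and rounding to two decimals. The split size, data and model are hypothetical. Rounding with Python's round() drops trailing zeros, which is consistent with a value of 0.90 appearing as 0.9 in the Stochastic GDescent block.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split

# Hypothetical data and split; the script's real blind-test set is not shown in the log
X, y = make_classification(n_samples=600, n_features=20, random_state=42)
X_train, X_blind, y_train, y_blind = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=42
)

# Fit once on all training rows, then score the untouched blind set
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_blind)

bts_mcc = round(float(matthews_corrcoef(y_blind, y_pred)), 2)
bts_acc = round(float(accuracy_score(y_blind, y_pred)), 2)
print("MCC on Blind test:", bts_mcc)
print("Accuracy on Blind test:", bts_acc)
```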
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01722074 0.02114892 0.02169371 0.01978874 0.02115488 0.02141595
0.01782823 0.02157593 0.02269053 0.02523351]
mean value: 0.020975112915039062
key: score_time
value: [0.01061797 0.01195741 0.01185656 0.01192594 0.01196694 0.01201153
0.01211405 0.01274276 0.01431131 0.01298666]
mean value: 0.012249112129211426
key: test_mcc
value: [0.81203628 0.66628253 0.72760688 0.88527041 0.72760688 0.84544958
0.84307692 0.84307692 0.88289781 0.77487835]
mean value: 0.8008182566592692
key: train_mcc
value: [0.84181709 0.87136001 0.8004481 0.90499207 0.71142522 0.89664473
0.87722401 0.89655622 0.90180046 0.87805565]
mean value: 0.8580323560247036
key: test_accuracy
value: [0.90384615 0.82692308 0.84615385 0.94230769 0.84615385 0.92156863
0.92156863 0.92156863 0.94117647 0.88235294]
mean value: 0.8953619909502262
key: train_accuracy
value: [0.91792657 0.93304536 0.89416847 0.9524838 0.83801296 0.94827586
0.9375 0.94827586 0.95043103 0.9375 ]
mean value: 0.9257619907648768
key: test_fscore
value: [0.89361702 0.84210526 0.81818182 0.94117647 0.86666667 0.91666667
0.92 0.92 0.93877551 0.88888889]
mean value: 0.8946078305630848
key: train_fscore
value: [0.91162791 0.93555094 0.88192771 0.95196507 0.85768501 0.94713656
0.93424036 0.94736842 0.94854586 0.93920335]
mean value: 0.9255251191697211
key: test_precision
value: [0.95454545 0.77419355 1. 0.96 0.76470588 0.95652174
0.92 0.92 0.95833333 0.82758621]
mean value: 0.9035886164645812
key: train_precision
value: [0.97512438 0.88932806 0.97860963 0.94782609 0.75585284 0.95555556
0.97169811 0.95154185 0.97247706 0.90322581]
mean value: 0.9301239386440059
key: test_recall
value: [0.84 0.92307692 0.69230769 0.92307692 1. 0.88
0.92 0.92 0.92 0.96 ]
mean value: 0.8978461538461538
key: train_recall
value: [0.8558952 0.98684211 0.80263158 0.95614035 0.99122807 0.93886463
0.89956332 0.94323144 0.92576419 0.97816594]
mean value: 0.9278326821420363
key: test_roc_auc
value: [0.90148148 0.82692308 0.84615385 0.94230769 0.84615385 0.92076923
0.92153846 0.92153846 0.94076923 0.88384615]
mean value: 0.8951481481481481
key: train_roc_auc
value: [0.91726384 0.93384658 0.89280515 0.95253826 0.84029489 0.94815572
0.9370157 0.94821147 0.95011614 0.93801914]
mean value: 0.9258266884068974
key: test_jcc
value: [0.80769231 0.72727273 0.69230769 0.88888889 0.76470588 0.84615385
0.85185185 0.85185185 0.88461538 0.8 ]
mean value: 0.8115340432987492
key: train_jcc
value: [0.83760684 0.87890625 0.7887931 0.90833333 0.75083056 0.89958159
0.87659574 0.9 0.90212766 0.88537549]
mean value: 0.8628150577457124
MCC on Blind test: 0.81
Accuracy on Blind test: 0.9
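In every fold, test_jcc can be computed from test_fscore alone. For binary labels the Jaccard index and F1 use the same TP, FP and FN counts: F1 = 2TP/(2TP + FP + FN) and J = TP/(TP + FP + FN), so J = F1/(2 - F1). The check below applies this identity to the Stochastic GDescent fold values copied from this log.

```python
import numpy as np

# Per-fold test scores copied from the Stochastic GDescent block of this log
test_fscore = np.array([0.89361702, 0.84210526, 0.81818182, 0.94117647, 0.86666667,
                        0.91666667, 0.92, 0.92, 0.93877551, 0.88888889])
test_jcc = np.array([0.80769231, 0.72727273, 0.69230769, 0.88888889, 0.76470588,
                     0.84615385, 0.85185185, 0.85185185, 0.88461538, 0.8])

# Jaccard recovered from F1 via J = F1 / (2 - F1)
jcc_from_f1 = test_fscore / (2 - test_fscore)
print(np.round(jcc_from_f1, 8))
```

J is a strictly increasing function of F1, so ranking models by test_jcc gives the same order as ranking them by test_fscore.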
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.1831696 0.17794561 0.17552376 0.17649794 0.17604017 0.18365979
0.17523932 0.17550063 0.17589378 0.1765306 ]
mean value: 0.17760012149810792
key: score_time
value: [0.01525378 0.01530957 0.01576948 0.0153296 0.01532698 0.01565194
0.01529312 0.01551795 0.01531744 0.01527667]
mean value: 0.015404653549194337
key: test_mcc
value: [0.92592593 0.92307692 0.9258201 0.92307692 0.9258201 1.
0.88289781 0.96153846 0.84307692 0.92427578]
mean value: 0.9235508946829519
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96153846 0.96153846 0.96153846 0.96153846 0.96153846 1.
0.94117647 0.98039216 0.92156863 0.96078431]
mean value: 0.9611613876319759
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96153846 0.96153846 0.96 0.96153846 0.96296296 1.
0.93877551 0.98039216 0.92 0.95833333]
mean value: 0.9605079347978508
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.92592593 0.96153846 1. 0.96153846 0.92857143 1.
0.95833333 0.96153846 0.92 1. ]
mean value: 0.9617446072446073
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.96153846 0.92307692 0.96153846 1. 1.
0.92 1. 0.92 0.92 ]
mean value: 0.9606153846153846
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96296296 0.96153846 0.96153846 0.96153846 0.96153846 1.
0.94076923 0.98076923 0.92153846 0.96 ]
mean value: 0.9612193732193732
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.92592593 0.92592593 0.92307692 0.92592593 0.92857143 1.
0.88461538 0.96153846 0.85185185 0.92 ]
mean value: 0.9247431827431828
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.95
Accuracy on Blind test: 0.98
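The test_roc_auc values match balanced accuracy rather than a probability-based AUC. Take AdaBoost fold 1 as an example. It has test recall 1.0 and test precision 0.92592593 on 52 rows, which means all 25 positives were recovered and 2 of 27 negatives were mislabelled. Its test_roc_auc of 0.96296296 equals (1 + 25/27)/2. This is the value roc_auc_score returns when given hard 0/1 predictions: the ROC curve then has a single interior point, so the area reduces to (TPR + TNR)/2. The sketch below rebuilds that fold's counts. Whether the script scores predict() output this way is inferred from the numbers; the log does not state it.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, roc_auc_score

# Counts reconstructed from AdaBoost fold 1: 25 positives, 27 negatives,
# every positive predicted 1, and 2 negatives wrongly predicted 1
y_true = np.array([1] * 25 + [0] * 27)
y_pred = np.array([1] * 25 + [1] * 2 + [0] * 25)

auc_hard = roc_auc_score(y_true, y_pred)           # AUC computed from hard labels
bal_acc = balanced_accuracy_score(y_true, y_pred)  # (TPR + TNR) / 2
print(round(float(auc_hard), 8), round(float(bal_acc), 8))
```

A probability-based AUC would need predict_proba or decision_function output as the score instead of predicted labels.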
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
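The OOB warnings for the Bagging Classifier follow from its size. BaggingClassifier defaults to n_estimators=10, and a training row gets an out-of-bag prediction only if at least one of those bootstraps leaves it out. A bootstrap of size n includes a given row with probability 1 - (1 - 1/n)^n, which is about 0.632. With 10 estimators, about 0.632^10, or roughly 1% of rows, are never out-of-bag. The training folds here have about 463 rows, inferred from the train_accuracy fractions (for example 387/463 = 0.83585313). That gives roughly 4 to 5 rows with no OOB vote. For those rows the OOB division is 0/0, which produces the "invalid value encountered in true_divide" RuntimeWarning. The arithmetic is sketched below. Raising n_estimators is the usual remedy; the exact value to use is a judgment call.

```python
def expected_rows_without_oob(n_samples: int, n_estimators: int) -> float:
    """Expected number of training rows that no bootstrap leaves out."""
    # Probability a given row is drawn at least once in one bootstrap of size n
    p_in_bag = 1 - (1 - 1 / n_samples) ** n_samples
    # Probability it is in-bag for every estimator, so it never gets an OOB vote
    p_never_oob = p_in_bag ** n_estimators
    return n_samples * p_never_oob


# 463 is the approximate fold training size inferred from this log
for n_estimators in (10, 50, 100):
    print(n_estimators, round(expected_rows_without_oob(463, n_estimators), 4))
```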
key: fit_time
value: [0.06544733 0.06211853 0.06923485 0.07710958 0.07985926 0.07961392
0.05945921 0.07828259 0.06221557 0.06299257]
mean value: 0.0696333408355713
key: score_time
value: [0.02117729 0.03006291 0.02708936 0.02777982 0.02915573 0.03034329
0.02469993 0.02470422 0.03304958 0.02316904]
mean value: 0.027123117446899415
key: test_mcc
value: [0.89087081 0.92307692 0.96225045 0.92307692 0.9258201 1.
0.80431528 0.96153846 0.88307692 0.92153846]
mean value: 0.9195564331630951
key: train_mcc
value: [0.98275766 0.98275637 0.99135872 0.99568837 0.99135872 0.98275574
0.9870767 0.99569843 0.98290567 0.9870767 ]
mean value: 0.9879433054726061
key: test_accuracy
value: [0.94230769 0.96153846 0.98076923 0.96153846 0.96153846 1.
0.90196078 0.98039216 0.94117647 0.96078431]
mean value: 0.9592006033182504
key: train_accuracy
value: [0.99136069 0.99136069 0.99568035 0.99784017 0.99568035 0.99137931
0.99353448 0.99784483 0.99137931 0.99353448]
mean value: 0.9939594660013406
key: test_fscore
value: [0.94339623 0.96153846 0.98039216 0.96153846 0.96296296 1.
0.89795918 0.98039216 0.94117647 0.96 ]
mean value: 0.9589356080442175
key: train_fscore
value: [0.99130435 0.99126638 0.99561404 0.9978022 0.99561404 0.99126638
0.99346405 0.99782135 0.99134199 0.99346405]
mean value: 0.9938958813575108
key: test_precision
value: [0.89285714 0.96153846 1. 0.96153846 0.92857143 1.
0.91666667 0.96153846 0.92307692 0.96 ]
mean value: 0.9505787545787546
key: train_precision
value: [0.98701299 0.98695652 0.99561404 1. 0.99561404 0.99126638
0.99130435 0.99565217 0.98283262 0.99130435]
mean value: 0.9917557442064376
key: test_recall
value: [1. 0.96153846 0.96153846 0.96153846 1. 1.
0.88 1. 0.96 0.96 ]
mean value: 0.9684615384615385
key: train_recall
value: [0.99563319 0.99561404 0.99561404 0.99561404 0.99561404 0.99126638
0.99563319 1. 1. 0.99563319]
mean value: 0.9960622079215506
key: test_roc_auc
value: [0.94444444 0.96153846 0.98076923 0.96153846 0.96153846 1.
0.90153846 0.98076923 0.94153846 0.96076923]
mean value: 0.9594444444444444
key: train_roc_auc
value: [0.99140634 0.99142404 0.99567936 0.99780702 0.99567936 0.99137787
0.99356127 0.99787234 0.99148936 0.99356127]
mean value: 0.9939858230006007
key: test_jcc
value: [0.89285714 0.92592593 0.96153846 0.92592593 0.92857143 1.
0.81481481 0.96153846 0.88888889 0.92307692]
mean value: 0.9223137973137974
key: train_jcc
value: [0.98275862 0.98268398 0.99126638 0.99561404 0.99126638 0.98268398
0.98701299 0.99565217 0.98283262 0.98701299]
mean value: 0.9878784138201812
MCC on Blind test: 0.9
Accuracy on Blind test: 0.95
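AdaBoost scores 1.0 on every training fold and Bagging stays near 0.99, but their mean test MCC values are about 0.92. Comparing mean train and test MCC for each model is a quick way to screen for overfitting. The snippet below uses the mean MCC values printed so far in this section. The 0.05 threshold is an arbitrary illustration and not a criterion used by the script.

```python
# Mean (train, test) MCC pairs copied from this log
mean_mcc = {
    "Multinomial": (0.7103070036060428, 0.7026317935253169),
    "Passive Aggresive": (0.8586475244728068, 0.7424991517164395),
    "Stochastic GDescent": (0.8580323560247036, 0.8008182566592692),
    "AdaBoost Classifier": (1.0, 0.9235508946829519),
    "Bagging Classifier": (0.9879433054726061, 0.9195564331630951),
}

GAP_THRESHOLD = 0.05  # hypothetical cut-off, chosen only for illustration

gaps = {name: train - test for name, (train, test) in mean_mcc.items()}
flagged = sorted(name for name, gap in gaps.items() if gap > GAP_THRESHOLD)
for name, gap in gaps.items():
    print(f"{name}: train-test MCC gap = {gap:.3f}")
print("Flagged:", flagged)
```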
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.09743285 0.13634515 0.15849376 0.17319727 0.15302992 0.16544819
0.15362334 0.14878941 0.14881253 0.15468121]
mean value: 0.14898536205291749
key: score_time
value: [0.0148344 0.01481652 0.02447915 0.02398372 0.02386975 0.02401042
0.02402806 0.02415848 0.02408814 0.024122 ]
mean value: 0.022239065170288085
key: test_mcc
value: [0.65330526 0.53846154 0.6172134 0.466924 0.62279916 0.68875274
0.68615385 0.84307692 0.68615385 0.61017022]
mean value: 0.6413010922501776
key: train_mcc
value: [0.98712064 0.99568837 0.98711849 0.98711849 0.98711849 0.98714723
0.99141377 0.98714723 0.98714723 0.98714723]
mean value: 0.9884167178261011
key: test_accuracy
value: [0.82692308 0.76923077 0.80769231 0.71153846 0.80769231 0.84313725
0.84313725 0.92156863 0.84313725 0.80392157]
mean value: 0.8177978883861237
key: train_accuracy
value: [0.99352052 0.99784017 0.99352052 0.99352052 0.99352052 0.99353448
0.99568966 0.99353448 0.99353448 0.99353448]
mean value: 0.9941749832427199
key: test_fscore
value: [0.81632653 0.76923077 0.8 0.63414634 0.82142857 0.84615385
0.84 0.92 0.84 0.80769231]
mean value: 0.8094978366581154
key: train_fscore
value: [0.99340659 0.9978022 0.99337748 0.99337748 0.99337748 0.99340659
0.99561404 0.99340659 0.99340659 0.99340659]
mean value: 0.994058165025401
key: test_precision
value: [0.83333333 0.76923077 0.83333333 0.86666667 0.76666667 0.81481481
0.84 0.92 0.84 0.77777778]
mean value: 0.8261823361823362
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.8 0.76923077 0.76923077 0.5 0.88461538 0.88
0.84 0.92 0.84 0.84 ]
mean value: 0.8043076923076923
key: train_recall
value: [0.98689956 0.99561404 0.98684211 0.98684211 0.98684211 0.98689956
0.99126638 0.98689956 0.98689956 0.98689956]
mean value: 0.988190454301693
key: test_roc_auc
value: [0.82592593 0.76923077 0.80769231 0.71153846 0.80769231 0.84384615
0.84307692 0.92153846 0.84307692 0.80461538]
mean value: 0.8178233618233618
key: train_roc_auc
value: [0.99344978 0.99780702 0.99342105 0.99342105 0.99342105 0.99344978
0.99563319 0.99344978 0.99344978 0.99344978]
mean value: 0.9940952271508465
key: test_jcc
value: [0.68965517 0.625 0.66666667 0.46428571 0.6969697 0.73333333
0.72413793 0.85185185 0.72413793 0.67741935]
mean value: 0.6853457652428732
key: train_jcc
value: [0.98689956 0.99561404 0.98684211 0.98684211 0.98684211 0.98689956
0.99126638 0.98689956 0.98689956 0.98689956]
mean value: 0.988190454301693
MCC on Blind test: 0.58
Accuracy on Blind test: 0.79
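The `key:` / `value:` / `mean value:` blocks have the shape of the dictionary that scikit-learn's `cross_validate` returns when it is given a dict of named scorers and `return_train_score=True`: `fit_time`, `score_time`, then `test_<name>` / `train_<name>` pairs in scorer order. The following is a minimal sketch under that assumption. The scorer names (`mcc`, `fscore`, `jcc`, ...) are copied from the log keys, while the synthetic data, the 80/20 blind split and the `StratifiedKFold` settings are hypothetical. The `test_roc_auc` rows here track balanced accuracy (for Gaussian Process fold 1: (0.8 + 23/27) / 2 = 0.82592593). That is what `make_scorer(roc_auc_score)` produces when it scores hard `predict()` labels, so the sketch uses that scorer rather than the `"roc_auc"` string.

```python
# Sketch (assumption): reproduce the log's "key / value / mean value" layout with
# sklearn's cross_validate. Data and split settings below are hypothetical.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, jaccard_score, make_scorer,
                             matthews_corrcoef, roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_validate, train_test_split

# Synthetic stand-in for the real feature table
X, y = make_classification(n_samples=600, n_features=20, random_state=42)
X_train, X_blind, y_train, y_blind = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Scorer names chosen so cross_validate returns test_mcc, train_mcc, ..., train_jcc
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": make_scorer(roc_auc_score),  # scores hard predict() labels
    "jcc": make_scorer(jaccard_score),
}

model = LogisticRegression(max_iter=1000, random_state=42)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(model, X_train, y_train, cv=cv,
                        scoring=scoring, return_train_score=True)

for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))

# Blind (held-out) evaluation after refitting on the whole training split
model.fit(X_train, y_train)
y_pred = model.predict(X_blind)
print("MCC on Blind test:", round(matthews_corrcoef(y_blind, y_pred), 2))
print("Accuracy on Blind test:", round(accuracy_score(y_blind, y_pred), 2))
```

Whether the original script used exactly these scorers and split is not recorded in the log; only the key names and their ordering are matched here.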
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
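The pipeline repr above corresponds to a `Pipeline` whose `prep` step is a `ColumnTransformer`. That transformer applies `MinMaxScaler` to the numeric columns and `OneHotEncoder` to the categorical ones, with `remainder='passthrough'`. Here is a small sketch of that construction. Only the column names come from the log. The six rows and their labels are hypothetical toy values, and the real data has 169 numeric and 7 categorical columns.

```python
# Sketch: a Pipeline whose 'prep' step min-max scales numeric columns and
# one-hot encodes categorical ones, as in the repr printed above.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy rows; only the column names are taken from the log
df = pd.DataFrame({
    "ligand_distance": [3.1, 12.4, 7.8, 20.2, 5.5, 15.0],
    "deepddg": [-1.2, 0.3, -0.7, 0.9, -2.1, 0.1],
    "ss_class": ["helix", "sheet", "loop", "helix", "loop", "sheet"],
    "active_site": [1, 0, 0, 0, 1, 0],
})
y = [1, 0, 1, 0, 1, 0]

num_cols = pd.Index(["ligand_distance", "deepddg"])
cat_cols = pd.Index(["ss_class", "active_site"])

prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), num_cols),   # rescale each numeric column to [0, 1]
        ("cat", OneHotEncoder(), cat_cols),  # one indicator column per category level
    ],
    remainder="passthrough",  # columns not listed above are passed through unchanged
)

pipe = Pipeline(steps=[("prep", prep),
                       ("model", GradientBoostingClassifier(random_state=42))])
pipe.fit(df, y)

# 2 scaled numeric columns + 3 ss_class levels + 2 active_site levels = 7 features
print(pipe.named_steps["prep"].transform(df).shape)  # (6, 7)
```

Fitting the scaler and encoder inside the pipeline means that, under cross-validation, they are refit on each training fold only.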
key: fit_time
value: [0.71794391 0.7203269 0.72270894 0.72166705 0.71613026 0.71105385
0.71652889 0.72433043 0.72593617 0.72219372]
mean value: 0.7198820114135742
key: score_time
value: [0.00964975 0.00945163 0.00943828 0.00946069 0.00968385 0.00951886
0.00939965 0.00939608 0.00974894 0.00926399]
mean value: 0.009501171112060548
key: test_mcc
value: [0.89087081 0.88527041 0.96225045 0.92307692 0.9258201 1.
0.92153846 0.96153846 0.8459178 0.92153846]
mean value: 0.9237821874489884
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94230769 0.94230769 0.98076923 0.96153846 0.96153846 1.
0.96078431 0.98039216 0.92156863 0.96078431]
mean value: 0.9611990950226245
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94339623 0.94117647 0.98039216 0.96153846 0.96296296 1.
0.96 0.98039216 0.92307692 0.96 ]
mean value: 0.9612935358307167
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.89285714 0.96 1. 0.96153846 0.92857143 1.
0.96 0.96153846 0.88888889 0.96 ]
mean value: 0.9513394383394383
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.92307692 0.96153846 0.96153846 1. 1.
0.96 1. 0.96 0.96 ]
mean value: 0.9726153846153847
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.94444444 0.94230769 0.98076923 0.96153846 0.96153846 1.
0.96076923 0.98076923 0.92230769 0.96076923]
mean value: 0.9615213675213675
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.89285714 0.88888889 0.96153846 0.92592593 0.92857143 1.
0.92307692 0.96153846 0.85714286 0.92307692]
mean value: 0.9262617012617013
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.86
Accuracy on Blind test: 0.93
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.03045988 0.03296375 0.04668927 0.0325253 0.03056479 0.03066826
0.03425074 0.03044724 0.03053427 0.03095007]
mean value: 0.033005356788635254
key: score_time
value: [0.01308393 0.01344943 0.01736689 0.01323104 0.01809931 0.01370311
0.01521468 0.01508665 0.01611853 0.01422572]
mean value: 0.014957928657531738
key: test_mcc
value: [0.4637037 0.32338083 0.09128709 0.28697202 0.50037023 0.25161197
0.42192651 0.43108293 0.54660922 0.31510143]
mean value: 0.3632045934004423
key: train_mcc
value: [0.87029251 0.82908577 0.52028331 0.6055719 0.97411589 0.83676363
0.92126558 0.96566269 0.94140567 0.59943068]
mean value: 0.8063877621767472
key: test_accuracy
value: [0.73076923 0.65384615 0.53846154 0.63461538 0.75 0.60784314
0.70588235 0.70588235 0.76470588 0.60784314]
mean value: 0.6699849170437405
key: train_accuracy
value: [0.93088553 0.90712743 0.71058315 0.76673866 0.98704104 0.91163793
0.95905172 0.98275862 0.96982759 0.76293103]
mean value: 0.8888582706486929
key: test_fscore
value: [0.73076923 0.7 0.63636364 0.68852459 0.75471698 0.67741935
0.72727273 0.73684211 0.78571429 0.70588235]
mean value: 0.7143505264458934
key: train_fscore
value: [0.93469388 0.91382766 0.77288136 0.80851064 0.98689956 0.91783567
0.96016771 0.98268398 0.97033898 0.80633803]
mean value: 0.905417747054172
key: test_precision
value: [0.7037037 0.61764706 0.525 0.6 0.74074074 0.56756757
0.66666667 0.65625 0.70967742 0.55813953]
mean value: 0.6345392691740768
key: train_precision
value: [0.87739464 0.84132841 0.62983425 0.67857143 0.9826087 0.84814815
0.9233871 0.97424893 0.94238683 0.67551622]
mean value: 0.8373424655092186
key: test_recall
value: [0.76 0.80769231 0.80769231 0.80769231 0.76923077 0.84
0.8 0.84 0.88 0.96 ]
mean value: 0.8272307692307692
key: train_recall
value: [1. 1. 1. 1. 0.99122807 1.
1. 0.99126638 1. 1. ]
mean value: 0.998249444572129
key: test_roc_auc
value: [0.73185185 0.65384615 0.53846154 0.63461538 0.75 0.61230769
0.70769231 0.70846154 0.76692308 0.61461538]
mean value: 0.6718774928774929
key: train_roc_auc
value: [0.93162393 0.90851064 0.71489362 0.77021277 0.9871034 0.91276596
0.95957447 0.98286723 0.97021277 0.76595745]
mean value: 0.8903722218314364
key: test_jcc
value: [0.57575758 0.53846154 0.46666667 0.525 0.60606061 0.51219512
0.57142857 0.58333333 0.64705882 0.54545455]
mean value: 0.5571416782643468
key: train_jcc
value: [0.87739464 0.84132841 0.62983425 0.67857143 0.97413793 0.84814815
0.9233871 0.96595745 0.94238683 0.67551622]
mean value: 0.8356662410244379
MCC on Blind test: 0.45
Accuracy on Blind test: 0.69
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02886796 0.03506136 0.04856014 0.03749013 0.03853464 0.03879905
0.03333855 0.03817081 0.0379436 0.03178 ]
mean value: 0.03685462474822998
key: score_time
value: [0.0193758 0.01948166 0.02359152 0.02530694 0.02442861 0.02386785
0.01905274 0.02539372 0.02083659 0.01895976]
mean value: 0.022029519081115723
key: test_mcc
value: [0.80829038 0.65433031 0.84866842 0.88527041 0.80829038 0.88289781
0.76733527 0.88289781 0.80990051 0.73107432]
mean value: 0.8078955626786625
key: train_mcc
value: [0.86208312 0.86630587 0.85815088 0.84902492 0.84879533 0.85375825
0.85375825 0.86645175 0.85797371 0.84920893]
mean value: 0.856551100724223
key: test_accuracy
value: [0.90384615 0.82692308 0.92307692 0.94230769 0.90384615 0.94117647
0.88235294 0.94117647 0.90196078 0.8627451 ]
mean value: 0.9029411764705882
key: train_accuracy
value: [0.93088553 0.93304536 0.9287257 0.92440605 0.92440605 0.92672414
0.92672414 0.93318966 0.92887931 0.92456897]
mean value: 0.9281554889401952
key: test_fscore
value: [0.90196078 0.83018868 0.92 0.94117647 0.90566038 0.93877551
0.88461538 0.93877551 0.90566038 0.86792453]
mean value: 0.9034737622189659
key: train_fscore
value: [0.93103448 0.93275488 0.92903226 0.92407809 0.92341357 0.92672414
0.92672414 0.93275488 0.9287257 0.92407809]
mean value: 0.9279320228969524
key: test_precision
value: [0.88461538 0.81481481 0.95833333 0.96 0.88888889 0.95833333
0.85185185 0.95833333 0.85714286 0.82142857]
mean value: 0.8953742368742369
key: train_precision
value: [0.91914894 0.92274678 0.91139241 0.91416309 0.92139738 0.91489362
0.91489362 0.92672414 0.91880342 0.91810345]
mean value: 0.9182266831443672
key: test_recall
value: [0.92 0.84615385 0.88461538 0.92307692 0.92307692 0.92
0.92 0.92 0.96 0.92 ]
mean value: 0.9136923076923077
key: train_recall
value: [0.94323144 0.94298246 0.94736842 0.93421053 0.9254386 0.93886463
0.93886463 0.93886463 0.93886463 0.930131 ]
mean value: 0.9378820960698689
key: test_roc_auc
value: [0.90444444 0.82692308 0.92307692 0.94230769 0.90384615 0.94076923
0.88307692 0.94076923 0.90307692 0.86384615]
mean value: 0.9032136752136752
key: train_roc_auc
value: [0.93101743 0.93319336 0.92900336 0.92455207 0.92442143 0.92687912
0.92687912 0.9332621 0.92900678 0.92463997]
mean value: 0.9282854742942543
key: test_jcc
value: [0.82142857 0.70967742 0.85185185 0.88888889 0.82758621 0.88461538
0.79310345 0.88461538 0.82758621 0.76666667]
mean value: 0.8256020029490552
key: train_jcc
value: [0.87096774 0.87398374 0.86746988 0.85887097 0.85772358 0.86345382
0.86345382 0.87398374 0.86693548 0.85887097]
mean value: 0.8655713728241052
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
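`RidgeClassifier` defaults to `alpha=1.0`. `RidgeClassifierCV(cv=10)` instead picks `alpha` from its default grid `(0.1, 1.0, 10.0)` by an inner 10-fold search on each training split. The two models log identical test rows but different train rows in folds 3 and 9, which suggests a different alpha was chosen in those folds. The log does not record the selected value, though. Here is a sketch of how to inspect `alpha_` on synthetic data (hypothetical, not the rpoB features).

```python
# Sketch: see which alpha RidgeClassifierCV picks and compare with RidgeClassifier.
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier, RidgeClassifierCV

X, y = make_classification(n_samples=400, n_features=20, random_state=42)

alphas = (0.1, 1.0, 10.0)  # RidgeClassifierCV's default candidate grid
ridge_cv = RidgeClassifierCV(alphas=alphas, cv=10).fit(X, y)
print("alpha chosen by RidgeClassifierCV:", ridge_cv.alpha_)

# RidgeClassifier defaults to alpha=1.0; refit it with the selected alpha to compare
ridge = RidgeClassifier(alpha=ridge_cv.alpha_).fit(X, y)
agreement = (ridge_cv.predict(X) == ridge.predict(X)).mean()
print("prediction agreement:", agreement)
```

Logging `alpha_` for each outer fold, for example via `cross_validate(..., return_estimator=True)`, would show directly whether the two models diverge.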
key: fit_time
value: [0.19555092 0.28002501 0.27410674 0.29807663 0.32655668 0.30876517
0.29639506 0.28637075 0.28587222 0.27678657]
mean value: 0.28285057544708253
key: score_time
value: [0.01900291 0.01891041 0.01891088 0.02085233 0.0235498 0.01884913
0.01897097 0.02552199 0.02648234 0.02427053]
mean value: 0.021532130241394044
key: test_mcc
value: [0.80829038 0.65433031 0.84866842 0.88527041 0.80829038 0.88289781
0.76733527 0.88289781 0.80990051 0.73107432]
mean value: 0.8078955626786625
key: train_mcc
value: [0.86208312 0.86630587 0.80159752 0.84902492 0.84879533 0.85375825
0.85375825 0.86645175 0.90108236 0.84920893]
mean value: 0.8552066307077918
key: test_accuracy
value: [0.90384615 0.82692308 0.92307692 0.94230769 0.90384615 0.94117647
0.88235294 0.94117647 0.90196078 0.8627451 ]
mean value: 0.9029411764705882
key: train_accuracy
value: [0.93088553 0.93304536 0.90064795 0.92440605 0.92440605 0.92672414
0.92672414 0.93318966 0.95043103 0.92456897]
mean value: 0.9275028859760185
key: test_fscore
value: [0.90196078 0.83018868 0.92 0.94117647 0.90566038 0.93877551
0.88461538 0.93877551 0.90566038 0.86792453]
mean value: 0.9034737622189659
key: train_fscore
value: [0.93103448 0.93275488 0.9004329 0.92407809 0.92341357 0.92672414
0.92672414 0.93275488 0.95032397 0.92407809]
mean value: 0.9272319143476138
key: test_precision
value: [0.88461538 0.81481481 0.95833333 0.96 0.88888889 0.95833333
0.85185185 0.95833333 0.85714286 0.82142857]
mean value: 0.8953742368742369
key: train_precision
value: [0.91914894 0.92274678 0.88888889 0.91416309 0.92139738 0.91489362
0.91489362 0.92672414 0.94017094 0.91810345]
mean value: 0.918113083663679
key: test_recall
value: [0.92 0.84615385 0.88461538 0.92307692 0.92307692 0.92
0.92 0.92 0.96 0.92 ]
mean value: 0.9136923076923077
key: train_recall
value: [0.94323144 0.94298246 0.9122807 0.93421053 0.9254386 0.93886463
0.93886463 0.93886463 0.96069869 0.930131 ]
mean value: 0.9365567302535815
key: test_roc_auc
value: [0.90444444 0.82692308 0.92307692 0.94230769 0.90384615 0.94076923
0.88307692 0.94076923 0.90307692 0.86384615]
mean value: 0.9032136752136752
key: train_roc_auc
value: [0.93101743 0.93319336 0.9008212 0.92455207 0.92442143 0.92687912
0.92687912 0.9332621 0.95056211 0.92463997]
mean value: 0.9276227913861107
key: test_jcc
value: [0.82142857 0.70967742 0.85185185 0.88888889 0.82758621 0.88461538
0.79310345 0.88461538 0.82758621 0.76666667]
mean value: 0.8256020029490552
key: train_jcc
value: [0.87096774 0.87398374 0.81889764 0.85887097 0.85772358 0.86345382
0.86345382 0.87398374 0.90534979 0.85887097]
mean value: 0.8645555796885971
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_sl.py:107: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_sl.py:110: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
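The `SettingWithCopyWarning` lines at `rpob_sl.py:107` and `:110` come from in-place `sort_values` calls on `baseline_CT` and `baseline_BT`, which pandas suspects are slices of another DataFrame. The log does not show how those frames were built. The sketch below assumes they are column subsets of a results table. The table itself is hypothetical: it uses rounded means from this section, and `bts_mcc` is assumed to hold the blind-test MCC. The fix shown takes an explicit `.copy()` and reassigns the sorted result instead of using `inplace=True`.

```python
# Sketch: sort an explicit copy instead of sorting a possible slice in place.
import pandas as pd

# Hypothetical results table; values are rounded means from this log section
results = pd.DataFrame({
    "model_name": ["QDA", "Gradient Boosting", "Ridge Classifier"],
    "test_mcc": [0.36, 0.92, 0.81],
    "bts_mcc": [0.45, 0.86, 0.77],
})

# .copy() makes each subset an independent DataFrame, so pandas no longer
# treats it as a slice of `results`; reassigning avoids inplace=True.
baseline_CT = results[["model_name", "test_mcc"]].copy()
baseline_CT = baseline_CT.sort_values(by=["test_mcc"], ascending=False)

baseline_BT = results[["model_name", "bts_mcc"]].copy()
baseline_BT = baseline_BT.sort_values(by=["bts_mcc"], ascending=False)

print(baseline_CT)
print(baseline_BT)
```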
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
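The pipeline printed above scales numeric columns with MinMaxScaler, one-hot encodes categorical columns, and feeds the result to LogisticRegression. A minimal sketch of that shape with scikit-learn's cross_validate is given here. It is an illustration under assumptions, not the script that produced this log: the data are synthetic (make_classification plus one made-up categorical column, cat_a), and the scorer names (mcc, fscore, jcc, ...) and the shuffled 10-fold StratifiedKFold splitter are guesses chosen so that the returned keys look like test_mcc and train_jcc as in this log.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic stand-in data; the real features (ligand_distance, ss_class, ...) are not reproduced.
X_num, y = make_classification(n_samples=520, n_features=10, random_state=42)
X = pd.DataFrame(X_num, columns=[f"num_{i}" for i in range(10)])
rng = np.random.default_rng(42)
X["cat_a"] = rng.choice(["helix", "sheet", "loop"], size=len(X))  # hypothetical categorical column

num_cols = [c for c in X.columns if c.startswith("num_")]
cat_cols = ["cat_a"]

# Same structure as the logged pipeline: scale numeric, one-hot categorical, pass the rest through.
prep = ColumnTransformer(
    transformers=[("num", MinMaxScaler(), num_cols),
                  ("cat", OneHotEncoder(), cat_cols)],
    remainder="passthrough",
)
pipe = Pipeline(steps=[("prep", prep), ("model", LogisticRegression(random_state=42))])

# Assumed scorer names; cross_validate prefixes them with test_ / train_.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": make_scorer(jaccard_score),
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring, return_train_score=True)

# Print in the same key / value / mean value layout used by this log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", value.mean())
```

With return_train_score=True each scorer yields both a test_ and a train_ entry, which is why every metric in this log appears twice.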
key: fit_time
value: [0.03841734 0.03949714 0.03965116 0.03697515 0.03645062 0.03646588
0.03761983 0.09846282 0.04134083 0.04195118]
mean value: 0.044683194160461424
key: score_time
value: [0.0146718 0.01451349 0.01439214 0.0156827 0.01446724 0.01457667
0.01458859 0.01217008 0.01210904 0.01474524]
mean value: 0.014191699028015137
key: test_mcc
value: [0.85164138 0.73997003 0.77849894 0.96225045 0.89056356 0.74466871
0.76923077 0.88527041 0.81312325 0.79056942]
mean value: 0.8225786910541291
key: train_mcc
value: [0.85946342 0.88089135 0.87262489 0.86395495 0.86815585 0.87246682
0.88113831 0.86411148 0.86411148 0.87660368]
mean value: 0.8703522241153047
key: test_accuracy
value: [0.9245283 0.86792453 0.88461538 0.98076923 0.94230769 0.86538462
0.88461538 0.94230769 0.90384615 0.88461538]
mean value: 0.9080914368650218
key: train_accuracy
value: [0.92963753 0.94029851 0.93617021 0.93191489 0.93404255 0.93617021
0.94042553 0.93191489 0.93191489 0.93829787]
mean value: 0.9350787097944926
key: test_fscore
value: [0.92592593 0.87719298 0.875 0.98113208 0.94545455 0.85106383
0.88461538 0.94117647 0.90909091 0.89655172]
mean value: 0.9087203847528004
key: train_fscore
value: [0.93052632 0.94092827 0.93697479 0.93248945 0.93446089 0.93670886
0.94117647 0.93277311 0.93277311 0.93842887]
mean value: 0.9357240139743418
key: test_precision
value: [0.89285714 0.83333333 0.95454545 0.96296296 0.89655172 0.95238095
0.88461538 0.96 0.86206897 0.8125 ]
mean value: 0.9011815920350403
key: train_precision
value: [0.92083333 0.92916667 0.9253112 0.92468619 0.92857143 0.92887029
0.92946058 0.92116183 0.92116183 0.93644068]
mean value: 0.9265664027577826
key: test_recall
value: [0.96153846 0.92592593 0.80769231 1. 1. 0.76923077
0.88461538 0.92307692 0.96153846 1. ]
mean value: 0.9233618233618234
key: train_recall
value: [0.94042553 0.95299145 0.94893617 0.94042553 0.94042553 0.94468085
0.95319149 0.94468085 0.94468085 0.94042553]
mean value: 0.9450863793416985
key: test_roc_auc
value: [0.92521368 0.86680912 0.88461538 0.98076923 0.94230769 0.86538462
0.88461538 0.94230769 0.90384615 0.88461538]
mean value: 0.9080484330484331
key: train_roc_auc
value: [0.92961448 0.94032551 0.93617021 0.93191489 0.93404255 0.93617021
0.94042553 0.93191489 0.93191489 0.93829787]
mean value: 0.9350791052918713
key: test_jcc
value: [0.86206897 0.78125 0.77777778 0.96296296 0.89655172 0.74074074
0.79310345 0.88888889 0.83333333 0.8125 ]
mean value: 0.8349177841634738
key: train_jcc
value: [0.87007874 0.88844622 0.88142292 0.87351779 0.87698413 0.88095238
0.88888889 0.87401575 0.87401575 0.884 ]
mean value: 0.8792322559647762
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
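For the 4th Logistic Regression test fold above, the printed values (accuracy 0.98076923 = 51/52, precision 0.96296296 = 26/27, recall 1.0) are consistent with a 52-sample fold with TP=26, FP=1, FN=0, TN=25. These counts are inferred from the ratios; the log does not print them. Recomputing the remaining metrics from those counts with the standard formulas reproduces that fold's test_fscore, test_jcc and test_mcc:

```python
from math import sqrt


def fold_metrics(tp, fp, fn, tn):
    """Binary-classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    fscore = 2 * tp / (2 * tp + fp + fn)  # harmonic mean of precision and recall
    jcc = tp / (tp + fp + fn)             # Jaccard index of the positive class
    mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "fscore": fscore, "jcc": jcc, "mcc": mcc}


# Counts inferred (not logged) for the 4th Logistic Regression test fold.
m = fold_metrics(tp=26, fp=1, fn=0, tn=25)
for name, v in m.items():
    print(f"{name}: {v:.8f}")
```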
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
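This ConvergenceWarning means lbfgs stopped at max_iter (100 by default for LogisticRegressionCV) before meeting its tolerance, and the message itself suggests raising max_iter or scaling the inputs. The logged pipeline already applies MinMaxScaler, so a larger max_iter is the more likely remedy here, though that is an inference. The sketch below uses synthetic data and counts ConvergenceWarnings at the default and at a larger budget; 5000 is an arbitrary illustrative value, not a setting taken from the original script.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data, not the features used in this log.
X, y = make_classification(n_samples=500, n_features=20, random_state=42)


def count_convergence_warnings(max_iter):
    """Fit a scaled LogisticRegressionCV and count the ConvergenceWarnings it emits."""
    model = make_pipeline(
        MinMaxScaler(),
        LogisticRegressionCV(max_iter=max_iter, random_state=42),
    )
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always", ConvergenceWarning)  # record every repeat
        model.fit(X, y)
    return sum(issubclass(w.category, ConvergenceWarning) for w in caught)


n_default = count_convergence_warnings(max_iter=100)  # sklearn's default max_iter
n_raised = count_convergence_warnings(max_iter=5000)  # illustrative larger budget
print(f"ConvergenceWarnings: max_iter=100 -> {n_default}, max_iter=5000 -> {n_raised}")
```

LogisticRegressionCV fits one model per C value per CV split, which is why a single model entry in this log can emit dozens of identical warnings.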
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.88060451 1.01606941 0.91109967 1.10975718 0.93197393 0.99338555
0.97769165 0.90921044 0.9921577 0.92200208]
mean value: 0.9643952131271363
key: score_time
value: [0.01480174 0.04356098 0.01479554 0.01655483 0.02265716 0.04054356
0.01495075 0.0150485 0.01497436 0.01514244]
mean value: 0.02130298614501953
key: test_mcc
value: [0.81196581 0.8116984 0.77849894 0.96225045 0.89056356 0.81312325
0.84615385 0.84866842 0.84866842 0.82305489]
mean value: 0.8434645996919574
key: train_mcc
value: [0.91474349 0.91484796 0.90667855 0.90220118 0.91064654 0.90233192
0.90667855 0.90233192 0.88965172 0.90220118]
mean value: 0.9052313012071413
key: test_accuracy
value: [0.90566038 0.90566038 0.88461538 0.98076923 0.94230769 0.90384615
0.92307692 0.92307692 0.92307692 0.90384615]
mean value: 0.9195936139332366
key: train_accuracy
value: [0.95735608 0.95735608 0.95319149 0.95106383 0.95531915 0.95106383
0.95319149 0.95106383 0.94468085 0.95106383]
mean value: 0.9525350451390464
key: test_fscore
value: [0.90566038 0.90909091 0.875 0.98113208 0.94545455 0.89795918
0.92307692 0.92 0.92592593 0.9122807 ]
mean value: 0.9195580641806348
key: train_fscore
value: [0.95762712 0.95762712 0.95378151 0.95137421 0.95541401 0.95157895
0.95378151 0.95157895 0.94537815 0.95137421]
mean value: 0.9529515735610741
key: test_precision
value: [0.88888889 0.89285714 0.95454545 0.96296296 0.89655172 0.95652174
0.92307692 0.95833333 0.89285714 0.83870968]
mean value: 0.916530498920957
key: train_precision
value: [0.9535865 0.94957983 0.94190871 0.94537815 0.95338983 0.94166667
0.94190871 0.94166667 0.93360996 0.94537815]
mean value: 0.9448073182078001
key: test_recall
value: [0.92307692 0.92592593 0.80769231 1. 1. 0.84615385
0.92307692 0.88461538 0.96153846 1. ]
mean value: 0.9272079772079772
key: train_recall
value: [0.96170213 0.96581197 0.96595745 0.95744681 0.95744681 0.96170213
0.96595745 0.96170213 0.95744681 0.95744681]
mean value: 0.9612620476450263
key: test_roc_auc
value: [0.90598291 0.90527066 0.88461538 0.98076923 0.94230769 0.90384615
0.92307692 0.92307692 0.92307692 0.90384615]
mean value: 0.9195868945868946
key: train_roc_auc
value: [0.95734679 0.95737407 0.95319149 0.95106383 0.95531915 0.95106383
0.95319149 0.95106383 0.94468085 0.95106383]
mean value: 0.952535915621022
key: test_jcc
value: [0.82758621 0.83333333 0.77777778 0.96296296 0.89655172 0.81481481
0.85714286 0.85185185 0.86206897 0.83870968]
mean value: 0.8522800171854676
key: train_jcc
value: [0.91869919 0.91869919 0.91164659 0.90725806 0.91463415 0.90763052
0.91164659 0.90763052 0.89641434 0.90725806]
mean value: 0.9101517208854413
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
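The lbfgs ConvergenceWarning blocks at the top of this run come from logistic-regression fits that hit the default iteration cap. The default is max_iter=100 for both LogisticRegression and LogisticRegressionCV. The numeric columns already pass through MinMaxScaler, so raising max_iter is the more direct lever here. The following is a minimal sketch that uses synthetic data from make_classification as a stand-in for this run's feature table. It fits a MinMax-scaled logistic regression with a larger iteration budget and counts any ConvergenceWarning emitted. The max_iter value of 1000 is an illustrative choice, not a tuned one. LogisticRegressionCV accepts the same max_iter parameter.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption): not the features used in this log.
X, y = make_classification(n_samples=500, n_features=30, random_state=42)


def count_convergence_warnings(model, X, y):
    """Fit `model` on (X, y) and return how many ConvergenceWarnings were emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always", ConvergenceWarning)
        model.fit(X, y)
    return sum(issubclass(w.category, ConvergenceWarning) for w in caught)


# Same scaler as the logged pipeline, with a larger lbfgs iteration budget.
model = make_pipeline(MinMaxScaler(), LogisticRegression(max_iter=1000, random_state=42))
n_warnings = count_convergence_warnings(model, X, y)
print("ConvergenceWarnings:", n_warnings)
```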
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01405215 0.01066756 0.01140666 0.012398 0.01129866 0.01105595
0.01146221 0.01126456 0.01135373 0.01132464]
mean value: 0.011628413200378418
key: score_time
value: [0.01219368 0.00984573 0.00990057 0.00974965 0.00986099 0.00926423
0.00986862 0.00995731 0.0097661 0.00981498]
mean value: 0.010022187232971191
key: test_mcc
value: [0.66048569 0.40912228 0.71151247 0.77151675 0.80829038 0.65824263
0.57735027 0.50037023 0.54006172 0.73568294]
mean value: 0.6372635364356517
key: train_mcc
value: [0.66698754 0.68740344 0.69667663 0.67751905 0.69117257 0.71834239
0.67337154 0.67751905 0.67558392 0.69117257]
mean value: 0.6855748724120782
key: test_accuracy
value: [0.83018868 0.69811321 0.84615385 0.88461538 0.90384615 0.82692308
0.78846154 0.75 0.76923077 0.86538462]
mean value: 0.8162917271407837
key: train_accuracy
value: [0.8315565 0.84221748 0.84680851 0.83617021 0.84468085 0.85744681
0.83617021 0.83617021 0.83617021 0.84468085]
mean value: 0.8412071859547249
key: test_fscore
value: [0.82352941 0.66666667 0.82608696 0.88 0.90196078 0.81632653
0.78431373 0.75471698 0.76 0.85714286]
mean value: 0.807074391364421
key: train_fscore
value: [0.82247191 0.83408072 0.83928571 0.82539683 0.8388521 0.85011186
0.84057971 0.82539683 0.82774049 0.8388521 ]
mean value: 0.8342768246079215
key: test_precision
value: [0.84 0.76190476 0.95 0.91666667 0.92 0.86956522
0.8 0.74074074 0.79166667 0.91304348]
mean value: 0.8503587531631009
key: train_precision
value: [0.87142857 0.87735849 0.88262911 0.88349515 0.87155963 0.89622642
0.81854839 0.88349515 0.87264151 0.87155963]
mean value: 0.8728942038918088
key: test_recall
value: [0.80769231 0.59259259 0.73076923 0.84615385 0.88461538 0.76923077
0.76923077 0.76923077 0.73076923 0.80769231]
mean value: 0.7707977207977208
key: train_recall
value: [0.7787234 0.79487179 0.8 0.77446809 0.80851064 0.80851064
0.86382979 0.77446809 0.78723404 0.80851064]
mean value: 0.7999127114020731
key: test_roc_auc
value: [0.82977208 0.70014245 0.84615385 0.88461538 0.90384615 0.82692308
0.78846154 0.75 0.76923077 0.86538462]
mean value: 0.8164529914529915
key: train_roc_auc
value: [0.83166939 0.84211675 0.84680851 0.83617021 0.84468085 0.85744681
0.83617021 0.83617021 0.83617021 0.84468085]
mean value: 0.8412084015275505
key: test_jcc
value: [0.7 0.5 0.7037037 0.78571429 0.82142857 0.68965517
0.64516129 0.60606061 0.61290323 0.75 ]
mean value: 0.6814626855449992
key: train_jcc
value: [0.69847328 0.71538462 0.72307692 0.7027027 0.72243346 0.73929961
0.725 0.7027027 0.70610687 0.72243346]
mean value: 0.7157613627585733
MCC on Blind test: 0.63
Accuracy on Blind test: 0.81
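Each fold score can be recovered from that fold's confusion matrix. Fold 1 of the Gaussian NB run above serves as a worked check. Its test fold holds 53 samples, since accuracy 0.83018868 = 44/53. Precision 0.84 = 21/25 and recall 0.80769231 = 21/26 then imply TP = 21, FP = 4 and FN = 5, which leaves TN = 23. These counts are inferred from the printed scores; the log does not record them directly. The test_roc_auc value matches (recall + specificity) / 2, which is what ROC AUC reduces to on hard 0/1 predictions. That suggests, without proving, that the scorer received predict() output rather than probabilities.

```python
import math


def fold_metrics(tp, fp, fn, tn):
    """Binary-classification scores from one fold's confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "fscore": 2 * tp / (2 * tp + fp + fn),  # F1
        "jcc": tp / (tp + fp + fn),  # Jaccard index of the positive class
        "mcc": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        # AUC of a single hard-threshold ROC point = balanced accuracy.
        "roc_auc": (recall + specificity) / 2,
    }


# Counts inferred (not logged) for Gaussian NB, fold 1.
m = fold_metrics(tp=21, fp=4, fn=5, tn=23)
for name, score in m.items():
    print(f"{name}: {score:.8f}")
```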
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01250458 0.01152492 0.01148963 0.01054597 0.01109338 0.01149631
0.01151705 0.0116148 0.01161146 0.01170826]
mean value: 0.011510634422302246
key: score_time
value: [0.01028419 0.00985003 0.00978994 0.00903082 0.00983238 0.00987172
0.01005745 0.01006413 0.00986838 0.00981331]
mean value: 0.009846234321594238
key: test_mcc
value: [0.73646724 0.50997151 0.6172134 0.88527041 0.69436507 0.69230769
0.69436507 0.77151675 0.69436507 0.69230769]
mean value: 0.6988149917958625
key: train_mcc
value: [0.74840423 0.76129503 0.71066404 0.74894295 0.75778307 0.77046393
0.71925314 0.68550371 0.74075423 0.75330062]
mean value: 0.739636494142086
key: test_accuracy
value: [0.86792453 0.75471698 0.80769231 0.94230769 0.84615385 0.84615385
0.84615385 0.88461538 0.84615385 0.84615385]
mean value: 0.8488026124818577
key: train_accuracy
value: [0.87420043 0.88059701 0.85531915 0.87446809 0.8787234 0.88510638
0.85957447 0.84255319 0.87021277 0.87659574]
mean value: 0.8697350632853967
key: test_fscore
value: [0.86792453 0.75471698 0.8 0.94117647 0.85185185 0.84615385
0.84 0.88 0.85185185 0.84615385]
mean value: 0.8479829376033594
key: train_fscore
value: [0.87473461 0.87931034 0.85470085 0.87473461 0.88050314 0.88655462
0.8583691 0.83982684 0.86825054 0.87553648]
mean value: 0.869252113965142
key: test_precision
value: [0.85185185 0.76923077 0.83333333 0.96 0.82142857 0.84615385
0.875 0.91666667 0.82142857 0.84615385]
mean value: 0.8541247456247456
key: train_precision
value: [0.87288136 0.88695652 0.8583691 0.87288136 0.8677686 0.87551867
0.86580087 0.85462555 0.88157895 0.88311688]
mean value: 0.8719497846503439
key: test_recall
value: [0.88461538 0.74074074 0.76923077 0.92307692 0.88461538 0.84615385
0.80769231 0.84615385 0.88461538 0.84615385]
mean value: 0.8433048433048433
key: train_recall
value: [0.87659574 0.87179487 0.85106383 0.87659574 0.89361702 0.89787234
0.85106383 0.82553191 0.85531915 0.86808511]
mean value: 0.8667539552645935
key: test_roc_auc
value: [0.86823362 0.75498575 0.80769231 0.94230769 0.84615385 0.84615385
0.84615385 0.88461538 0.84615385 0.84615385]
mean value: 0.8488603988603989
key: train_roc_auc
value: [0.87419531 0.88057829 0.85531915 0.87446809 0.8787234 0.88510638
0.85957447 0.84255319 0.87021277 0.87659574]
mean value: 0.8697326786688488
key: test_jcc
value: [0.76666667 0.60606061 0.66666667 0.88888889 0.74193548 0.73333333
0.72413793 0.78571429 0.74193548 0.73333333]
mean value: 0.7388672679440199
key: train_jcc
value: [0.77735849 0.78461538 0.74626866 0.77735849 0.78651685 0.79622642
0.7518797 0.7238806 0.76717557 0.77862595]
mean value: 0.7689906114471405
MCC on Blind test: 0.76
Accuracy on Blind test: 0.88
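Each record uses a key / value / mean value layout. This matches what sklearn.model_selection.cross_validate returns when given a dict of scorers with return_train_score=True: fit_time and score_time first, then a test_ and a train_ array for each scorer name. Each "mean value" is the plain average of the ten fold values. The script that wrote this log is not shown, so the following is only a hypothetical reconstruction. It assumes synthetic data, 10 stratified folds (ten values per key), and scorer names chosen to reproduce the suffixes mcc, accuracy, fscore, precision, recall, roc_auc and jcc. roc_auc is scored on hard predictions so that it behaves like the logged values.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import BernoulliNB

# Scorer names are assumptions chosen to match the key suffixes in this log.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    # Wrapping roc_auc_score this way scores predict() output (hard labels).
    "roc_auc": make_scorer(roc_auc_score),
    "jcc": make_scorer(jaccard_score),
}

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=520, n_features=20, random_state=42)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
results = cross_validate(
    BernoulliNB(), X, y, cv=cv, scoring=scoring, return_train_score=True
)

# Print in the same layout as the log above.
for key, value in results.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```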
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00976706 0.01079893 0.01038599 0.01070499 0.01065278 0.01069951
0.0106945 0.0093689 0.01063561 0.00939131]
mean value: 0.010309958457946777
key: score_time
value: [0.01736188 0.01571083 0.01535082 0.01273298 0.01242304 0.01235628
0.01241827 0.01493406 0.01261854 0.01222396]
mean value: 0.013813066482543945
key: test_mcc
value: [0.58487934 0.36194897 0.30769231 0.5990423 0.66628253 0.4233902
0.43929769 0.73568294 0.6172134 0.31139958]
mean value: 0.504682924498233
key: train_mcc
value: [0.72752093 0.72748132 0.73197454 0.69792921 0.71495188 0.71490009
0.72768593 0.69364214 0.70823856 0.73223982]
mean value: 0.7176564421606096
key: test_accuracy
value: [0.79245283 0.67924528 0.65384615 0.78846154 0.82692308 0.71153846
0.71153846 0.86538462 0.80769231 0.65384615]
mean value: 0.7490928882438317
key: train_accuracy
value: [0.86353945 0.86353945 0.86595745 0.84893617 0.85744681 0.85744681
0.86382979 0.84680851 0.85319149 0.86595745]
mean value: 0.8586653359343102
key: test_fscore
value: [0.78431373 0.66666667 0.65384615 0.75555556 0.84210526 0.71698113
0.66666667 0.87272727 0.8 0.625 ]
mean value: 0.7383862436185877
key: train_fscore
value: [0.86147186 0.86086957 0.86509636 0.84796574 0.85653105 0.85714286
0.86324786 0.84615385 0.84768212 0.86393089]
mean value: 0.8570092145719881
key: test_precision
value: [0.8 0.70833333 0.65384615 0.89473684 0.77419355 0.7037037
0.78947368 0.82758621 0.83333333 0.68181818]
mean value: 0.7667024987634145
key: train_precision
value: [0.87665198 0.87610619 0.87068966 0.85344828 0.86206897 0.85897436
0.86695279 0.84978541 0.88073394 0.87719298]
mean value: 0.8672604557430365
key: test_recall
value: [0.76923077 0.62962963 0.65384615 0.65384615 0.92307692 0.73076923
0.57692308 0.92307692 0.76923077 0.57692308]
mean value: 0.7206552706552707
key: train_recall
value: [0.84680851 0.84615385 0.85957447 0.84255319 0.85106383 0.85531915
0.85957447 0.84255319 0.81702128 0.85106383]
mean value: 0.8471685761047463
key: test_roc_auc
value: [0.79202279 0.68019943 0.65384615 0.78846154 0.82692308 0.71153846
0.71153846 0.86538462 0.80769231 0.65384615]
mean value: 0.7491452991452991
key: train_roc_auc
value: [0.8635752 0.86350245 0.86595745 0.84893617 0.85744681 0.85744681
0.86382979 0.84680851 0.85319149 0.86595745]
mean value: 0.8586652118567013
key: test_jcc
value: [0.64516129 0.5 0.48571429 0.60714286 0.72727273 0.55882353
0.5 0.77419355 0.66666667 0.45454545]
mean value: 0.5919520359463434
key: train_jcc
value: [0.75665399 0.75572519 0.76226415 0.73605948 0.74906367 0.75
0.7593985 0.73333333 0.73563218 0.76045627]
mean value: 0.7498586771390656
MCC on Blind test: 0.33
Accuracy on Blind test: 0.67
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.02233267 0.02127671 0.02358603 0.02476525 0.02091074 0.02076507
0.02088213 0.02093339 0.02088308 0.02073622]
mean value: 0.02170712947845459
key: score_time
value: [0.01467752 0.01178527 0.01262712 0.01558471 0.01146078 0.01158738
0.01152992 0.01151848 0.01147294 0.01148391]
mean value: 0.012372803688049317
key: test_mcc
value: [0.81196581 0.69957726 0.74466871 0.92307692 0.84866842 0.77849894
0.73131034 0.88527041 0.81312325 0.73568294]
mean value: 0.7971843013202676
key: train_mcc
value: [0.79530824 0.80810708 0.80451759 0.78298581 0.79149653 0.80018114
0.80000724 0.78726255 0.7957735 0.80428445]
mean value: 0.7969924124303265
key: test_accuracy
value: [0.90566038 0.8490566 0.86538462 0.96153846 0.92307692 0.88461538
0.86538462 0.94230769 0.90384615 0.86538462]
mean value: 0.8966255442670538
key: train_accuracy
value: [0.89765458 0.90405117 0.90212766 0.89148936 0.89574468 0.9
0.9 0.89361702 0.89787234 0.90212766]
mean value: 0.8984684480333893
key: test_fscore
value: [0.90566038 0.85714286 0.85106383 0.96153846 0.92592593 0.875
0.86792453 0.94117647 0.90909091 0.87272727]
mean value: 0.8967250632461273
key: train_fscore
value: [0.89787234 0.90364026 0.90336134 0.89171975 0.89552239 0.90105263
0.90021231 0.8940678 0.8974359 0.9017094 ]
mean value: 0.8986594116764762
key: test_precision
value: [0.88888889 0.82758621 0.95238095 0.96153846 0.89285714 0.95454545
0.85185185 0.96 0.86206897 0.82758621]
mean value: 0.8979304131373097
key: train_precision
value: [0.89787234 0.9055794 0.89211618 0.88983051 0.8974359 0.89166667
0.89830508 0.89029536 0.90128755 0.9055794 ]
mean value: 0.8969968390902169
key: test_recall
value: [0.92307692 0.88888889 0.76923077 0.96153846 0.96153846 0.80769231
0.88461538 0.92307692 0.96153846 0.92307692]
mean value: 0.9004273504273504
key: train_recall
value: [0.89787234 0.9017094 0.91489362 0.89361702 0.89361702 0.9106383
0.90212766 0.89787234 0.89361702 0.89787234]
mean value: 0.9003837061283869
key: test_roc_auc
value: [0.90598291 0.8482906 0.86538462 0.96153846 0.92307692 0.88461538
0.86538462 0.94230769 0.90384615 0.86538462]
mean value: 0.8965811965811966
key: train_roc_auc
value: [0.89765412 0.90404619 0.90212766 0.89148936 0.89574468 0.9
0.9 0.89361702 0.89787234 0.90212766]
mean value: 0.8984679032551373
key: test_jcc
value: [0.82758621 0.75 0.74074074 0.92592593 0.86206897 0.77777778
0.76666667 0.88888889 0.83333333 0.77419355]
mean value: 0.8147182054134223
key: train_jcc
value: [0.81467181 0.82421875 0.82375479 0.8045977 0.81081081 0.81992337
0.81853282 0.80842912 0.81395349 0.82101167]
mean value: 0.81599043363822
MCC on Blind test: 0.67
Accuracy on Blind test: 0.83
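In the SVC block above, test_roc_auc tracks test_accuracy almost exactly, even though SVC exposes a continuous decision_function. That pattern is consistent with the AUC having been computed from hard 0/1 predictions (for example via make_scorer(roc_auc_score), which calls predict by default). This is an inference: ml_data_sl.py itself is not shown. For binary hard labels the ROC curve has a single interior point (FPR, TPR), so its area is (1 + TPR - FPR)/2 = (TPR + TNR)/2, which is balanced accuracy. The sketch below checks that identity on synthetic labels, then against SVC fold 1, using counts inferred from the logged values (53 samples, recall 24/26, precision 24/27, so TP=24, FN=2, FP=3, TN=24).

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, roc_auc_score

# Synthetic hard predictions with roughly 10% of labels flipped (illustrative only).
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=53)
y_pred = np.where(rng.random(53) < 0.9, y_true, 1 - y_true)

auc_on_labels = roc_auc_score(y_true, y_pred)  # 0/1 predictions used as "scores"
bal_acc = balanced_accuracy_score(y_true, y_pred)
print(auc_on_labels, bal_acc)  # equal up to floating-point error

# SVC fold 1, with confusion counts inferred from the logged recall/precision/accuracy.
tp, fn, fp, tn = 24, 2, 3, 24
tpr, tnr = tp / (tp + fn), tn / (tn + fp)
print(round((tpr + tnr) / 2, 8))  # compare with the logged test_roc_auc 0.90598291
```

If the AUC were instead computed from decision_function scores, it would generally not coincide with balanced accuracy like this.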
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
List of models:
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.43118334 1.92359424 1.93823981 1.90809631 1.79002547 0.74983549
1.94170189 1.90422225 1.8327477 1.80478406]
mean value: 1.7224430561065673
key: score_time
value: [0.0124011 0.01241326 0.01531434 0.02261472 0.01246166 0.01324511
0.01439619 0.02267289 0.02315784 0.0149169 ]
mean value: 0.016359400749206544
key: test_mcc
value: [0.73646724 0.8116984 0.77849894 0.96225045 0.85634884 0.77849894
0.80829038 0.84615385 0.84615385 0.82305489]
mean value: 0.8247415773739302
key: train_mcc
value: [0.98297841 0.99147118 0.99148936 0.99152527 0.9873145 0.91084449
0.9957537 0.99148936 0.97894501 0.9957537 ]
mean value: 0.981756497283519
key: test_accuracy
value: [0.86792453 0.90566038 0.88461538 0.98076923 0.92307692 0.88461538
0.90384615 0.92307692 0.92307692 0.90384615]
mean value: 0.9100507982583455
key: train_accuracy
value: [0.99147122 0.99573561 0.99574468 0.99574468 0.99361702 0.95531915
0.99787234 0.99574468 0.9893617 0.99787234]
mean value: 0.9908483418772399
key: test_fscore
value: [0.86792453 0.90909091 0.875 0.98113208 0.92857143 0.875
0.90196078 0.92307692 0.92307692 0.9122807 ]
mean value: 0.909711427365788
key: train_fscore
value: [0.99145299 0.9957265 0.99574468 0.9957265 0.99357602 0.95578947
0.9978678 0.99574468 0.98924731 0.99787686]
mean value: 0.9908752808838321
key: test_precision
value: [0.85185185 0.89285714 0.95454545 0.96296296 0.86666667 0.95454545
0.92 0.92307692 0.92307692 0.83870968]
mean value: 0.9088293057002734
key: train_precision
value: [0.99570815 0.9957265 0.99574468 1. 1. 0.94583333
1. 0.99574468 1. 0.99576271]
mean value: 0.9924520057132802
key: test_recall
value: [0.88461538 0.92592593 0.80769231 1. 1. 0.80769231
0.88461538 0.92307692 0.92307692 1. ]
mean value: 0.9156695156695157
key: train_recall
value: [0.98723404 0.9957265 0.99574468 0.99148936 0.98723404 0.96595745
0.99574468 0.99574468 0.9787234 1. ]
mean value: 0.9893598836152028
key: test_roc_auc
value: [0.86823362 0.90527066 0.88461538 0.98076923 0.92307692 0.88461538
0.90384615 0.92307692 0.92307692 0.90384615]
mean value: 0.9100427350427351
key: train_roc_auc
value: [0.99148027 0.99573559 0.99574468 0.99574468 0.99361702 0.95531915
0.99787234 0.99574468 0.9893617 0.99787234]
mean value: 0.9908492453173304
key: test_jcc
value: [0.76666667 0.83333333 0.77777778 0.96296296 0.86666667 0.77777778
0.82142857 0.85714286 0.85714286 0.83870968]
mean value: 0.8359609148318826
key: train_jcc
value: [0.98305085 0.99148936 0.99152542 0.99148936 0.98723404 0.91532258
0.99574468 0.99152542 0.9787234 0.99576271]
mean value: 0.9821867838488652
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
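The ConvergenceWarning at the start of this MLP block shows the stochastic optimizer stopping at max_iter=500 before converging, yet cross-validation still reports scores, so the warning is easy to overlook. A minimal sketch for counting how many folds are affected, so the effect of a larger max_iter can be checked directly. The synthetic data, the 5-fold splitter and the count_convergence_warnings helper are hypothetical, not part of the original script.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import StratifiedKFold
from sklearn.neural_network import MLPClassifier


def count_convergence_warnings(max_iter, X, y, n_splits=5, seed=42):
    """Fit one MLP per training fold; return how many fits raised ConvergenceWarning."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    n_warned = 0
    for train_idx, _ in cv.split(X, y):
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always", ConvergenceWarning)
            MLPClassifier(max_iter=max_iter, random_state=seed).fit(X[train_idx], y[train_idx])
        if any(issubclass(w.category, ConvergenceWarning) for w in caught):
            n_warned += 1
    return n_warned


X, y = make_classification(n_samples=300, n_features=20, random_state=42)
print("max_iter=5:", count_convergence_warnings(5, X, y))
print("max_iter=1000:", count_convergence_warnings(1000, X, y))
```

With max_iter=5 every fold warns, because the default stopping rule needs more than n_iter_no_change=10 epochs without improvement before it can declare convergence.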
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.0287106 0.02211738 0.02123046 0.02213478 0.01955199 0.02208471
0.0208869 0.02009106 0.02331758 0.02205229]
mean value: 0.022217774391174318
key: score_time
value: [0.01214767 0.00964355 0.00877619 0.00889492 0.00886869 0.00891852
0.00902438 0.00888228 0.00897551 0.0089376 ]
mean value: 0.009306931495666504
key: test_mcc
value: [0.81688878 0.92704716 0.92307692 0.88527041 0.84866842 0.96225045
0.84615385 0.84866842 0.77151675 1. ]
mean value: 0.8829541177251579
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.90566038 0.96226415 0.96153846 0.94230769 0.92307692 0.98076923
0.92307692 0.92307692 0.88461538 1. ]
mean value: 0.9406386066763426
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.90909091 0.96428571 0.96153846 0.94339623 0.92592593 0.98039216
0.92307692 0.92592593 0.88888889 1. ]
mean value: 0.9422521132010588
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.86206897 0.93103448 0.96153846 0.92592593 0.89285714 1.
0.92307692 0.89285714 0.85714286 1. ]
mean value: 0.9246501901674316
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96153846 1. 0.96153846 0.96153846 0.96153846 0.96153846
0.92307692 0.96153846 0.92307692 1. ]
mean value: 0.9615384615384616
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.90669516 0.96153846 0.96153846 0.94230769 0.92307692 0.98076923
0.92307692 0.92307692 0.88461538 1. ]
mean value: 0.9406695156695157
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.83333333 0.93103448 0.92592593 0.89285714 0.86206897 0.96153846
0.85714286 0.86206897 0.8 1. ]
mean value: 0.8925970134590824
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.87
Accuracy on Blind test: 0.93
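Every model block in this log has the same shape: fit_time and score_time, then a test_/train_ pair for each metric (mcc, accuracy, fscore, precision, recall, roc_auc, jcc), each with ten per-fold values and their mean. That matches the dictionary sklearn.model_selection.cross_validate returns when given a named scoring dict, 10 folds and return_train_score=True. A minimal sketch that produces output in this layout follows. The scorer definitions, the fold splitter, the synthetic data and the LogisticRegression model are assumptions, not the script's actual code.

```python
# Hypothetical sketch: ml_data_sl.py is not shown, so the scorers, splitter,
# data and model below are assumptions chosen to mimic the log's layout.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score,
                             make_scorer, matthews_corrcoef, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_validate

# Names match the metric suffixes in the log (test_mcc, train_jcc, ...).
SCORING = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": make_scorer(accuracy_score),
    "fscore": make_scorer(f1_score),
    "precision": make_scorer(precision_score),
    "recall": make_scorer(recall_score),
    "roc_auc": make_scorer(roc_auc_score),  # scored on predict() labels
    "jcc": make_scorer(jaccard_score),
}


def run_cv(model, X, y, n_splits=10, seed=42):
    """Cross-validate `model` and print each result as key / value / mean value."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = cross_validate(model, X, y, cv=cv, scoring=SCORING,
                             return_train_score=True)
    for key, value in results.items():
        print("key:", key)
        print("value:", value)
        print("mean value:", np.mean(value))
    return results


X, y = make_classification(n_samples=500, n_features=20, random_state=42)
results = run_cv(LogisticRegression(max_iter=1000), X, y)
```

cross_validate emits each test_ key immediately followed by its train_ counterpart, which is the same ordering seen in the blocks above.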
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.13061213 0.12142134 0.12343693 0.12211251 0.12081861 0.12108946
0.119946 0.12066436 0.12075686 0.12019539]
mean value: 0.12210536003112793
key: score_time
value: [0.01764417 0.01821375 0.01812148 0.01800776 0.01792765 0.01793003
0.01794624 0.01797581 0.01795745 0.01803088]
mean value: 0.017975521087646485
key: test_mcc
value: [0.74106548 0.70042867 0.81312325 0.88527041 0.89056356 0.84866842
0.76923077 0.88527041 0.89056356 0.66628253]
mean value: 0.8090467064561273
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.86792453 0.8490566 0.90384615 0.94230769 0.94230769 0.92307692
0.88461538 0.94230769 0.94230769 0.82692308]
mean value: 0.902467343976778
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.87272727 0.84615385 0.89795918 0.94117647 0.94545455 0.92
0.88461538 0.94117647 0.94545455 0.84210526]
mean value: 0.9036822982413429
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.82758621 0.88 0.95652174 0.96 0.89655172 0.95833333
0.88461538 0.96 0.89655172 0.77419355]
mean value: 0.8994353660638663
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.92307692 0.81481481 0.84615385 0.92307692 1. 0.88461538
0.88461538 0.92307692 1. 0.92307692]
mean value: 0.9122507122507123
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.86894587 0.8497151 0.90384615 0.94230769 0.94230769 0.92307692
0.88461538 0.94230769 0.94230769 0.82692308]
mean value: 0.9026353276353276
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.77419355 0.73333333 0.81481481 0.88888889 0.89655172 0.85185185
0.79310345 0.88888889 0.89655172 0.72727273]
mean value: 0.8265450949989326
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.81
Accuracy on Blind test: 0.9
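Assuming test_jcc is sklearn's binary jaccard_score (the script is not shown), it is a fixed transform of the positive-class F1: J = TP/(TP+FP+FN) and F1 = 2TP/(2TP+FP+FN), so J = F1/(2 - F1). Per fold, test_jcc therefore adds no information beyond test_fscore. The sketch checks the identity with sklearn on synthetic labels, then against Extra Trees fold 1 above (test_fscore 0.87272727, test_jcc 0.77419355).

```python
import numpy as np
from sklearn.metrics import f1_score, jaccard_score

# Synthetic hard predictions with roughly 15% of labels flipped (illustrative only).
rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=52)
y_pred = np.where(rng.random(52) < 0.85, y_true, 1 - y_true)

f1 = f1_score(y_true, y_pred)
jcc = jaccard_score(y_true, y_pred)
print(jcc, f1 / (2 - f1))  # equal up to floating-point error

# Extra Trees fold 1: predict test_jcc from the logged test_fscore.
print(round(0.87272727 / (2 - 0.87272727), 6))  # logged test_jcc is 0.77419355
```

The fold means need not obey the identity exactly, because averaging does not commute with the nonlinear transform.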
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01057506 0.01090622 0.01024985 0.0104661 0.01062894 0.01125455
0.01053858 0.01054215 0.01159382 0.01146913]
mean value: 0.010822439193725586
key: score_time
value: [0.00924444 0.00917506 0.00940752 0.00961065 0.00959969 0.00898838
0.00904393 0.00929976 0.00917506 0.00896525]
mean value: 0.009250974655151368
key: test_mcc
value: [0.43536101 0.53035501 0.35273781 0.66628253 0.73568294 0.69230769
0.58789635 0.73131034 0.69230769 0.63245553]
mean value: 0.605669690697596
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.71698113 0.75471698 0.67307692 0.82692308 0.86538462 0.84615385
0.78846154 0.86538462 0.84615385 0.80769231]
mean value: 0.7990928882438316
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.69387755 0.72340426 0.63829787 0.80851064 0.85714286 0.84615385
0.76595745 0.8627451 0.84615385 0.82758621]
mean value: 0.7869829618172682
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.73913043 0.85 0.71428571 0.9047619 0.91304348 0.84615385
0.85714286 0.88 0.84615385 0.75 ]
mean value: 0.8300672081541647
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.65384615 0.62962963 0.57692308 0.73076923 0.80769231 0.84615385
0.69230769 0.84615385 0.84615385 0.92307692]
mean value: 0.7552706552706553
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.71581197 0.75712251 0.67307692 0.82692308 0.86538462 0.84615385
0.78846154 0.86538462 0.84615385 0.80769231]
mean value: 0.7992165242165242
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.53125 0.56666667 0.46875 0.67857143 0.75 0.73333333
0.62068966 0.75862069 0.73333333 0.70588235]
mean value: 0.6547097459673524
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.72
Accuracy on Blind test: 0.86
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.82980156 1.81833506 1.86908293 1.83907104 1.79196358 1.76676679
1.80318642 1.84923625 1.91046953 1.88992739]
mean value: 1.836784052848816
key: score_time
value: [0.10081434 0.10086036 0.09425735 0.09559989 0.09269404 0.09352255
0.09992027 0.10102367 0.10104275 0.10122085]
mean value: 0.09809560775756836
key: test_mcc
value: [0.81688878 0.92450142 0.9258201 0.92307692 0.9258201 0.9258201
0.89056356 0.96225045 0.9258201 0.9258201 ]
mean value: 0.914638163414845
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.90566038 0.96226415 0.96153846 0.96153846 0.96153846 0.96153846
0.94230769 0.98076923 0.96153846 0.96153846]
mean value: 0.9560232220609579
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.90909091 0.96296296 0.96 0.96153846 0.96296296 0.96
0.93877551 0.98039216 0.96296296 0.96296296]
mean value: 0.9561648889548049
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.86206897 0.96296296 1. 0.96153846 0.92857143 1.
1. 1. 0.92857143 0.92857143]
mean value: 0.9572284675732952
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96153846 0.96296296 0.92307692 0.96153846 1. 0.92307692
0.88461538 0.96153846 1. 1. ]
mean value: 0.9578347578347578
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.90669516 0.96225071 0.96153846 0.96153846 0.96153846 0.96153846
0.94230769 0.98076923 0.96153846 0.96153846]
mean value: 0.9561253561253562
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.83333333 0.92857143 0.92307692 0.92592593 0.92857143 0.92307692
0.88461538 0.96153846 0.92857143 0.92857143]
mean value: 0.9165852665852666
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.81
Accuracy on Blind test: 0.9
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
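This configuration passes max_features='auto', which triggers the FutureWarning repeated in the fit output that follows: the value was deprecated in scikit-learn 1.1 and is removed in 1.3. The warning itself names the fix: set max_features='sqrt' (also the RandomForestClassifier default) or drop the parameter. A sketch of the same Random Forest2 settings with that substitution, fitted on synthetic data (an assumption) to confirm that no max_features deprecation warning is raised:

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Random Forest2 settings from the log, with 'auto' replaced by 'sqrt'.
rf2 = RandomForestClassifier(
    max_features="sqrt",  # explicit replacement for the deprecated 'auto'
    min_samples_leaf=5,
    n_estimators=1000,
    n_jobs=10,
    oob_score=True,
    random_state=42,
)

X, y = make_classification(n_samples=200, random_state=42)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    rf2.fit(X, y)

# Keep only warnings that mention max_features (expected: none).
deprecations = [w for w in caught if "max_features" in str(w.message)]
print(len(deprecations), round(rf2.oob_score_, 2))
```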
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...05', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [1.00343037 0.91479254 0.98317862 0.97180915 0.9955368 1.05391216
0.92968369 0.95286179 1.04950261 0.98119926]
mean value: 0.9835906982421875
key: score_time
value: [0.20472693 0.27165556 0.25223088 0.27607179 0.22573662 0.22320628
0.12021565 0.27948952 0.23237991 0.22426629]
mean value: 0.23099794387817382
key: test_mcc
value: [0.81688878 0.77350427 0.9258201 0.92307692 0.9258201 0.88527041
0.89056356 0.96225045 0.9258201 0.88527041]
mean value: 0.891428510912105
key: train_mcc
value: [0.96588471 0.95309971 0.95744681 0.95320012 0.95748148 0.95320012
0.95744681 0.95320012 0.94893617 0.95320012]
mean value: 0.955309616755026
key: test_accuracy
value: [0.90566038 0.88679245 0.96153846 0.96153846 0.96153846 0.94230769
0.94230769 0.98076923 0.96153846 0.94230769]
mean value: 0.9446298984034833
key: train_accuracy
value: [0.98294243 0.97654584 0.9787234 0.97659574 0.9787234 0.97659574
0.9787234 0.97659574 0.97446809 0.97659574]
mean value: 0.9776509549516853
key: test_fscore
value: [0.90909091 0.88888889 0.96 0.96153846 0.96296296 0.94117647
0.93877551 0.98039216 0.96296296 0.94339623]
mean value: 0.9449184549514342
key: train_fscore
value: [0.98297872 0.9764454 0.9787234 0.97654584 0.97863248 0.97654584
0.9787234 0.97654584 0.97446809 0.97654584]
mean value: 0.9776154860669302
key: test_precision
value: [0.86206897 0.88888889 1. 0.96153846 0.92857143 0.96
1. 1. 0.92857143 0.92592593]
mean value: 0.9455565099013374
key: train_precision
value: [0.98297872 0.97854077 0.9787234 0.97863248 0.98283262 0.97863248
0.9787234 0.97863248 0.97446809 0.97863248]
mean value: 0.9790796922109131
key: test_recall
value: [0.96153846 0.88888889 0.92307692 0.96153846 1. 0.92307692
0.88461538 0.96153846 1. 0.96153846]
mean value: 0.9465811965811965
key: train_recall
value: [0.98297872 0.97435897 0.9787234 0.97446809 0.97446809 0.97446809
0.9787234 0.97446809 0.97446809 0.97446809]
mean value: 0.9761593016912166
key: test_roc_auc
value: [0.90669516 0.88675214 0.96153846 0.96153846 0.96153846 0.94230769
0.94230769 0.98076923 0.96153846 0.94230769]
mean value: 0.9447293447293448
key: train_roc_auc
value: [0.98294235 0.97654119 0.9787234 0.97659574 0.9787234 0.97659574
0.9787234 0.97659574 0.97446809 0.97659574]
mean value: 0.977650481905801
key: test_jcc
value: [0.83333333 0.8 0.92307692 0.92592593 0.92857143 0.88888889
0.88461538 0.96153846 0.92857143 0.89285714]
mean value: 0.8967378917378918
key: train_jcc
value: [0.9665272 0.9539749 0.95833333 0.95416667 0.958159 0.95416667
0.95833333 0.95416667 0.95020747 0.95416667]
mean value: 0.9562201890079111
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
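The `max_features='auto'` FutureWarning emitted during the 'Random Forest2' run says to set `max_features='sqrt'` explicitly (or drop the parameter, since `'sqrt'` is already the classifier default). Below is a minimal sketch of that pipeline with the change applied. It assumes scikit-learn 1.1 or later. The four columns and the toy data are hypothetical stand-ins for the 169 numeric and 7 categorical columns listed in the log, so this is not the original script.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical column subsets standing in for the real feature lists.
num_cols = ["ligand_distance", "duet_stability_change"]
cat_cols = ["ss_class", "active_site"]

prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), num_cols),   # scale numeric features to [0, 1]
        ("cat", OneHotEncoder(), cat_cols),  # one-hot encode categorical features
    ],
    remainder="passthrough",
)

# Same settings as 'Random Forest2', with the deprecated 'auto' replaced by 'sqrt'.
model = RandomForestClassifier(
    max_features="sqrt",
    min_samples_leaf=5,
    n_estimators=1000,
    n_jobs=10,
    oob_score=True,
    random_state=42,
)
pipe = Pipeline(steps=[("prep", prep), ("model", model)])

# Toy data (hypothetical), only to show that the pipeline fits and predicts.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, 60),
    "duet_stability_change": rng.normal(0, 1, 60),
    "ss_class": rng.choice(["H", "E", "L"], 60),
    "active_site": rng.choice([0, 1], 60),
})
y = (X["ligand_distance"] < 20).astype(int)

pipe.fit(X, y)
preds = pipe.predict(X)
```

Because `'sqrt'` matches the old `'auto'` behaviour for classifiers, this change only silences the warning and should not alter the model.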
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02466607 0.01180482 0.01069355 0.01015139 0.01118636 0.01110482
0.01029754 0.01127911 0.01162481 0.0100522 ]
mean value: 0.012286067008972168
key: score_time
value: [0.01343274 0.01010823 0.00889921 0.00962663 0.00955248 0.00959492
0.00926352 0.00954437 0.01042676 0.00954938]
mean value: 0.009999823570251466
key: test_mcc
value: [0.73646724 0.50997151 0.6172134 0.88527041 0.69436507 0.69230769
0.69436507 0.77151675 0.69436507 0.69230769]
mean value: 0.6988149917958625
key: train_mcc
value: [0.74840423 0.76129503 0.71066404 0.74894295 0.75778307 0.77046393
0.71925314 0.68550371 0.74075423 0.75330062]
mean value: 0.739636494142086
key: test_accuracy
value: [0.86792453 0.75471698 0.80769231 0.94230769 0.84615385 0.84615385
0.84615385 0.88461538 0.84615385 0.84615385]
mean value: 0.8488026124818577
key: train_accuracy
value: [0.87420043 0.88059701 0.85531915 0.87446809 0.8787234 0.88510638
0.85957447 0.84255319 0.87021277 0.87659574]
mean value: 0.8697350632853967
key: test_fscore
value: [0.86792453 0.75471698 0.8 0.94117647 0.85185185 0.84615385
0.84 0.88 0.85185185 0.84615385]
mean value: 0.8479829376033594
key: train_fscore
value: [0.87473461 0.87931034 0.85470085 0.87473461 0.88050314 0.88655462
0.8583691 0.83982684 0.86825054 0.87553648]
mean value: 0.869252113965142
key: test_precision
value: [0.85185185 0.76923077 0.83333333 0.96 0.82142857 0.84615385
0.875 0.91666667 0.82142857 0.84615385]
mean value: 0.8541247456247456
key: train_precision
value: [0.87288136 0.88695652 0.8583691 0.87288136 0.8677686 0.87551867
0.86580087 0.85462555 0.88157895 0.88311688]
mean value: 0.8719497846503439
key: test_recall
value: [0.88461538 0.74074074 0.76923077 0.92307692 0.88461538 0.84615385
0.80769231 0.84615385 0.88461538 0.84615385]
mean value: 0.8433048433048433
key: train_recall
value: [0.87659574 0.87179487 0.85106383 0.87659574 0.89361702 0.89787234
0.85106383 0.82553191 0.85531915 0.86808511]
mean value: 0.8667539552645935
key: test_roc_auc
value: [0.86823362 0.75498575 0.80769231 0.94230769 0.84615385 0.84615385
0.84615385 0.88461538 0.84615385 0.84615385]
mean value: 0.8488603988603989
key: train_roc_auc
value: [0.87419531 0.88057829 0.85531915 0.87446809 0.8787234 0.88510638
0.85957447 0.84255319 0.87021277 0.87659574]
mean value: 0.8697326786688488
key: test_jcc
value: [0.76666667 0.60606061 0.66666667 0.88888889 0.74193548 0.73333333
0.72413793 0.78571429 0.74193548 0.73333333]
mean value: 0.7388672679440199
key: train_jcc
value: [0.77735849 0.78461538 0.74626866 0.77735849 0.78651685 0.79622642
0.7518797 0.7238806 0.76717557 0.77862595]
mean value: 0.7689906114471405
MCC on Blind test: 0.76
Accuracy on Blind test: 0.88
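Each `key:` / `value:` / `mean value:` dump has the shape of the dictionary returned by scikit-learn's `cross_validate` when it is given a dict of scorers and `return_train_score=True`. That dictionary holds `fit_time` and `score_time` first, then a `test_`/`train_` pair per scorer, with one entry per fold. The sketch below reproduces that layout on synthetic data. The scorer-to-name mapping (for example `"fscore"` -> `"f1"`, `"jcc"` -> `"jaccard"`), the estimator and the 10-fold `StratifiedKFold` are assumptions. None of them is taken from the original script.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in for the real feature matrix and labels.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Scorer names mirror the log's keys; the exact mapping is an assumption.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": "jaccard",
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(
    LogisticRegression(max_iter=1000, random_state=42),
    X, y, cv=cv, scoring=scoring, return_train_score=True,
)

# Print in the same key / value / mean value layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```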
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.0810225 0.0706861 0.15280747 0.10494733 0.06948256 0.08076954
0.07875824 0.08115697 0.0695591 0.0707655 ]
mean value: 0.08599553108215333
key: score_time
value: [0.01104665 0.010849 0.01349568 0.01159859 0.01075864 0.01105809
0.01134682 0.01259756 0.01232362 0.01064587]
mean value: 0.011572051048278808
key: test_mcc
value: [0.85164138 0.96291111 0.96225045 0.96225045 0.9258201 0.96225045
0.84866842 0.9258201 0.9258201 0.96225045]
mean value: 0.9289683006952936
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.9245283 0.98113208 0.98076923 0.98076923 0.96153846 0.98076923
0.92307692 0.96153846 0.96153846 0.98076923]
mean value: 0.9636429608127721
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.92592593 0.98181818 0.98039216 0.98113208 0.96296296 0.98039216
0.92 0.96 0.96296296 0.98113208]
mean value: 0.963671849833892
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.89285714 0.96428571 1. 0.96296296 0.92857143 1.
0.95833333 1. 0.92857143 0.96296296]
mean value: 0.9598544973544973
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96153846 1. 0.96153846 1. 1. 0.96153846
0.88461538 0.92307692 1. 1. ]
mean value: 0.9692307692307692
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.92521368 0.98076923 0.98076923 0.98076923 0.96153846 0.98076923
0.92307692 0.96153846 0.96153846 0.98076923]
mean value: 0.9636752136752137
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.86206897 0.96428571 0.96153846 0.96296296 0.92857143 0.96153846
0.85185185 0.92307692 0.92857143 0.96296296]
mean value: 0.9307429160877436
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.86
Accuracy on Blind test: 0.93
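Every model block ends with an MCC and an accuracy on the blind (held-out) set, rounded to two decimals. Below is a minimal, dependency-free sketch of how those two numbers follow from true and predicted labels. The labels are made up for illustration, and `mcc_and_accuracy` is a hypothetical helper. With scikit-learn, `matthews_corrcoef` and `accuracy_score` from `sklearn.metrics` should give the same values, including the 0.0 MCC returned when a confusion-matrix margin is empty.

```python
import math


def mcc_and_accuracy(y_true, y_pred):
    """Binary Matthews correlation coefficient and accuracy (positive class = 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Treat an empty confusion-matrix margin as MCC = 0.0.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    accuracy = (tp + tn) / len(pairs)
    return mcc, accuracy


# Made-up blind-set labels and predictions, for illustration only.
y_blind = [1, 1, 1, 1, 0, 0, 0, 0]
y_hat = [1, 1, 1, 0, 0, 0, 0, 1]

mcc, acc = mcc_and_accuracy(y_blind, y_hat)
print("MCC on Blind test:", round(mcc, 2))       # MCC on Blind test: 0.5
print("Accuracy on Blind test:", round(acc, 2))  # Accuracy on Blind test: 0.75
```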
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.03567243 0.04290986 0.0472014 0.07730103 0.06250048 0.05814052
0.0723269 0.04689837 0.08232975 0.05255485]
mean value: 0.05778355598449707
key: score_time
value: [0.01217079 0.01232409 0.01893544 0.01794481 0.01220798 0.01889324
0.01251721 0.01215696 0.01258445 0.0193646 ]
mean value: 0.014909958839416504
key: test_mcc
value: [0.73646724 0.73997003 0.77849894 0.88527041 0.89056356 0.77849894
0.57735027 0.73131034 0.74466871 0.71151247]
mean value: 0.7574110914556645
key: train_mcc
value: [0.89794254 0.89379475 0.91542421 0.90220118 0.91922384 0.91084449
0.91084449 0.91922384 0.91935705 0.90641581]
mean value: 0.909527219993544
key: test_accuracy
value: [0.86792453 0.86792453 0.88461538 0.94230769 0.94230769 0.88461538
0.78846154 0.86538462 0.86538462 0.84615385]
mean value: 0.8755079825834543
key: train_accuracy
value: [0.94882729 0.9466951 0.95744681 0.95106383 0.95957447 0.95531915
0.95531915 0.95957447 0.95957447 0.95319149]
mean value: 0.9546586217846935
key: test_fscore
value: [0.86792453 0.87719298 0.875 0.94339623 0.94545455 0.875
0.79245283 0.8627451 0.87719298 0.86206897]
mean value: 0.8778428158828944
key: train_fscore
value: [0.94957983 0.94736842 0.958159 0.95137421 0.95983087 0.95578947
0.95578947 0.95983087 0.96 0.95338983]
mean value: 0.9551111967481583
key: test_precision
value: [0.85185185 0.83333333 0.95454545 0.92592593 0.89655172 0.95454545
0.77777778 0.88 0.80645161 0.78125 ]
mean value: 0.8662233135020955
key: train_precision
value: [0.93775934 0.93360996 0.94238683 0.94537815 0.95378151 0.94583333
0.94583333 0.95378151 0.95 0.94936709]
mean value: 0.945773105762638
key: test_recall
value: [0.88461538 0.92592593 0.80769231 0.96153846 1. 0.80769231
0.80769231 0.84615385 0.96153846 0.96153846]
mean value: 0.8964387464387464
key: train_recall
value: [0.96170213 0.96153846 0.97446809 0.95744681 0.96595745 0.96595745
0.96595745 0.96595745 0.97021277 0.95744681]
mean value: 0.9646644844517185
key: test_roc_auc
value: [0.86823362 0.86680912 0.88461538 0.94230769 0.94230769 0.88461538
0.78846154 0.86538462 0.86538462 0.84615385]
mean value: 0.8754273504273504
key: train_roc_auc
value: [0.94879978 0.94672668 0.95744681 0.95106383 0.95957447 0.95531915
0.95531915 0.95957447 0.95957447 0.95319149]
mean value: 0.9546590289143482
key: test_jcc
value: [0.76666667 0.78125 0.77777778 0.89285714 0.89655172 0.77777778
0.65625 0.75862069 0.78125 0.75757576]
mean value: 0.7846577536448226
key: train_jcc
value: [0.904 0.9 0.91967871 0.90725806 0.92276423 0.91532258
0.91532258 0.92276423 0.92307692 0.91093117]
mean value: 0.9141118493116434
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
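For a binary fold, the Jaccard index and the F1 score are tied by J = F1 / (2 - F1). This holds because J = TP / (TP + FP + FN) and F1 = 2TP / (2TP + FP + FN). As a result, each `test_jcc` row is a fixed transform of the matching `test_fscore` row. The check below applies the identity to fold 1 of several blocks, using values copied from this log. The helper name is hypothetical.

```python
def jaccard_from_f1(f1: float) -> float:
    """Binary Jaccard index implied by a binary F1 score: J = F1 / (2 - F1)."""
    return f1 / (2.0 - f1)


# (test_fscore, test_jcc) for fold 1 of several models, copied from this log.
fold1 = {
    "Random Forest2": (0.90909091, 0.83333333),
    "Naive Bayes": (0.86792453, 0.76666667),
    "XGBoost": (0.92592593, 0.86206897),
    "LDA": (0.86792453, 0.76666667),
}

for name, (f1, jcc) in fold1.items():
    print(f"{name}: derived {jaccard_from_f1(f1):.8f}, logged {jcc:.8f}")
```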
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.0105176 0.01079559 0.01000309 0.00996852 0.00984311 0.00979662
0.01083136 0.00997591 0.00995231 0.01000428]
mean value: 0.010168838500976562
key: score_time
value: [0.01233459 0.0136776 0.00898337 0.00869012 0.00872898 0.0086844
0.00921631 0.00893378 0.00907898 0.00953197]
mean value: 0.009786009788513184
key: test_mcc
value: [0.6980057 0.51359557 0.73568294 0.84615385 0.84615385 0.73568294
0.65824263 0.65433031 0.73568294 0.6172134 ]
mean value: 0.704074411237807
key: train_mcc
value: [0.68485508 0.7273009 0.70669657 0.70654292 0.74910575 0.75745367
0.69818215 0.6730782 0.69863813 0.71087004]
mean value: 0.7112723392902409
key: test_accuracy
value: [0.8490566 0.75471698 0.86538462 0.92307692 0.92307692 0.86538462
0.82692308 0.82692308 0.86538462 0.80769231]
mean value: 0.8507619738751815
key: train_accuracy
value: [0.84221748 0.86353945 0.85319149 0.85319149 0.87446809 0.8787234
0.84893617 0.83617021 0.84893617 0.85531915]
mean value: 0.8554693099850292
key: test_fscore
value: [0.84615385 0.74509804 0.85714286 0.92307692 0.92307692 0.85714286
0.81632653 0.82352941 0.87272727 0.8 ]
mean value: 0.8464274660913317
key: train_fscore
value: [0.83982684 0.86147186 0.85097192 0.8516129 0.87311828 0.87846482
0.84665227 0.83224401 0.8453159 0.85344828]
mean value: 0.8533127081638621
key: test_precision
value: [0.84615385 0.79166667 0.91304348 0.92307692 0.92307692 0.91304348
0.86956522 0.84 0.82758621 0.83333333]
mean value: 0.8680546073117288
key: train_precision
value: [0.85462555 0.87280702 0.86403509 0.86086957 0.8826087 0.88034188
0.85964912 0.85267857 0.86607143 0.86462882]
mean value: 0.8658315740903113
key: test_recall
value: [0.84615385 0.7037037 0.80769231 0.92307692 0.92307692 0.80769231
0.76923077 0.80769231 0.92307692 0.76923077]
mean value: 0.8280626780626781
key: train_recall
value: [0.82553191 0.85042735 0.83829787 0.84255319 0.86382979 0.87659574
0.83404255 0.81276596 0.82553191 0.84255319]
mean value: 0.8412129478086925
key: test_roc_auc
value: [0.84900285 0.75569801 0.86538462 0.92307692 0.92307692 0.86538462
0.82692308 0.82692308 0.86538462 0.80769231]
mean value: 0.8508547008547008
key: train_roc_auc
value: [0.84225314 0.86351155 0.85319149 0.85319149 0.87446809 0.8787234
0.84893617 0.83617021 0.84893617 0.85531915]
mean value: 0.8554700854700855
key: test_jcc
value: [0.73333333 0.59375 0.75 0.85714286 0.85714286 0.75
0.68965517 0.7 0.77419355 0.66666667]
mean value: 0.7371884435086604
key: train_jcc
value: [0.7238806 0.75665399 0.7406015 0.74157303 0.77480916 0.78326996
0.7340824 0.71268657 0.73207547 0.7443609 ]
mean value: 0.7443993587281833
MCC on Blind test: 0.76
Accuracy on Blind test: 0.88
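The `test_roc_auc` values track `test_accuracy` very closely. This pattern is consistent with ROC AUC being computed from hard 0/1 predictions rather than from probabilities. In that case the ROC curve has a single interior point (FPR, TPR), and the trapezoidal area reduces to (TPR + TNR) / 2, the balanced accuracy. Whether the original script scored AUC this way is an assumption. The sketch below only checks the arithmetic. It uses confusion counts inferred from fold 1 of the 'Random Forest2' block: 53 samples, 26 positives, 25 true positives, 4 false positives and 23 true negatives, which reproduce the logged recall, precision, accuracy and MCC for that fold. Both helpers are hypothetical.

```python
def auc_from_hard_labels(tp: int, fn: int, fp: int, tn: int) -> float:
    """Trapezoidal ROC AUC when the only 'scores' are hard 0/1 predictions."""
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    # ROC polyline (0, 0) -> (fpr, tpr) -> (1, 1): sum of the two trapezoids.
    return fpr * tpr / 2 + (1 - fpr) * (tpr + 1) / 2


def balanced_accuracy(tp: int, fn: int, fp: int, tn: int) -> float:
    """Mean of sensitivity (TPR) and specificity (TNR)."""
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


counts = dict(tp=25, fn=1, fp=4, tn=23)  # inferred fold-1 counts

print(round(auc_from_hard_labels(**counts), 8))  # 0.90669516
print(round(balanced_accuracy(**counts), 8))     # 0.90669516
```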
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01451564 0.01753974 0.02073836 0.02360868 0.0248158 0.01841426
0.02023935 0.01849508 0.02454448 0.02476811]
mean value: 0.020767951011657716
key: score_time
value: [0.01012969 0.01133466 0.01186538 0.01195502 0.01203322 0.01204181
0.01217699 0.0120542 0.01200485 0.01193643]
mean value: 0.011753225326538086
key: test_mcc
value: [0.77350427 0.70527596 0.74466871 0.88527041 0.85634884 0.6789146
0.77849894 0.79056942 0.88527041 0.69436507]
mean value: 0.7792686643983939
key: train_mcc
value: [0.87814682 0.88584735 0.88164966 0.92424143 0.93221879 0.84593758
0.87157206 0.76874221 0.86448019 0.85856681]
mean value: 0.8711402909907915
key: test_accuracy
value: [0.88679245 0.8490566 0.86538462 0.94230769 0.92307692 0.82692308
0.88461538 0.88461538 0.94230769 0.84615385]
mean value: 0.8851233671988389
key: train_accuracy
value: [0.93816631 0.9424307 0.94042553 0.96170213 0.96595745 0.9212766
0.93404255 0.87446809 0.92978723 0.92553191]
mean value: 0.9333788504287075
key: test_fscore
value: [0.88461538 0.86206897 0.85106383 0.94117647 0.92857143 0.8
0.875 0.86956522 0.94117647 0.84 ]
mean value: 0.8793237767059063
key: train_fscore
value: [0.93626374 0.94363257 0.93913043 0.96086957 0.96551724 0.91759465
0.93095768 0.85851319 0.9258427 0.92027335]
mean value: 0.9298595118619817
key: test_precision
value: [0.88461538 0.80645161 0.95238095 0.96 0.86666667 0.94736842
0.95454545 1. 0.96 0.875 ]
mean value: 0.9207028492164315
key: train_precision
value: [0.96818182 0.92244898 0.96 0.98222222 0.97816594 0.96261682
0.97663551 0.98351648 0.98095238 0.99019608]
mean value: 0.9704936238209341
key: test_recall
value: [0.88461538 0.92592593 0.76923077 0.92307692 1. 0.69230769
0.80769231 0.76923077 0.92307692 0.80769231]
mean value: 0.8502849002849003
key: train_recall
value: [0.90638298 0.96581197 0.91914894 0.94042553 0.95319149 0.87659574
0.8893617 0.76170213 0.87659574 0.85957447]
mean value: 0.8948790689216222
key: test_roc_auc
value: [0.88675214 0.84757835 0.86538462 0.94230769 0.92307692 0.82692308
0.88461538 0.88461538 0.94230769 0.84615385]
mean value: 0.88497150997151
key: train_roc_auc
value: [0.93823422 0.94248045 0.94042553 0.96170213 0.96595745 0.9212766
0.93404255 0.87446809 0.92978723 0.92553191]
mean value: 0.9333906164757229
key: test_jcc
value: [0.79310345 0.75757576 0.74074074 0.88888889 0.86666667 0.66666667
0.77777778 0.76923077 0.88888889 0.72413793]
mean value: 0.7873677535746502
key: train_jcc
value: [0.88016529 0.89328063 0.8852459 0.92468619 0.93333333 0.84773663
0.87083333 0.75210084 0.86192469 0.85232068]
mean value: 0.8701627509590387
MCC on Blind test: 0.71
Accuracy on Blind test: 0.83
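The `key:` / `value:` / `mean value:` blocks in this log match the layout of the dictionary that scikit-learn's `cross_validate` returns when it is given a dict of named scorers and `return_train_score=True`. That layout is `fit_time`, `score_time`, then one `test_`/`train_` pair per scorer, in the order the scorers were listed. The sketch below reproduces the layout under stated assumptions. The toy data, the column subset and the scorer dict are hypothetical stand-ins, because the actual code in `ml_data_sl.py` is not shown here. `LogisticRegression` stands in for whichever estimator fills the `model` step.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (jaccard_score, make_scorer, matthews_corrcoef,
                             roc_auc_score)
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy data: two numeric columns and one categorical column,
# named after columns that appear in the logged pipeline.
rng = np.random.default_rng(42)
n = 120
X = pd.DataFrame({
    "ligand_distance": rng.normal(size=n),
    "ddg_foldx": rng.normal(size=n),
    "ss_class": [["H", "E", "L"][i % 3] for i in range(n)],
})
y = (X["ligand_distance"] + 0.5 * rng.normal(size=n) > 0).astype(int)

# Same layout as the logged pipeline: scale numeric columns, one-hot encode
# categorical columns, pass any remaining columns through unchanged.
prep = ColumnTransformer(
    remainder="passthrough",
    transformers=[("num", MinMaxScaler(), ["ligand_distance", "ddg_foldx"]),
                  ("cat", OneHotEncoder(), ["ss_class"])],
)
pipe = Pipeline(steps=[("prep", prep),
                       ("model", LogisticRegression(random_state=42))])

# Assumed scorer dict: its keys become the "test_*" / "train_*" names.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": make_scorer(roc_auc_score),  # scored on hard predict() labels
    "jcc": make_scorer(jaccard_score),
}
scores = cross_validate(pipe, X, y, cv=10, scoring=scoring,
                        return_train_score=True)

# Print in the same key / value / mean value style as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", value.mean())
```

A scorer built with `make_scorer(roc_auc_score)` scores the output of `predict()`, not probabilities or decision scores, so an AUC obtained this way is computed from thresholded labels.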
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.02051973 0.02308512 0.02264977 0.02195239 0.01951814 0.01922989
0.01749086 0.02073479 0.02035427 0.01736856]
mean value: 0.020290350914001463
key: score_time
value: [0.01103067 0.01198816 0.01198959 0.0119803 0.0121727 0.01211762
0.01197362 0.01196384 0.01207113 0.01193643]
mean value: 0.011922407150268554
key: test_mcc
value: [0.81688878 0.68308228 0.80829038 0.75878691 0.84615385 0.81312325
0.6789146 0.84866842 0.84866842 0.74466871]
mean value: 0.7847245604589297
key: train_mcc
value: [0.82318874 0.80844901 0.89143025 0.73855496 0.85379422 0.90278998
0.67317249 0.89094414 0.89427309 0.82331429]
mean value: 0.8299911175592818
key: test_accuracy
value: [0.90566038 0.83018868 0.90384615 0.86538462 0.92307692 0.90384615
0.82692308 0.92307692 0.92307692 0.86538462]
mean value: 0.8870464441219158
key: train_accuracy
value: [0.90618337 0.89765458 0.94468085 0.85531915 0.92340426 0.95106383
0.81489362 0.94468085 0.94680851 0.90851064]
mean value: 0.9093199655219344
key: test_fscore
value: [0.90909091 0.85245902 0.90566038 0.88135593 0.92307692 0.89795918
0.8 0.92 0.92592593 0.85106383]
mean value: 0.8866592097509784
key: train_fscore
value: [0.91338583 0.90588235 0.94650206 0.87265918 0.91818182 0.9519833
0.7751938 0.94298246 0.94780793 0.90249433]
mean value: 0.9077073048926279
key: test_precision
value: [0.86206897 0.76470588 0.88888889 0.78787879 0.92307692 0.95652174
0.94736842 0.95833333 0.89285714 0.95238095]
mean value: 0.8934081036469277
key: train_precision
value: [0.84981685 0.83695652 0.91633466 0.77926421 0.98536585 0.93442623
0.98684211 0.97285068 0.93032787 0.96601942]
mean value: 0.9158204400448495
key: test_recall
value: [0.96153846 0.96296296 0.92307692 1. 0.92307692 0.84615385
0.69230769 0.88461538 0.96153846 0.76923077]
mean value: 0.8924501424501424
key: train_recall
value: [0.98723404 0.98717949 0.9787234 0.99148936 0.85957447 0.97021277
0.63829787 0.91489362 0.96595745 0.84680851]
mean value: 0.9140370976541189
key: test_roc_auc
value: [0.90669516 0.82763533 0.90384615 0.86538462 0.92307692 0.90384615
0.82692308 0.92307692 0.92307692 0.86538462]
mean value: 0.8868945868945869
key: train_roc_auc
value: [0.90601018 0.89784506 0.94468085 0.85531915 0.92340426 0.95106383
0.81489362 0.94468085 0.94680851 0.90851064]
mean value: 0.9093216948536097
key: test_jcc
value: [0.83333333 0.74285714 0.82758621 0.78787879 0.85714286 0.81481481
0.66666667 0.85185185 0.86206897 0.74074074]
mean value: 0.7984941367699988
key: train_jcc
value: [0.84057971 0.82795699 0.8984375 0.77408638 0.8487395 0.90836653
0.63291139 0.89211618 0.90079365 0.82231405]
mean value: 0.8346301883150747
MCC on Blind test: 0.69
Accuracy on Blind test: 0.83
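The per-fold metrics can be cross-checked against one another. In fold 1 of the SGD run, the logged test precision 0.86206897 (25/29), recall 0.96153846 (25/26) and accuracy 0.90566038 (48/53) together imply the confusion matrix TP=25, FN=1, FP=4, TN=23. These counts are inferred from the log, not printed in it. Recomputing from these counts reproduces the logged MCC, F1 and Jaccard values. The logged `test_roc_auc` of 0.90669516 equals the balanced accuracy, which is consistent with AUC scored on hard 0/1 predictions: for binary predictions, AUC reduces to (TPR + TNR) / 2.

```python
import math

# Fold-1 test counts for the SGD run, inferred (not logged directly) from
# precision 25/29, recall 25/26 and accuracy 48/53.
tp, fn, fp, tn = 25, 1, 4, 23


def mcc(tp, fn, fp, tn):
    """Matthews correlation coefficient from a 2x2 confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


accuracy = (tp + tn) / (tp + fn + fp + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)                     # true positive rate
specificity = tn / (tn + fp)                # true negative rate
f1 = 2 * tp / (2 * tp + fp + fn)
jaccard = tp / (tp + fp + fn)
balanced = (recall + specificity) / 2       # equals AUC for 0/1 predictions

print(f"mcc={mcc(tp, fn, fp, tn):.8f} acc={accuracy:.8f} f1={f1:.8f} "
      f"jcc={jaccard:.8f} balanced_acc={balanced:.8f}")
```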
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.18447971 0.18334126 0.18150234 0.1833899 0.18306136 0.18128395
0.18111897 0.17938328 0.18043876 0.18003964]
mean value: 0.18180391788482667
key: score_time
value: [0.015414 0.01569867 0.0158639 0.01544142 0.01620245 0.01581216
0.01540232 0.01535916 0.01594925 0.01538944]
mean value: 0.015653276443481447
key: test_mcc
value: [0.85164138 0.96291111 0.96225045 0.96225045 0.9258201 0.96225045
0.81312325 0.96225045 0.9258201 0.92307692]
mean value: 0.9251394652761287
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.9245283 0.98113208 0.98076923 0.98076923 0.96153846 0.98076923
0.90384615 0.98076923 0.96153846 0.96153846]
mean value: 0.9617198838896952
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.92592593 0.98181818 0.98039216 0.98113208 0.96296296 0.98039216
0.89795918 0.98039216 0.96296296 0.96153846]
mean value: 0.9615476224941898
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.89285714 0.96428571 1. 0.96296296 0.92857143 1.
0.95652174 1. 0.92857143 0.96153846]
mean value: 0.9595308877917573
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96153846 1. 0.96153846 1. 1. 0.96153846
0.84615385 0.96153846 1. 0.96153846]
mean value: 0.9653846153846154
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.92521368 0.98076923 0.98076923 0.98076923 0.96153846 0.98076923
0.90384615 0.98076923 0.96153846 0.96153846]
mean value: 0.9617521367521368
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.86206897 0.96428571 0.96153846 0.96296296 0.92857143 0.96153846
0.81481481 0.96153846 0.92857143 0.92592593]
mean value: 0.9271816625264901
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.9
Accuracy on Blind test: 0.95
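Each `mean value:` line is the plain average of the ten fold scores. The log does not print the spread. The test folds hold 52 or 53 rows, so a single misclassified mutation moves fold accuracy by about 0.019. For that reason, the fold-to-fold standard deviation is worth reporting alongside the mean. The gap between training and test scores is also informative: here AdaBoost scores exactly 1.0 on every training fold. The sketch below uses the AdaBoost `test_mcc` values copied from the lines above. The helper name `summarise` is illustrative only.

```python
import statistics

# Fold scores copied from the AdaBoost test_mcc / train_mcc lines above.
test_mcc = [0.85164138, 0.96291111, 0.96225045, 0.96225045, 0.9258201,
            0.96225045, 0.81312325, 0.96225045, 0.9258201, 0.92307692]
train_mcc = [1.0] * 10


def summarise(scores):
    """Return (mean, sample standard deviation) of per-fold scores."""
    return statistics.mean(scores), statistics.stdev(scores)


test_mean, test_sd = summarise(test_mcc)
train_mean, _ = summarise(train_mcc)
print(f"test MCC {test_mean:.3f} +/- {test_sd:.3f}; "
      f"train-test gap {train_mean - test_mean:.3f}")
```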
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.06588101 0.05572891 0.06235313 0.06345272 0.08077598 0.07156444
0.08671784 0.07806897 0.08278823 0.08851504]
mean value: 0.07358462810516357
key: score_time
value: [0.02843833 0.02826238 0.02912283 0.02685213 0.03997207 0.02721906
0.03725529 0.02411723 0.03921318 0.03339529]
mean value: 0.03138477802276611
key: test_mcc
value: [0.85164138 0.92450142 0.9258201 0.96225045 0.9258201 0.96225045
0.84866842 0.88527041 0.89056356 0.96225045]
mean value: 0.9139036744202026
key: train_mcc
value: [0.98721586 0.98721563 0.98312115 0.97873227 0.9957537 0.9873145
0.98724298 0.99152527 0.97478586 0.98297872]
mean value: 0.9855885925245994
key: test_accuracy
value: [0.9245283 0.96226415 0.96153846 0.98076923 0.96153846 0.98076923
0.92307692 0.94230769 0.94230769 0.98076923]
mean value: 0.9559869375907112
key: train_accuracy
value: [0.99360341 0.99360341 0.99148936 0.9893617 0.99787234 0.99361702
0.99361702 0.99574468 0.98723404 0.99148936]
mean value: 0.9927632354942613
key: test_fscore
value: [0.92592593 0.96296296 0.96 0.98113208 0.96296296 0.98039216
0.92 0.94339623 0.94545455 0.98113208]
mean value: 0.9563358931527632
key: train_fscore
value: [0.99360341 0.99357602 0.99141631 0.98933902 0.9978678 0.99357602
0.99363057 0.9957265 0.98739496 0.99148936]
mean value: 0.9927619966475919
key: test_precision
value: [0.89285714 0.96296296 1. 0.96296296 0.92857143 1.
0.95833333 0.92592593 0.89655172 0.96296296]
mean value: 0.949112844371465
key: train_precision
value: [0.9957265 0.99570815 1. 0.99145299 1. 1.
0.99152542 1. 0.97510373 0.99148936]
mean value: 0.99410061615567
key: test_recall
value: [0.96153846 0.96296296 0.92307692 1. 1. 0.96153846
0.88461538 0.96153846 1. 1. ]
mean value: 0.9655270655270656
key: train_recall
value: [0.99148936 0.99145299 0.98297872 0.98723404 0.99574468 0.98723404
0.99574468 0.99148936 1. 0.99148936]
mean value: 0.9914857246772141
key: test_roc_auc
value: [0.92521368 0.96225071 0.96153846 0.98076923 0.96153846 0.98076923
0.92307692 0.94230769 0.94230769 0.98076923]
mean value: 0.9560541310541311
key: train_roc_auc
value: [0.99360793 0.99359884 0.99148936 0.9893617 0.99787234 0.99361702
0.99361702 0.99574468 0.98723404 0.99148936]
mean value: 0.9927632296781232
key: test_jcc
value: [0.86206897 0.92857143 0.92307692 0.96296296 0.92857143 0.96153846
0.85185185 0.89285714 0.89655172 0.96296296]
mean value: 0.9171013852048335
key: train_jcc
value: [0.98728814 0.98723404 0.98297872 0.97890295 0.99574468 0.98723404
0.98734177 0.99148936 0.97510373 0.98312236]
mean value: 0.9856439809704479
MCC on Blind test: 0.9
Accuracy on Blind test: 0.95
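The repeated `_bagging.py` warnings in the Bagging run are consistent with `BaggingClassifier`'s default of `n_estimators=10`. With bootstrap sampling, a given training row lands in one bootstrap sample with probability 1 - (1 - 1/n)^n, which is about 0.632. The row is therefore never out-of-bag with probability of about 0.632^10, roughly 0.01. On training folds of 469 or 470 rows, that gives around five rows per fit with no OOB prediction. Those rows trigger the UserWarning, and their zero vote counts produce the 0/0 RuntimeWarning. The sketch below computes the expected count. The row count is an approximation read off the logged training accuracies. Raising `n_estimators` (for example `BaggingClassifier(n_estimators=100, oob_score=True, n_jobs=10, random_state=42)`) should make such rows vanishingly rare, although this change is not tested against this dataset.

```python
def p_never_oob(n_rows, n_estimators):
    """Probability that a row appears in every bootstrap sample (never OOB)."""
    p_in_bag = 1 - (1 - 1 / n_rows) ** n_rows  # chance of being drawn in one bootstrap
    return p_in_bag ** n_estimators


n_rows = 470  # approximate training-fold size implied by the log (assumption)
for k in (10, 50, 100):
    expected = n_rows * p_never_oob(n_rows, k)
    print(f"n_estimators={k}: expected rows without an OOB score ~ {expected:.3g}")
```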
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.14703774 0.11411095 0.19120908 0.13331914 0.13890243 0.21100497
0.18627715 0.17440438 0.17817092 0.1670382 ]
mean value: 0.1641474962234497
key: score_time
value: [0.0247128 0.0149591 0.02889299 0.01530313 0.01501536 0.02427626
0.02445054 0.0240407 0.02411985 0.02399731]
mean value: 0.021976804733276366
key: test_mcc
value: [0.73646724 0.50997151 0.50336201 0.77151675 0.6789146 0.65433031
0.69436507 0.81312325 0.81312325 0.53846154]
mean value: 0.6713635523699482
key: train_mcc
value: [0.98728791 0.99150708 0.9873145 0.9873145 0.9873145 0.99152527
0.9873145 0.9873145 0.9873145 0.9873145 ]
mean value: 0.9881521740698564
key: test_accuracy
value: [0.86792453 0.75471698 0.75 0.88461538 0.82692308 0.82692308
0.84615385 0.90384615 0.90384615 0.76923077]
mean value: 0.8334179970972424
key: train_accuracy
value: [0.99360341 0.99573561 0.99361702 0.99361702 0.99361702 0.99574468
0.99361702 0.99361702 0.99361702 0.99361702]
mean value: 0.9940402848977
key: test_fscore
value: [0.86792453 0.75471698 0.73469388 0.88 0.84745763 0.82352941
0.84 0.89795918 0.90909091 0.76923077]
mean value: 0.832460328786348
key: train_fscore
value: [0.99357602 0.99570815 0.99357602 0.99357602 0.99357602 0.9957265
0.99357602 0.99357602 0.99357602 0.99357602]
mean value: 0.9940042787277901
key: test_precision
value: [0.85185185 0.76923077 0.7826087 0.91666667 0.75757576 0.84
0.875 0.95652174 0.86206897 0.76923077]
mean value: 0.8380755214855664
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.88461538 0.74074074 0.69230769 0.84615385 0.96153846 0.80769231
0.80769231 0.84615385 0.96153846 0.76923077]
mean value: 0.8317663817663817
key: train_recall
value: [0.98723404 0.99145299 0.98723404 0.98723404 0.98723404 0.99148936
0.98723404 0.98723404 0.98723404 0.98723404]
mean value: 0.9880814693580651
key: test_roc_auc
value: [0.86823362 0.75498575 0.75 0.88461538 0.82692308 0.82692308
0.84615385 0.90384615 0.90384615 0.76923077]
mean value: 0.8334757834757835
key: train_roc_auc
value: [0.99361702 0.9957265 0.99361702 0.99361702 0.99361702 0.99574468
0.99361702 0.99361702 0.99361702 0.99361702]
mean value: 0.9940407346790325
key: test_jcc
value: [0.76666667 0.60606061 0.58064516 0.78571429 0.73529412 0.7
0.72413793 0.81481481 0.83333333 0.625 ]
mean value: 0.7171666916561571
key: train_jcc
value: [0.98723404 0.99145299 0.98723404 0.98723404 0.98723404 0.99148936
0.98723404 0.98723404 0.98723404 0.98723404]
mean value: 0.9880814693580651
MCC on Blind test: 0.62
Accuracy on Blind test: 0.81
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.73474646 0.72688985 0.72868824 0.73179102 0.73014355 0.73290229
0.73310161 0.72938943 0.7374897 0.73583198]
mean value: 0.7320974111557007
key: score_time
value: [0.00944114 0.00925851 0.00926185 0.00938869 0.00955248 0.00936866
0.00937533 0.00944853 0.0102365 0.00951147]
mean value: 0.009484314918518066
key: test_mcc
value: [0.85164138 0.92704716 0.96225045 0.96225045 0.9258201 0.96225045
0.88527041 0.92307692 0.89056356 0.96225045]
mean value: 0.9252421331587133
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.9245283 0.96226415 0.98076923 0.98076923 0.96153846 0.98076923
0.94230769 0.96153846 0.94230769 0.98076923]
mean value: 0.9617561683599419
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.92592593 0.96428571 0.98039216 0.98113208 0.96296296 0.98039216
0.94117647 0.96153846 0.94545455 0.98113208]
mean value: 0.9624392545424731
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.89285714 0.93103448 1. 0.96296296 0.92857143 1.
0.96 0.96153846 0.89655172 0.96296296]
mean value: 0.9496479165789511
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96153846 1. 0.96153846 1. 1. 0.96153846
0.92307692 0.96153846 1. 1. ]
mean value: 0.9769230769230769
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.92521368 0.96153846 0.98076923 0.98076923 0.96153846 0.98076923
0.94230769 0.96153846 0.94230769 0.98076923]
mean value: 0.9617521367521368
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.86206897 0.93103448 0.96153846 0.96296296 0.92857143 0.96153846
0.88888889 0.92592593 0.89655172 0.96296296]
mean value: 0.9282044264802886
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.9
Accuracy on Blind test: 0.95
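The key/value blocks in this log follow a fixed pattern. Each model gets fit_time and score_time, then a test_* and train_* pair for each metric, with ten per-fold values and their mean. That is the shape of the dictionary returned by scikit-learn's cross_validate with return_train_score=True and a dict of named scorers. The sketch below is an assumption about how such output could be produced, not the script's actual code. The scorer names, the 10-fold StratifiedKFold and the make_classification toy data are all illustrative.

The test_roc_auc values in this log equal the mean of recall and specificity in each fold. For the first Gradient Boosting fold, recall 25/26 and specificity 24/27 average to 0.92521368. That is what roc_auc_score returns for hard 0/1 predictions, so the sketch wraps roc_auc_score with make_scorer, which scores predict() output by default.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_validate

# Hypothetical scorer names, chosen to reproduce the keys printed in the log.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    # make_scorer scores hard predict() labels by default, so this "AUC"
    # equals balanced accuracy, as the log's test_roc_auc values suggest.
    "roc_auc": make_scorer(roc_auc_score),
    "jcc": make_scorer(jaccard_score),
}

# Toy stand-in data; the script's real feature table is not shown here.
X, y = make_classification(n_samples=500, n_features=20, random_state=42)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

scores = cross_validate(LogisticRegression(random_state=42), X, y,
                        cv=cv, scoring=scoring, return_train_score=True)

# Print in the same key / value / mean value layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```

cross_validate emits the test_ and train_ entries interleaved per scorer, in the order of the scoring dict. This is the same ordering the log shows.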
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.03148365 0.05373335 0.05133128 0.03273869 0.03149581 0.03127074
0.03117824 0.03164434 0.03118968 0.03135705]
mean value: 0.03574228286743164
key: score_time
value: [0.01274085 0.01517105 0.01338124 0.01331782 0.01315212 0.01508641
0.01483965 0.01540351 0.014956 0.01507926]
mean value: 0.01431279182434082
key: test_mcc
value: [0.48187381 0.35897436 0.6172134 0.73131034 0.4259217 0.39528471
0.35273781 0.50951017 0.5990423 0.38575837]
mean value: 0.48576269695469176
key: train_mcc
value: [0.86418083 0.95749365 0.97029183 0.95361464 0.92156343 0.80568158
0.86066297 0.97873227 0.79494933 0.926125 ]
mean value: 0.9033295534903398
key: test_accuracy
value: [0.73584906 0.67924528 0.80769231 0.86538462 0.71153846 0.69230769
0.67307692 0.75 0.78846154 0.69230769]
mean value: 0.7395863570391872
key: train_accuracy
value: [0.92750533 0.97867804 0.98510638 0.97659574 0.95957447 0.89361702
0.92553191 0.9893617 0.88723404 0.96170213]
mean value: 0.9484906773125255
key: test_fscore
value: [0.69565217 0.67924528 0.8 0.86792453 0.69387755 0.65217391
0.63829787 0.77192982 0.75555556 0.68 ]
mean value: 0.7234656701755069
key: train_fscore
value: [0.92201835 0.97844828 0.98501071 0.9769392 0.9580574 0.88095238
0.91954023 0.98938429 0.87290168 0.96017699]
mean value: 0.9443429499014124
key: test_precision
value: [0.8 0.69230769 0.83333333 0.85185185 0.73913043 0.75
0.71428571 0.70967742 0.89473684 0.70833333]
mean value: 0.7693656621354635
key: train_precision
value: [1. 0.98695652 0.99137931 0.96280992 0.99541284 1.
1. 0.98728814 1. 1. ]
mean value: 0.9923846729069248
key: test_recall
value: [0.61538462 0.66666667 0.76923077 0.88461538 0.65384615 0.57692308
0.57692308 0.84615385 0.65384615 0.65384615]
mean value: 0.6897435897435897
key: train_recall
value: [0.85531915 0.97008547 0.9787234 0.99148936 0.92340426 0.78723404
0.85106383 0.99148936 0.77446809 0.92340426]
mean value: 0.9046681214766321
key: test_roc_auc
value: [0.73361823 0.67948718 0.80769231 0.86538462 0.71153846 0.69230769
0.67307692 0.75 0.78846154 0.69230769]
mean value: 0.7393874643874644
key: train_roc_auc
value: [0.92765957 0.97865976 0.98510638 0.97659574 0.95957447 0.89361702
0.92553191 0.9893617 0.88723404 0.96170213]
mean value: 0.9485042735042735
key: test_jcc
value: [0.53333333 0.51428571 0.66666667 0.76666667 0.53125 0.48387097
0.46875 0.62857143 0.60714286 0.51515152]
mean value: 0.5715689149560117
key: train_jcc
value: [0.85531915 0.95780591 0.97046414 0.95491803 0.91949153 0.78723404
0.85106383 0.9789916 0.77446809 0.92340426]
mean value: 0.897316055874549
MCC on Blind test: 0.57
Accuracy on Blind test: 0.79
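QDA raised "Variables are collinear" on its fits above. scikit-learn emits this when a class's centred training data have fewer non-negligible singular values than there are features. The script is not shown, so the cause here is an assumption. One plausible contributor is the pipeline's OneHotEncoder(). With its default drop=None, the dummy columns for each categorical feature sum to 1, so they become linearly dependent once centred. The toy sketch below uses a hypothetical three-level column and counts the warnings QDA raises with and without drop="first". It counts every warning raised during fit, because the exact message text may differ between scikit-learn versions.

```python
import warnings

import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
n = 300
numeric = rng.normal(size=(n, 3))
# Hypothetical three-level categorical column (a stand-in, not real data).
category = rng.choice(["helix", "sheet", "loop"], size=(n, 1))
y = (numeric[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(int)


def qda_warning_count(encoder):
    """Fit QDA on numeric + one-hot columns; return how many warnings fit() raised."""
    dummies = encoder.fit_transform(category)
    if hasattr(dummies, "toarray"):  # OneHotEncoder returns a sparse matrix by default
        dummies = dummies.toarray()
    X = np.hstack([numeric, dummies])
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        QuadraticDiscriminantAnalysis().fit(X, y)
    return len(caught)


full = qda_warning_count(OneHotEncoder())                 # dummies sum to 1: rank-deficient
reduced = qda_warning_count(OneHotEncoder(drop="first"))  # one dummy dropped per feature
print(full, reduced)
```

Constant or duplicated numeric columns within a class would trigger the same warning. Dropping a dummy level therefore helps only if the one-hot columns are the actual source.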
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02873778 0.03986144 0.03904796 0.03797555 0.03872371 0.03989911
0.04019141 0.05784249 0.0290029 0.0340817 ]
mean value: 0.03853640556335449
key: score_time
value: [0.02153182 0.02466536 0.01894593 0.01888561 0.02026582 0.01964355
0.01979423 0.03150344 0.03011656 0.02050829]
mean value: 0.0225860595703125
key: test_mcc
value: [0.85164138 0.70527596 0.77849894 0.92307692 0.89056356 0.74466871
0.73131034 0.88527041 0.81312325 0.77849894]
mean value: 0.8101928417182709
key: train_mcc
value: [0.86403192 0.87219919 0.86461295 0.86411148 0.86828166 0.85995606
0.88136192 0.87262489 0.86847048 0.86395495]
mean value: 0.8679605508568666
key: test_accuracy
value: [0.9245283 0.8490566 0.88461538 0.96153846 0.94230769 0.86538462
0.86538462 0.94230769 0.90384615 0.88461538]
mean value: 0.9023584905660378
key: train_accuracy
value: [0.93176972 0.93603412 0.93191489 0.93191489 0.93404255 0.92978723
0.94042553 0.93617021 0.93404255 0.93191489]
mean value: 0.9338016603910538
key: test_fscore
value: [0.92592593 0.86206897 0.875 0.96153846 0.94545455 0.85106383
0.86792453 0.94117647 0.90909091 0.89285714]
mean value: 0.9032100779061583
key: train_fscore
value: [0.93305439 0.93644068 0.93333333 0.93277311 0.93473684 0.93081761
0.94142259 0.93697479 0.93501048 0.93248945]
mean value: 0.9347053283732041
key: test_precision
value: [0.89285714 0.80645161 0.95454545 0.96153846 0.89655172 0.95238095
0.85185185 0.96 0.86206897 0.83333333]
mean value: 0.8971579499065595
key: train_precision
value: [0.91769547 0.92857143 0.91428571 0.92116183 0.925 0.91735537
0.92592593 0.9253112 0.9214876 0.92468619]
mean value: 0.9221480738754971
key: test_recall
value: [0.96153846 0.92592593 0.80769231 0.96153846 1. 0.76923077
0.88461538 0.92307692 0.96153846 0.96153846]
mean value: 0.9156695156695157
key: train_recall
value: [0.94893617 0.94444444 0.95319149 0.94468085 0.94468085 0.94468085
0.95744681 0.94893617 0.94893617 0.94042553]
mean value: 0.9476359338061466
key: test_roc_auc
value: [0.92521368 0.84757835 0.88461538 0.96153846 0.94230769 0.86538462
0.86538462 0.94230769 0.90384615 0.88461538]
mean value: 0.9022792022792023
key: train_roc_auc
value: [0.93173304 0.93605201 0.93191489 0.93191489 0.93404255 0.92978723
0.94042553 0.93617021 0.93404255 0.93191489]
mean value: 0.9337997817785052
key: test_jcc
value: [0.86206897 0.75757576 0.77777778 0.92592593 0.89655172 0.74074074
0.76666667 0.88888889 0.83333333 0.80645161]
mean value: 0.8255981393467489
key: train_jcc
value: [0.8745098 0.88047809 0.875 0.87401575 0.87747036 0.87058824
0.88932806 0.88142292 0.87795276 0.87351779]
mean value: 0.877428376123688
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
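Each "Running model pipeline" entry prints a Pipeline whose prep step is a ColumnTransformer. It applies MinMaxScaler to 169 numeric columns and OneHotEncoder to seven categorical ones, and sets remainder='passthrough' for anything left over. The sketch below is a scaled-down version of that layout. It uses two numeric and two categorical column names from the log, but the values and the label rule are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import RidgeClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

rng = np.random.default_rng(42)
n = 100
# Column names borrowed from the log; the values are random placeholders.
df = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, n),
    "ddg_foldx": rng.normal(0, 2, n),
    "ss_class": rng.choice(["H", "E", "L"], n),
    "active_site": rng.choice([0, 1], n),
})
y = (df["ligand_distance"] < 15).astype(int)  # invented label rule

num_cols = ["ligand_distance", "ddg_foldx"]
cat_cols = ["ss_class", "active_site"]

prep = ColumnTransformer(
    transformers=[("num", MinMaxScaler(), num_cols),
                  ("cat", OneHotEncoder(), cat_cols)],
    remainder="passthrough",  # columns in neither list are passed through unchanged
)
pipe = Pipeline(steps=[("prep", prep), ("model", RidgeClassifier(random_state=42))])
pipe.fit(df, y)

Xt = pipe.named_steps["prep"].transform(df)
if hasattr(Xt, "toarray"):  # ColumnTransformer may return a sparse matrix
    Xt = Xt.toarray()
print(Xt.shape)  # 2 scaled numeric columns + 3 + 2 one-hot columns
```

Because the scaler sits inside the pipeline, cross_validate refits it on each training fold. Test-fold values can therefore fall slightly outside [0, 1].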
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.14161825 0.28556347 0.29658747 0.24531269 0.1590662 0.27728081
0.2977221 0.21515226 0.22903538 0.18153119]
mean value: 0.23288698196411134
key: score_time
value: [0.0169847 0.02052855 0.02147937 0.02026939 0.01914334 0.02140737
0.02591467 0.01721787 0.02034116 0.01228237]
mean value: 0.019556879997253418
key: test_mcc
value: [0.85164138 0.62867836 0.74466871 0.92307692 0.89056356 0.74466871
0.73131034 0.88527041 0.81312325 0.77849894]
mean value: 0.7991500590969782
key: train_mcc
value: [0.86403192 0.80817284 0.80058734 0.86411148 0.86828166 0.85995606
0.88136192 0.87262489 0.86847048 0.86395495]
mean value: 0.855155354590099
key: test_accuracy
value: [0.9245283 0.81132075 0.86538462 0.96153846 0.94230769 0.86538462
0.86538462 0.94230769 0.90384615 0.88461538]
mean value: 0.8966618287373005
key: train_accuracy
value: [0.93176972 0.90405117 0.9 0.93191489 0.93404255 0.92978723
0.94042553 0.93617021 0.93404255 0.93191489]
mean value: 0.9274118767862813
key: test_fscore
value: [0.92592593 0.82758621 0.85106383 0.96153846 0.94545455 0.85106383
0.86792453 0.94117647 0.90909091 0.89285714]
mean value: 0.8973681850228127
key: train_fscore
value: [0.93305439 0.9044586 0.90187891 0.93277311 0.93473684 0.93081761
0.94142259 0.93697479 0.93501048 0.93248945]
mean value: 0.9283616785563731
key: test_precision
value: [0.89285714 0.77419355 0.95238095 0.96153846 0.89655172 0.95238095
0.85185185 0.96 0.86206897 0.83333333]
mean value: 0.8937156932384963
key: train_precision
value: [0.91769547 0.89873418 0.8852459 0.92116183 0.925 0.91735537
0.92592593 0.9253112 0.9214876 0.92468619]
mean value: 0.9162603674752363
key: test_recall
value: [0.96153846 0.88888889 0.76923077 0.96153846 1. 0.76923077
0.88461538 0.92307692 0.96153846 0.96153846]
mean value: 0.9081196581196581
key: train_recall
value: [0.94893617 0.91025641 0.91914894 0.94468085 0.94468085 0.94468085
0.95744681 0.94893617 0.94893617 0.94042553]
mean value: 0.9408128750681942
key: test_roc_auc
value: [0.92521368 0.80982906 0.86538462 0.96153846 0.94230769 0.86538462
0.86538462 0.94230769 0.90384615 0.88461538]
mean value: 0.8965811965811966
key: train_roc_auc
value: [0.93173304 0.90406438 0.9 0.93191489 0.93404255 0.92978723
0.94042553 0.93617021 0.93404255 0.93191489]
mean value: 0.9274095290052737
key: test_jcc
value: [0.86206897 0.70588235 0.74074074 0.92592593 0.89655172 0.74074074
0.76666667 0.88888889 0.83333333 0.80645161]
mean value: 0.8167250951795871
key: train_jcc
value: [0.8745098 0.8255814 0.82129278 0.87401575 0.87747036 0.87058824
0.88932806 0.88142292 0.87795276 0.87351779]
mean value: 0.8665679844601714
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_sl.py:128: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_sl.py:131: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
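The two SettingWithCopyWarning messages point at rpob_sl.py lines 128 and 131, where smnc_CT and smnc_BT are sorted with inplace=True. pandas raises this warning when an in-place operation targets a DataFrame that may be a view of another one, typically the result of filtering. A common remedy is to take an explicit .copy() when subsetting and to assign the sorted result instead of sorting in place. The sketch below applies it to an invented results table. The "group" column and its filter are hypothetical, and the test_mcc values are the cross-validation means above, rounded.

```python
import pandas as pd

# Invented results table; only the test_mcc column name comes from the log.
results = pd.DataFrame({
    "model": ["Ridge Classifier", "QDA", "Gradient Boosting", "Logistic Regression"],
    "group": ["CT", "CT", "CT", "other"],  # hypothetical grouping column
    "test_mcc": [0.81, 0.49, 0.93, 0.83],
})

# Filtering returns a frame pandas may treat as a slice of `results`;
# .copy() makes it an independent object, so later edits are unambiguous.
smnc_CT = results[results["group"] == "CT"].copy()

# Assigning the sorted result avoids inplace=True on a possible view.
smnc_CT = smnc_CT.sort_values(by=["test_mcc"], ascending=False)
print(smnc_CT["model"].tolist())
```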
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.03287292 0.03884459 0.03730083 0.0362246 0.03666592 0.03770685
0.03795171 0.03666735 0.04778528 0.03587842]
mean value: 0.03778984546661377
key: score_time
value: [0.01219273 0.01410508 0.03120208 0.01247478 0.01239181 0.01472044
0.01600051 0.01480579 0.01244426 0.01247621]
mean value: 0.015281367301940917
key: test_mcc
value: [0.85164138 0.73997003 0.77849894 0.96225045 0.89056356 0.74466871
0.80829038 0.88527041 0.84866842 0.79056942]
mean value: 0.8300391695672018
key: train_mcc
value: [0.8593409 0.87640715 0.86411148 0.85113319 0.86815585 0.86395495
0.87246682 0.8597691 0.8597691 0.86386107]
mean value: 0.8638969612781384
key: test_accuracy
value: [0.9245283 0.86792453 0.88461538 0.98076923 0.94230769 0.86538462
0.90384615 0.94230769 0.92307692 0.88461538]
mean value: 0.9119375907111756
key: train_accuracy
value: [0.92963753 0.93816631 0.93191489 0.92553191 0.93404255 0.93191489
0.93617021 0.92978723 0.92978723 0.93191489]
mean value: 0.9318867667740326
key: test_fscore
value: [0.92592593 0.87719298 0.875 0.98113208 0.94545455 0.85106383
0.90566038 0.94117647 0.92592593 0.89655172]
mean value: 0.9125083857106127
key: train_fscore
value: [0.93023256 0.93842887 0.93277311 0.92600423 0.93446089 0.93248945
0.93670886 0.93052632 0.93052632 0.93162393]
mean value: 0.9323774533836076
key: test_precision
value: [0.89285714 0.83333333 0.95454545 0.96296296 0.89655172 0.95238095
0.88888889 0.96 0.89285714 0.8125 ]
mean value: 0.9046877601963809
key: train_precision
value: [0.92436975 0.93248945 0.92116183 0.92016807 0.92857143 0.92468619
0.92887029 0.92083333 0.92083333 0.93562232]
mean value: 0.9257605990519295
key: test_recall
value: [0.96153846 0.92592593 0.80769231 1. 1. 0.76923077
0.92307692 0.92307692 0.96153846 1. ]
mean value: 0.9272079772079772
key: train_recall
value: [0.93617021 0.94444444 0.94468085 0.93191489 0.94042553 0.94042553
0.94468085 0.94042553 0.94042553 0.92765957]
mean value: 0.9391252955082743
key: test_roc_auc
value: [0.92521368 0.86680912 0.88461538 0.98076923 0.94230769 0.86538462
0.90384615 0.94230769 0.92307692 0.88461538]
mean value: 0.9118945868945869
key: train_roc_auc
value: [0.92962357 0.93817967 0.93191489 0.92553191 0.93404255 0.93191489
0.93617021 0.92978723 0.92978723 0.93191489]
mean value: 0.9318867066739408
key: test_jcc
value: [0.86206897 0.78125 0.77777778 0.96296296 0.89655172 0.74074074
0.82758621 0.88888889 0.86206897 0.8125 ]
mean value: 0.8412396232439335
key: train_jcc
value: [0.86956522 0.884 0.87401575 0.86220472 0.87698413 0.87351779
0.88095238 0.87007874 0.87007874 0.872 ]
mean value: 0.8733397464644983
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
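The lbfgs ConvergenceWarnings around the Logistic Regression and Logistic RegressionCV runs indicate that the solver hit its iteration limit. Both models are listed with default settings, and the default max_iter for LogisticRegression and LogisticRegressionCV is 100. The warning text itself suggests raising max_iter or rescaling the inputs. The sketch below uses toy data to count ConvergenceWarnings for a given configuration. max_iter=1 is chosen only to force the warning. Whether max_iter=1000 is enough for the real feature table is an assumption to check.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy stand-in data, not the script's feature table.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)


def convergence_warnings(model):
    """Fit `model` and return how many ConvergenceWarnings were raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        model.fit(X, y)
    return sum(issubclass(w.category, ConvergenceWarning) for w in caught)


# Deliberately starved iteration budget: lbfgs stops early and warns.
starved = convergence_warnings(LogisticRegression(max_iter=1))
# Larger budget plus standardised inputs, as the warning text suggests.
tuned = convergence_warnings(
    make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000, random_state=42)))
print(starved, tuned)
```

The pipeline in this log already applies MinMaxScaler to the numeric columns, so raising max_iter is probably the more relevant lever here. Switching solvers is another option the warning mentions.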
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.99820614 0.89078689 1.11161804 0.91593313 0.99522948 1.27120042
1.16889167 0.97342539 0.93785477 0.97674584]
mean value: 1.0239891767501832
key: score_time
value: [0.01472306 0.01496816 0.01541257 0.01507759 0.01992369 0.01551318
0.01557875 0.0176332 0.01491976 0.01504946]
mean value: 0.0158799409866333
key: test_mcc
value: [0.81196581 0.8116984 0.77849894 0.96225045 0.89056356 0.77849894
0.84615385 0.84866842 0.84866842 0.82305489]
mean value: 0.8400021693785612
key: train_mcc
value: [0.91045482 0.91484796 0.90667855 0.89790486 0.91064654 0.90233192
0.90252815 0.90233192 0.88965172 0.90220118]
mean value: 0.9039577621659811
key: test_accuracy
value: [0.90566038 0.90566038 0.88461538 0.98076923 0.94230769 0.88461538
0.92307692 0.92307692 0.92307692 0.90384615]
mean value: 0.9176705370101597
key: train_accuracy
value: [0.95522388 0.95735608 0.95319149 0.94893617 0.95531915 0.95106383
0.95106383 0.95106383 0.94468085 0.95106383]
mean value: 0.9518962936079481
key: test_fscore
value: [0.90566038 0.90909091 0.875 0.98113208 0.94545455 0.875
0.92307692 0.92 0.92592593 0.9122807 ]
mean value: 0.9172621458132878
key: train_fscore
value: [0.95541401 0.95762712 0.95378151 0.94915254 0.95541401 0.95157895
0.95178197 0.95157895 0.94537815 0.95137421]
mean value: 0.95230814229351
key: test_precision
value: [0.88888889 0.89285714 0.95454545 0.96296296 0.89655172 0.95454545
0.92307692 0.95833333 0.89285714 0.83870968]
mean value: 0.9163328704624589
key: train_precision
value: [0.95338983 0.94957983 0.94190871 0.94514768 0.95338983 0.94166667
0.93801653 0.94166667 0.93360996 0.94537815]
mean value: 0.9443753857993245
key: test_recall
value: [0.92307692 0.92592593 0.80769231 1. 1. 0.80769231
0.92307692 0.88461538 0.96153846 1. ]
mean value: 0.9233618233618234
key: train_recall
value: [0.95744681 0.96581197 0.96595745 0.95319149 0.95744681 0.96170213
0.96595745 0.96170213 0.95744681 0.95744681]
mean value: 0.9604109838152391
key: test_roc_auc
value: [0.90598291 0.90527066 0.88461538 0.98076923 0.94230769 0.88461538
0.92307692 0.92307692 0.92307692 0.90384615]
mean value: 0.9176638176638177
key: train_roc_auc
value: [0.95521913 0.95737407 0.95319149 0.94893617 0.95531915 0.95106383
0.95106383 0.95106383 0.94468085 0.95106383]
mean value: 0.9518976177486816
key: test_jcc
value: [0.82758621 0.83333333 0.77777778 0.96296296 0.89655172 0.77777778
0.85714286 0.85185185 0.86206897 0.83870968]
mean value: 0.848576313481764
key: train_jcc
value: [0.91463415 0.91869919 0.91164659 0.90322581 0.91463415 0.90763052
0.908 0.90763052 0.89641434 0.90725806]
mean value: 0.9089773323794109
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01577473 0.01172471 0.01135039 0.01113343 0.01135993 0.01071382
0.01026487 0.01151204 0.01141429 0.01121664]
mean value: 0.011646485328674317
key: score_time
value: [0.01278234 0.01041102 0.00982523 0.00985003 0.01006365 0.00986242
0.00955105 0.01019549 0.00995421 0.01008344]
mean value: 0.010257887840270995
key: test_mcc
value: [0.66048569 0.40912228 0.74466871 0.77151675 0.80829038 0.62279916
0.57735027 0.54006172 0.54006172 0.73568294]
mean value: 0.6410039620540624
key: train_mcc
value: [0.66639366 0.7005426 0.701239 0.68145013 0.69523029 0.7097907
0.66017245 0.67359644 0.68812845 0.69162595]
mean value: 0.6868169671603843
key: test_accuracy
value: [0.83018868 0.69811321 0.86538462 0.88461538 0.90384615 0.80769231
0.78846154 0.76923077 0.76923077 0.86538462]
mean value: 0.8182148040638607
key: train_accuracy
value: [0.8315565 0.84861407 0.84893617 0.83829787 0.84680851 0.85319149
0.82978723 0.83404255 0.84255319 0.84468085]
mean value: 0.8418468448033389
key: test_fscore
value: [0.82352941 0.66666667 0.85106383 0.88 0.90196078 0.79166667
0.78431373 0.77777778 0.76 0.85714286]
mean value: 0.809412171960983
key: train_fscore
value: [0.82326622 0.84044944 0.84116331 0.8280543 0.84140969 0.84563758
0.83333333 0.82272727 0.83482143 0.83813747]
mean value: 0.8349000049484545
key: test_precision
value: [0.84 0.76190476 0.95238095 0.91666667 0.92 0.86363636
0.8 0.75 0.79166667 0.91304348]
mean value: 0.850929888951628
key: train_precision
value: [0.86792453 0.88625592 0.88679245 0.88405797 0.87214612 0.89150943
0.81632653 0.88292683 0.87793427 0.875 ]
mean value: 0.8740874061181917
key: test_recall
value: [0.80769231 0.59259259 0.76923077 0.84615385 0.88461538 0.73076923
0.76923077 0.80769231 0.73076923 0.80769231]
mean value: 0.7746438746438746
key: train_recall
value: [0.78297872 0.7991453 0.8 0.7787234 0.81276596 0.80425532
0.85106383 0.77021277 0.79574468 0.80425532]
mean value: 0.7999145299145299
key: test_roc_auc
value: [0.82977208 0.70014245 0.86538462 0.88461538 0.90384615 0.80769231
0.78846154 0.76923077 0.76923077 0.86538462]
mean value: 0.8183760683760685
key: train_roc_auc
value: [0.8316603 0.84850882 0.84893617 0.83829787 0.84680851 0.85319149
0.82978723 0.83404255 0.84255319 0.84468085]
mean value: 0.8418466993998909
key: test_jcc
value: [0.7 0.5 0.74074074 0.78571429 0.82142857 0.65517241
0.64516129 0.63636364 0.61290323 0.75 ]
mean value: 0.684748416416937
key: train_jcc
value: [0.69961977 0.7248062 0.72586873 0.70656371 0.72623574 0.73255814
0.71428571 0.6988417 0.7164751 0.72137405]
mean value: 0.7166628841540069
MCC on Blind test: 0.63
Accuracy on Blind test: 0.81
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01205468 0.0115869 0.01135087 0.01066089 0.01150608 0.01090503
0.01032591 0.01174426 0.01039815 0.01049948]
mean value: 0.011103224754333497
key: score_time
value: [0.01041389 0.00969172 0.0097506 0.00891495 0.00965595 0.00975871
0.00914407 0.00987816 0.00919986 0.00920725]
mean value: 0.009561514854431153
key: test_mcc
value: [0.73646724 0.47360961 0.65433031 0.88527041 0.69436507 0.69230769
0.65824263 0.77151675 0.69436507 0.65433031]
mean value: 0.6914805091639882
key: train_mcc
value: [0.73140924 0.75708961 0.72356805 0.74043224 0.76629748 0.77032436
0.70276422 0.67337154 0.73659716 0.75330062]
mean value: 0.7355154534366981
key: test_accuracy
value: [0.86792453 0.73584906 0.82692308 0.94230769 0.84615385 0.84615385
0.82692308 0.88461538 0.84615385 0.82692308]
mean value: 0.8449927431059506
key: train_accuracy
value: [0.86567164 0.87846482 0.86170213 0.87021277 0.88297872 0.88510638
0.85106383 0.83617021 0.86808511 0.87659574]
mean value: 0.8676051354171392
key: test_fscore
value: [0.86792453 0.73076923 0.82352941 0.94117647 0.85185185 0.84615385
0.81632653 0.88 0.85185185 0.82352941]
mean value: 0.843311313365856
key: train_fscore
value: [0.86509636 0.87688985 0.86021505 0.87048832 0.88469602 0.88607595
0.84782609 0.83150985 0.86580087 0.87553648]
mean value: 0.8664134831445992
key: test_precision
value: [0.85185185 0.76 0.84 0.96 0.82142857 0.84615385
0.86956522 0.91666667 0.82142857 0.84 ]
mean value: 0.8527094724920812
key: train_precision
value: [0.87068966 0.88646288 0.86956522 0.86864407 0.87190083 0.87866109
0.86666667 0.85585586 0.88105727 0.88311688]
mean value: 0.873262041113066
key: test_recall
value: [0.88461538 0.7037037 0.80769231 0.92307692 0.88461538 0.84615385
0.76923077 0.84615385 0.88461538 0.80769231]
mean value: 0.8357549857549857
key: train_recall
value: [0.85957447 0.86752137 0.85106383 0.87234043 0.89787234 0.89361702
0.82978723 0.80851064 0.85106383 0.86808511]
mean value: 0.8599436261138389
key: test_roc_auc
value: [0.86823362 0.73646724 0.82692308 0.94230769 0.84615385 0.84615385
0.82692308 0.88461538 0.84615385 0.82692308]
mean value: 0.8450854700854701
key: train_roc_auc
value: [0.86568467 0.87844153 0.86170213 0.87021277 0.88297872 0.88510638
0.85106383 0.83617021 0.86808511 0.87659574]
mean value: 0.8676041098381524
key: test_jcc
value: [0.76666667 0.57575758 0.7 0.88888889 0.74193548 0.73333333
0.68965517 0.78571429 0.74193548 0.7 ]
mean value: 0.7323886890516479
key: train_jcc
value: [0.76226415 0.78076923 0.75471698 0.77067669 0.79323308 0.79545455
0.73584906 0.71161049 0.76335878 0.77862595]
mean value: 0.7646558959054925
MCC on Blind test: 0.76
Accuracy on Blind test: 0.88
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00998068 0.01181459 0.01135254 0.01113796 0.01147676 0.01108146
0.01116586 0.01172829 0.01143646 0.01152492]
mean value: 0.011269950866699218
key: score_time
value: [0.0131557 0.01342392 0.01350808 0.01317334 0.01755738 0.01329255
0.0138526 0.01361537 0.01634336 0.01371002]
mean value: 0.01416323184967041
key: test_mcc
value: [0.54793065 0.3223969 0.34641016 0.56591646 0.62279916 0.4233902
0.43929769 0.73568294 0.65824263 0.27104108]
mean value: 0.49331078658150723
key: train_mcc
value: [0.7231531 0.71458471 0.73192152 0.69364214 0.71917498 0.71066404
0.71495188 0.68936794 0.72008837 0.73208062]
mean value: 0.7149629304142088
key: test_accuracy
value: [0.77358491 0.66037736 0.67307692 0.76923077 0.80769231 0.71153846
0.71153846 0.86538462 0.82692308 0.63461538]
mean value: 0.7433962264150944
key: train_accuracy
value: [0.86140725 0.85714286 0.86595745 0.84680851 0.85957447 0.85531915
0.85744681 0.84468085 0.85957447 0.86595745]
mean value: 0.8573869255545978
key: test_fscore
value: [0.76 0.65384615 0.67924528 0.72727273 0.82142857 0.71698113
0.66666667 0.87272727 0.81632653 0.6122449 ]
mean value: 0.732673923560716
key: train_fscore
value: [0.85961123 0.85466377 0.86567164 0.84615385 0.85897436 0.85470085
0.85653105 0.84434968 0.8558952 0.86451613]
mean value: 0.8561067762085006
key: test_precision
value: [0.79166667 0.68 0.66666667 0.88888889 0.76666667 0.7037037
0.78947368 0.82758621 0.86956522 0.65217391]
mean value: 0.7636391614134453
key: train_precision
value: [0.87280702 0.86784141 0.86752137 0.84978541 0.86266094 0.8583691
0.86206897 0.84615385 0.87892377 0.87391304]
mean value: 0.8640044867366126
key: test_recall
value: [0.73076923 0.62962963 0.69230769 0.61538462 0.88461538 0.73076923
0.57692308 0.92307692 0.76923077 0.57692308]
mean value: 0.7129629629629629
key: train_recall
value: [0.84680851 0.84188034 0.86382979 0.84255319 0.85531915 0.85106383
0.85106383 0.84255319 0.83404255 0.85531915]
mean value: 0.8484433533369704
key: test_roc_auc
value: [0.77279202 0.66096866 0.67307692 0.76923077 0.80769231 0.71153846
0.71153846 0.86538462 0.82692308 0.63461538]
mean value: 0.7433760683760684
key: train_roc_auc
value: [0.86143844 0.85711038 0.86595745 0.84680851 0.85957447 0.85531915
0.85744681 0.84468085 0.85957447 0.86595745]
mean value: 0.8573867975995636
key: test_jcc
value: [0.61290323 0.48571429 0.51428571 0.57142857 0.6969697 0.55882353
0.5 0.77419355 0.68965517 0.44117647]
mean value: 0.584515021500561
key: train_jcc
value: [0.75378788 0.74621212 0.76315789 0.73333333 0.75280899 0.74626866
0.74906367 0.73062731 0.7480916 0.76136364]
mean value: 0.7484715089652758
MCC on Blind test: 0.33
Accuracy on Blind test: 0.67
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.02623487 0.02498984 0.02270436 0.02161646 0.02071953 0.02152848
0.02124238 0.02357602 0.02255344 0.02224708]
mean value: 0.022741246223449706
key: score_time
value: [0.0136168 0.01234031 0.01236081 0.01204133 0.01282573 0.01181722
0.01200223 0.01174998 0.01184368 0.01281047]
mean value: 0.012340855598449708
key: test_mcc
value: [0.81196581 0.69957726 0.77849894 0.92307692 0.84866842 0.77849894
0.73131034 0.88527041 0.81312325 0.73568294]
mean value: 0.800567324577806
key: train_mcc
value: [0.7995781 0.81236588 0.80451759 0.78726255 0.79574468 0.80451759
0.80428445 0.79155386 0.80000724 0.80851796]
mean value: 0.8008349901879068
key: test_accuracy
value: [0.90566038 0.8490566 0.88461538 0.96153846 0.92307692 0.88461538
0.86538462 0.94230769 0.90384615 0.86538462]
mean value: 0.8985486211901307
key: train_accuracy
value: [0.89978678 0.90618337 0.90212766 0.89361702 0.89787234 0.90212766
0.90212766 0.89574468 0.9 0.90425532]
mean value: 0.9003842489679263
key: test_fscore
value: [0.90566038 0.85714286 0.875 0.96153846 0.92592593 0.875
0.86792453 0.94117647 0.90909091 0.87272727]
mean value: 0.8991186802674039
key: train_fscore
value: [0.90021231 0.90598291 0.90336134 0.8940678 0.89787234 0.90336134
0.90254237 0.89640592 0.89978678 0.90405117]
mean value: 0.9007644291954064
key: test_precision
value: [0.88888889 0.82758621 0.95454545 0.96153846 0.89285714 0.95454545
0.85185185 0.96 0.86206897 0.82758621]
mean value: 0.8981468633537599
key: train_precision
value: [0.89830508 0.90598291 0.89211618 0.89029536 0.89787234 0.89211618
0.89873418 0.8907563 0.9017094 0.90598291]
mean value: 0.8973870842377724
key: test_recall
value: [0.92307692 0.88888889 0.80769231 0.96153846 0.96153846 0.80769231
0.88461538 0.92307692 0.96153846 0.92307692]
mean value: 0.9042735042735043
key: train_recall
value: [0.90212766 0.90598291 0.91489362 0.89787234 0.89787234 0.91489362
0.90638298 0.90212766 0.89787234 0.90212766]
mean value: 0.9042153118748864
key: test_roc_auc
value: [0.90598291 0.8482906 0.88461538 0.96153846 0.92307692 0.88461538
0.86538462 0.94230769 0.90384615 0.86538462]
mean value: 0.8985042735042735
key: train_roc_auc
value: [0.89978178 0.90618294 0.90212766 0.89361702 0.89787234 0.90212766
0.90212766 0.89574468 0.9 0.90425532]
mean value: 0.9003837061283869
key: test_jcc
value: [0.82758621 0.75 0.77777778 0.92592593 0.86206897 0.77777778
0.76666667 0.88888889 0.83333333 0.77419355]
mean value: 0.818421909117126
key: train_jcc
value: [0.81853282 0.828125 0.82375479 0.80842912 0.81467181 0.82375479
0.82239382 0.81226054 0.81782946 0.82490272]
mean value: 0.819465487041468
MCC on Blind test: 0.72
Accuracy on Blind test: 0.86
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.69608831 1.78880715 2.0408814 2.11449838 1.90394425 1.97782683
1.97935748 2.07013917 2.04062533 2.04656172]
mean value: 1.9658730030059814
key: score_time
value: [0.01268101 0.01269102 0.01288629 0.01712489 0.01619506 0.01270795
0.01263285 0.01373196 0.01519346 0.01510119]
mean value: 0.01409456729888916
key: test_mcc
value: [0.77350427 0.8116984 0.74466871 0.96225045 0.85634884 0.73568294
0.84866842 0.84866842 0.84615385 0.82305489]
mean value: 0.825069919863237
key: train_mcc
value: [0.97037106 0.98721563 0.99148936 0.9873145 1. 0.9957537
0.9957537 1. 0.98312115 0.9957537 ]
mean value: 0.9906772783362988
key: test_accuracy
value: [0.88679245 0.90566038 0.86538462 0.98076923 0.92307692 0.86538462
0.92307692 0.92307692 0.92307692 0.90384615]
mean value: 0.9100145137880987
key: train_accuracy
value: [0.98507463 0.99360341 0.99574468 0.99361702 1. 0.99787234
0.99787234 1. 0.99148936 0.99787234]
mean value: 0.9953146123485914
key: test_fscore
value: [0.88461538 0.90909091 0.85106383 0.98113208 0.92857143 0.85714286
0.92 0.92 0.92307692 0.9122807 ]
mean value: 0.908697410951082
key: train_fscore
value: [0.98494624 0.99357602 0.99574468 0.99357602 1. 0.9978678
0.9978678 1. 0.99141631 0.99787686]
mean value: 0.9952871726109697
key: test_precision
value: [0.88461538 0.89285714 0.95238095 0.96296296 0.86666667 0.91304348
0.95833333 0.95833333 0.92307692 0.83870968]
mean value: 0.9150979854906923
key: train_precision
value: [0.99565217 0.99570815 0.99574468 1. 1. 1.
1. 1. 1. 0.99576271]
mean value: 0.9982867721134951
key: test_recall
value: [0.88461538 0.92592593 0.76923077 1. 1. 0.80769231
0.88461538 0.88461538 0.92307692 1. ]
mean value: 0.9079772079772079
key: train_recall
value: [0.97446809 0.99145299 0.99574468 0.98723404 1. 0.99574468
0.99574468 1. 0.98297872 1. ]
mean value: 0.9923367885070012
key: test_roc_auc
value: [0.88675214 0.90527066 0.86538462 0.98076923 0.92307692 0.86538462
0.92307692 0.92307692 0.92307692 0.90384615]
mean value: 0.90997150997151
key: train_roc_auc
value: [0.98509729 0.99359884 0.99574468 0.99361702 1. 0.99787234
0.99787234 1. 0.99148936 0.99787234]
mean value: 0.995316421167485
key: test_jcc
value: [0.79310345 0.83333333 0.74074074 0.96296296 0.86666667 0.75
0.85185185 0.85185185 0.85714286 0.83870968]
mean value: 0.8346363390245481
key: train_jcc
value: [0.97033898 0.98723404 0.99152542 0.98723404 1. 0.99574468
0.99574468 1. 0.98297872 0.99576271]
mean value: 0.9906563288856833
MCC on Blind test: 0.73
Accuracy on Blind test: 0.86
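Throughout this log, test_roc_auc stays very close to test_accuracy without matching it. For the MLP's first fold it equals balanced accuracy exactly. That is what roc_auc_score returns when it receives predicted 0/1 labels rather than probabilities. This is an inference from the numbers; the log does not show how the scorer was defined. The sketch below rebuilds that fold's labels from the logged metrics and checks the equality. The fold has 53 test samples, and recall 23/26 with precision 23/26 implies TP=23, FN=3, FP=3, TN=24. These counts are derived here and are not printed by the script.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, roc_auc_score

# Assumed confusion-matrix counts for the MLP's first CV fold, derived from
# the logged recall (23/26), precision (23/26) and accuracy (47/53).
TP, FN, FP, TN = 23, 3, 3, 24

# Rebuild label vectors: all true positives first, then all true negatives.
y_true = np.array([1] * (TP + FN) + [0] * (FP + TN))
y_pred = np.array([1] * TP + [0] * FN + [1] * FP + [0] * TN)

# On hard 0/1 predictions, ROC AUC reduces to the mean of sensitivity and
# specificity, which is the definition of balanced accuracy.
auc_on_labels = roc_auc_score(y_true, y_pred)
bal_acc = balanced_accuracy_score(y_true, y_pred)

print(round(auc_on_labels, 8), round(bal_acc, 8))
```

If a probability-based ROC AUC was intended, sklearn's built-in `'roc_auc'` scorer string uses `predict_proba` or `decision_function` instead of hard predictions.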
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.02953148 0.02271771 0.02113962 0.02237082 0.01985717 0.02250838
0.02102804 0.02116036 0.02456594 0.02486062]
mean value: 0.022974014282226562
key: score_time
value: [0.01245403 0.00974846 0.00945234 0.00895834 0.00919867 0.00918245
0.00941133 0.00912023 0.00937629 0.01030064]
mean value: 0.009720277786254884
key: test_mcc
value: [0.81688878 0.92704716 0.92307692 0.88527041 0.84866842 0.96225045
0.84615385 0.84866842 0.77151675 1. ]
mean value: 0.8829541177251579
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.90566038 0.96226415 0.96153846 0.94230769 0.92307692 0.98076923
0.92307692 0.92307692 0.88461538 1. ]
mean value: 0.9406386066763426
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.90909091 0.96428571 0.96153846 0.94339623 0.92592593 0.98039216
0.92307692 0.92592593 0.88888889 1. ]
mean value: 0.9422521132010588
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.86206897 0.93103448 0.96153846 0.92592593 0.89285714 1.
0.92307692 0.89285714 0.85714286 1. ]
mean value: 0.9246501901674316
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96153846 1. 0.96153846 0.96153846 0.96153846 0.96153846
0.92307692 0.96153846 0.92307692 1. ]
mean value: 0.9615384615384616
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.90669516 0.96153846 0.96153846 0.94230769 0.92307692 0.98076923
0.92307692 0.92307692 0.88461538 1. ]
mean value: 0.9406695156695157
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.83333333 0.93103448 0.92592593 0.89285714 0.86206897 0.96153846
0.85714286 0.86206897 0.8 1. ]
mean value: 0.8925970134590824
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.87
Accuracy on Blind test: 0.93
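All the per-fold test metrics above come from a single 2x2 confusion matrix. As a cross-check, the sketch below recomputes the Decision Tree's first-fold values from counts inferred from the log. The fold has 53 samples, recall 25/26 and precision 25/29, which gives TP=25, FN=1, FP=4, TN=23. These counts are an assumption derived from the logged numbers and are not printed by the script.

```python
import math

# Assumed counts for the Decision Tree's first CV fold (inferred from the log).
TP, FN, FP, TN = 25, 1, 4, 23

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * TP / (2 * TP + FP + FN)           # harmonic mean of precision and recall
jaccard = TP / (TP + FP + FN)              # intersection over union of positives
mcc = (TP * TN - FP * FN) / math.sqrt(     # Matthews correlation coefficient
    (TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)
)

for name, value in [("accuracy", accuracy), ("precision", precision),
                    ("recall", recall), ("fscore", f1),
                    ("jcc", jaccard), ("mcc", mcc)]:
    print(f"{name}: {value:.8f}")
```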
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.12382007 0.12043929 0.12239695 0.12167001 0.12279487 0.1219244
0.12101841 0.12090349 0.12148833 0.12095571]
mean value: 0.12174115180969239
key: score_time
value: [0.01793957 0.01816487 0.01760411 0.01803041 0.0179038 0.01825953
0.01798344 0.01780105 0.01787972 0.01786542]
mean value: 0.017943191528320312
key: test_mcc
value: [0.77603503 0.66096866 0.77151675 0.88527041 0.88527041 0.81312325
0.76923077 0.92307692 0.89056356 0.71151247]
mean value: 0.8086568232416546
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.88679245 0.83018868 0.88461538 0.94230769 0.94230769 0.90384615
0.88461538 0.96153846 0.94230769 0.84615385]
mean value: 0.902467343976778
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.88888889 0.83018868 0.88 0.94117647 0.94339623 0.89795918
0.88461538 0.96153846 0.94545455 0.86206897]
mean value: 0.9035286805936604
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.85714286 0.84615385 0.91666667 0.96 0.92592593 0.95652174
0.88461538 0.96153846 0.89655172 0.78125 ]
mean value: 0.8986366605311508
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.92307692 0.81481481 0.84615385 0.92307692 0.96153846 0.84615385
0.88461538 0.96153846 1. 0.96153846]
mean value: 0.9122507122507123
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.88746439 0.83048433 0.88461538 0.94230769 0.94230769 0.90384615
0.88461538 0.96153846 0.94230769 0.84615385]
mean value: 0.9025641025641026
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.8 0.70967742 0.78571429 0.88888889 0.89285714 0.81481481
0.79310345 0.92592593 0.89655172 0.75757576]
mean value: 0.8265109407545448
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.81
Accuracy on Blind test: 0.9
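The key / value / mean value layout, with paired test_* and train_* entries, matches what `sklearn.model_selection.cross_validate` returns when it is given a dict of scorers and `return_train_score=True`. The script's actual call does not appear in this log. The scorer names, the 10-fold `StratifiedKFold` and the print loop below are therefore a hypothetical reconstruction, run on synthetic data from `make_classification`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score,
                             make_scorer, matthews_corrcoef, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.tree import ExtraTreeClassifier

# Hypothetical scorer names chosen so the output keys match the log.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": make_scorer(accuracy_score),
    "fscore": make_scorer(f1_score),
    "precision": make_scorer(precision_score),
    "recall": make_scorer(recall_score),
    "roc_auc": make_scorer(roc_auc_score),  # scored on predicted labels
    "jcc": make_scorer(jaccard_score),
}

# Synthetic stand-in for the real feature matrix and labels.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

cv = StratifiedKFold(n_splits=10)
scores = cross_validate(ExtraTreeClassifier(random_state=42), X, y,
                        cv=cv, scoring=scoring, return_train_score=True)

# Print in the same "key / value / mean value" layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```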
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.0100944 0.01009727 0.01013374 0.01006222 0.01016641 0.01010799
0.01006365 0.01008892 0.01019859 0.01012897]
mean value: 0.010114216804504394
key: score_time
value: [0.00880814 0.00876117 0.00883174 0.0087719 0.00877357 0.00879431
0.00871038 0.00887156 0.00874829 0.0087359 ]
mean value: 0.008780694007873536
key: test_mcc
value: [ 0.43447293 0.43366663 0.4233902 0.50336201 0.6172134 0.46291005
0.43929769 0.73568294 0.54006172 -0.08084521]
mean value: 0.4509212363913005
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.71698113 0.71698113 0.71153846 0.75 0.80769231 0.73076923
0.71153846 0.86538462 0.76923077 0.46153846]
mean value: 0.7241654571843251
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.71698113 0.72727273 0.70588235 0.76363636 0.81481481 0.74074074
0.74576271 0.87272727 0.77777778 0.53333333]
mean value: 0.7398929227184086
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.7037037 0.71428571 0.72 0.72413793 0.78571429 0.71428571
0.66666667 0.82758621 0.75 0.47058824]
mean value: 0.7076968457881236
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.73076923 0.74074074 0.69230769 0.80769231 0.84615385 0.76923077
0.84615385 0.92307692 0.80769231 0.61538462]
mean value: 0.777920227920228
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.71723647 0.71652422 0.71153846 0.75 0.80769231 0.73076923
0.71153846 0.86538462 0.76923077 0.46153846]
mean value: 0.7241452991452991
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.55882353 0.57142857 0.54545455 0.61764706 0.6875 0.58823529
0.59459459 0.77419355 0.63636364 0.36363636]
mean value: 0.5937877142217749
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.58
Accuracy on Blind test: 0.79
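The log reports only the mean across the ten folds. For the single Extra Tree, the spread between folds is large (the last fold has a negative MCC), so a dispersion statistic is worth reporting alongside the mean. The minimal sketch below uses the test_mcc values logged above. Using the sample standard deviation (`ddof=1`) is a choice made here, not something the script computes.

```python
import numpy as np

# test_mcc values copied from the Extra Tree (single tree) block above.
test_mcc = np.array([0.43447293, 0.43366663, 0.4233902, 0.50336201, 0.6172134,
                     0.46291005, 0.43929769, 0.73568294, 0.54006172, -0.08084521])

mean_mcc = test_mcc.mean()        # corresponds to the log's "mean value"
std_mcc = test_mcc.std(ddof=1)    # sample standard deviation across folds
print(f"mean={mean_mcc:.4f} sd={std_mcc:.4f} "
      f"min={test_mcc.min():.4f} max={test_mcc.max():.4f}")
```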
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.76943898 1.83275151 1.84324455 1.89071012 1.83631063 1.77790737
1.78165555 1.7913177 1.78348899 1.77847815]
mean value: 1.8085303544998168
key: score_time
value: [0.09400439 0.09472203 0.10127854 0.10282397 0.09269404 0.09585142
0.09383512 0.09480047 0.15068531 0.09217691]
mean value: 0.10128722190856934
key: test_mcc
value: [0.81688878 0.92450142 0.9258201 0.92307692 0.9258201 0.9258201
0.9258201 0.96225045 0.9258201 0.9258201 ]
mean value: 0.9181638177359281
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.90566038 0.96226415 0.96153846 0.96153846 0.96153846 0.96153846
0.96153846 0.98076923 0.96153846 0.96153846]
mean value: 0.9579462989840348
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.90909091 0.96296296 0.96 0.96153846 0.96296296 0.96
0.96 0.98039216 0.96296296 0.96296296]
mean value: 0.9582873379343968
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.86206897 0.96296296 1. 0.96153846 0.92857143 1.
1. 1. 0.92857143 0.92857143]
mean value: 0.9572284675732952
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96153846 0.96296296 0.92307692 0.96153846 1. 0.92307692
0.92307692 0.96153846 1. 1. ]
mean value: 0.9616809116809117
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.90669516 0.96225071 0.96153846 0.96153846 0.96153846 0.96153846
0.96153846 0.98076923 0.96153846 0.96153846]
mean value: 0.9580484330484331
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.83333333 0.92857143 0.92307692 0.92592593 0.92857143 0.92307692
0.92307692 0.96153846 0.92857143 0.92857143]
mean value: 0.9204314204314205
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.81
Accuracy on Blind test: 0.9
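Every block in this log runs the same two-stage pipeline. First, a `ColumnTransformer` min-max scales the 169 numeric columns and one-hot encodes the 7 categorical ones. Second, the model is fitted on the transformed data. The minimal sketch below reproduces that structure on a toy DataFrame. The four column names are borrowed from the log's column lists, but the values are synthetic. It also uses 100 trees instead of the logged 1000 to keep the example fast.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic toy data; only the column names come from the log.
df = pd.DataFrame({
    "ligand_distance": [5.1, 12.3, 8.7, 20.4, 3.3, 15.0],
    "ddg_foldx": [-1.2, 0.4, 2.5, -0.3, 1.1, 0.0],
    "ss_class": ["helix", "sheet", "loop", "helix", "loop", "sheet"],
    "active_site": [1, 0, 0, 1, 0, 1],
})
y = [1, 0, 1, 0, 1, 0]

num_cols = ["ligand_distance", "ddg_foldx"]
cat_cols = ["ss_class", "active_site"]

# Same layout as the logged 'prep' step: scale numerics, one-hot categoricals,
# pass any unlisted columns through unchanged.
prep = ColumnTransformer(
    transformers=[("num", MinMaxScaler(), num_cols),
                  ("cat", OneHotEncoder(), cat_cols)],
    remainder="passthrough",
)
pipe = Pipeline(steps=[("prep", prep),
                       ("model", RandomForestClassifier(n_estimators=100,
                                                        random_state=42))])
pipe.fit(df, y)
print(pipe.predict(df))
```

Putting the preprocessing inside the pipeline means the scaler and encoder are refitted on each training fold during cross-validation, so test-fold statistics do not leak into the scaling.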
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...05', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.92528653 0.9376781 0.96513009 1.02676868 1.00038338 1.00743413
0.93951488 1.03116369 0.93464684 0.97854924]
mean value: 0.974655556678772
key: score_time
value: [0.26444244 0.20015526 0.24083567 0.22042966 0.2193284 0.27657557
0.22205067 0.23441887 0.12471294 0.23463321]
mean value: 0.22375826835632323
key: test_mcc
value: [0.81688878 0.77350427 0.9258201 0.92307692 0.9258201 0.88527041
0.9258201 0.96225045 0.9258201 0.9258201 ]
mean value: 0.8990091339347005
key: train_mcc
value: [0.96162939 0.95309971 0.95744681 0.95320012 0.95748148 0.95320012
0.95744681 0.95320012 0.94893617 0.95748148]
mean value: 0.9553122213642232
key: test_accuracy
value: [0.90566038 0.88679245 0.96153846 0.96153846 0.96153846 0.94230769
0.96153846 0.98076923 0.96153846 0.96153846]
mean value: 0.9484760522496372
key: train_accuracy
value: [0.98081023 0.97654584 0.9787234 0.97659574 0.9787234 0.97659574
0.9787234 0.97659574 0.97446809 0.9787234 ]
mean value: 0.9776505012929274
key: test_fscore
value: [0.90909091 0.88888889 0.96 0.96153846 0.96296296 0.94117647
0.96 0.98039216 0.96296296 0.96296296]
mean value: 0.9489975775858129
key: train_fscore
value: [0.98081023 0.9764454 0.9787234 0.97654584 0.97863248 0.97654584
0.9787234 0.97654584 0.97446809 0.97863248]
mean value: 0.9776073008221619
key: test_precision
value: [0.86206897 0.88888889 1. 0.96153846 0.92857143 0.96
1. 1. 0.92857143 0.92857143]
mean value: 0.9458210601658877
key: train_precision
value: [0.98290598 0.97854077 0.9787234 0.97863248 0.98283262 0.97863248
0.9787234 0.97863248 0.97446809 0.98283262]
mean value: 0.979492432100413
key: test_recall
value: [0.96153846 0.88888889 0.92307692 0.96153846 1. 0.92307692
0.92307692 0.96153846 1. 1. ]
mean value: 0.9542735042735043
key: train_recall
value: [0.9787234 0.97435897 0.9787234 0.97446809 0.97446809 0.97446809
0.9787234 0.97446809 0.97446809 0.97446809]
mean value: 0.975733769776323
key: test_roc_auc
value: [0.90669516 0.88675214 0.96153846 0.96153846 0.96153846 0.94230769
0.96153846 0.98076923 0.96153846 0.96153846]
mean value: 0.9485754985754986
key: train_roc_auc
value: [0.98081469 0.97654119 0.9787234 0.97659574 0.9787234 0.97659574
0.9787234 0.97659574 0.97446809 0.9787234 ]
mean value: 0.977650481905801
key: test_jcc
value: [0.83333333 0.8 0.92307692 0.92592593 0.92857143 0.88888889
0.92307692 0.96153846 0.92857143 0.92857143]
mean value: 0.9041554741554741
key: train_jcc
value: [0.9623431 0.9539749 0.95833333 0.95416667 0.958159 0.95416667
0.95833333 0.95416667 0.95020747 0.958159 ]
mean value: 0.9562010118809934
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
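The FutureWarning emitted for 'Random Forest2' names its own fix: for classifiers, `max_features='sqrt'` keeps the old `'auto'` behaviour. The sketch below shows the same settings with that one change and confirms that the deprecation warning no longer fires. `make_rf2` is a hypothetical helper, and the fit uses synthetic data.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def make_rf2():
    """Hypothetical helper: the logged 'Random Forest2' settings, with
    max_features='sqrt' in place of the deprecated 'auto'."""
    return RandomForestClassifier(
        max_features="sqrt",   # equivalent to 'auto' for classifiers
        min_samples_leaf=5,
        n_estimators=1000,
        n_jobs=10,
        oob_score=True,
        random_state=42,
    )


# Synthetic data, used only to show that no max_features warning is raised.
X, y = make_classification(n_samples=200, random_state=42)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    rf2 = make_rf2().fit(X, y)

max_features_warnings = [w for w in caught
                         if issubclass(w.category, FutureWarning)
                         and "max_features" in str(w.message)]
print(len(max_features_warnings), round(rf2.oob_score_, 3))
```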
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01119733 0.01217985 0.01093841 0.01117277 0.01136184 0.0108006
0.01067758 0.01132941 0.01079106 0.01140022]
mean value: 0.011184906959533692
key: score_time
value: [0.00896358 0.00987315 0.00957155 0.00922394 0.00965858 0.00929475
0.0097177 0.00967407 0.00959873 0.00960207]
mean value: 0.009517812728881836
key: test_mcc
value: [0.73646724 0.47360961 0.65433031 0.88527041 0.69436507 0.69230769
0.65824263 0.77151675 0.69436507 0.65433031]
mean value: 0.6914805091639882
key: train_mcc
value: [0.73140924 0.75708961 0.72356805 0.74043224 0.76629748 0.77032436
0.70276422 0.67337154 0.73659716 0.75330062]
mean value: 0.7355154534366981
key: test_accuracy
value: [0.86792453 0.73584906 0.82692308 0.94230769 0.84615385 0.84615385
0.82692308 0.88461538 0.84615385 0.82692308]
mean value: 0.8449927431059506
key: train_accuracy
value: [0.86567164 0.87846482 0.86170213 0.87021277 0.88297872 0.88510638
0.85106383 0.83617021 0.86808511 0.87659574]
mean value: 0.8676051354171392
key: test_fscore
value: [0.86792453 0.73076923 0.82352941 0.94117647 0.85185185 0.84615385
0.81632653 0.88 0.85185185 0.82352941]
mean value: 0.843311313365856
key: train_fscore
value: [0.86509636 0.87688985 0.86021505 0.87048832 0.88469602 0.88607595
0.84782609 0.83150985 0.86580087 0.87553648]
mean value: 0.8664134831445992
key: test_precision
value: [0.85185185 0.76 0.84 0.96 0.82142857 0.84615385
0.86956522 0.91666667 0.82142857 0.84 ]
mean value: 0.8527094724920812
key: train_precision
value: [0.87068966 0.88646288 0.86956522 0.86864407 0.87190083 0.87866109
0.86666667 0.85585586 0.88105727 0.88311688]
mean value: 0.873262041113066
key: test_recall
value: [0.88461538 0.7037037 0.80769231 0.92307692 0.88461538 0.84615385
0.76923077 0.84615385 0.88461538 0.80769231]
mean value: 0.8357549857549857
key: train_recall
value: [0.85957447 0.86752137 0.85106383 0.87234043 0.89787234 0.89361702
0.82978723 0.80851064 0.85106383 0.86808511]
mean value: 0.8599436261138389
key: test_roc_auc
value: [0.86823362 0.73646724 0.82692308 0.94230769 0.84615385 0.84615385
0.82692308 0.88461538 0.84615385 0.82692308]
mean value: 0.8450854700854701
key: train_roc_auc
value: [0.86568467 0.87844153 0.86170213 0.87021277 0.88297872 0.88510638
0.85106383 0.83617021 0.86808511 0.87659574]
mean value: 0.8676041098381524
key: test_jcc
value: [0.76666667 0.57575758 0.7 0.88888889 0.74193548 0.73333333
0.68965517 0.78571429 0.74193548 0.7 ]
mean value: 0.7323886890516479
key: train_jcc
value: [0.76226415 0.78076923 0.75471698 0.77067669 0.79323308 0.79545455
0.73584906 0.71161049 0.76335878 0.77862595]
mean value: 0.7646558959054925
MCC on Blind test: 0.76
Accuracy on Blind test: 0.88
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.08577728 0.07616615 0.09151053 0.07672358 0.0770278 0.08412862
0.06949353 0.0763917 0.08024454 0.08369136]
mean value: 0.08011550903320312
key: score_time
value: [0.01101613 0.01094556 0.0117805 0.01137733 0.0115149 0.01199055
0.01296544 0.01189756 0.01115489 0.0111413 ]
mean value: 0.01157841682434082
key: test_mcc
value: [0.88746439 0.96291111 0.96225045 0.96225045 0.9258201 0.96225045
0.88527041 0.9258201 0.9258201 0.96225045]
mean value: 0.9362108002286528
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94339623 0.98113208 0.98076923 0.98076923 0.96153846 0.98076923
0.94230769 0.96153846 0.96153846 0.98076923]
mean value: 0.9674528301886792
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94339623 0.98181818 0.98039216 0.98113208 0.96296296 0.98039216
0.94117647 0.96 0.96296296 0.98113208]
mean value: 0.9675365269416324
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.92592593 0.96428571 1. 0.96296296 0.92857143 1.
0.96 1. 0.92857143 0.96296296]
mean value: 0.9633280423280424
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96153846 1. 0.96153846 1. 1. 0.96153846
0.92307692 0.92307692 1. 1. ]
mean value: 0.9730769230769231
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.94373219 0.98076923 0.98076923 0.98076923 0.96153846 0.98076923
0.94230769 0.96153846 0.96153846 0.98076923]
mean value: 0.9674501424501425
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.89285714 0.96428571 0.96153846 0.96296296 0.92857143 0.96153846
0.88888889 0.92307692 0.92857143 0.96296296]
mean value: 0.9375254375254375
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.86
Accuracy on Blind test: 0.93
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.04911208 0.07845759 0.0794065 0.07849884 0.07238936 0.04618835
0.09146833 0.09963346 0.07126188 0.0715549 ]
mean value: 0.07379713058471679
key: score_time
value: [0.01892233 0.0187223 0.01950288 0.0187819 0.01244593 0.01247454
0.01240921 0.01876998 0.02508521 0.01220512]
mean value: 0.016931939125061034
key: test_mcc
value: [0.77603503 0.73997003 0.77849894 0.88527041 0.84866842 0.81312325
0.6172134 0.80829038 0.74466871 0.71151247]
mean value: 0.7723251042423636
key: train_mcc
value: [0.89794254 0.89379475 0.91104256 0.90651431 0.91922384 0.91084449
0.91084449 0.91502618 0.91519196 0.90641581]
mean value: 0.9086840926765474
key: test_accuracy
value: [0.88679245 0.86792453 0.88461538 0.94230769 0.92307692 0.90384615
0.80769231 0.90384615 0.86538462 0.84615385]
mean value: 0.8831640058055152
key: train_accuracy
value: [0.94882729 0.9466951 0.95531915 0.95319149 0.95957447 0.95531915
0.95531915 0.95744681 0.95744681 0.95319149]
mean value: 0.9542330898697999
key: test_fscore
value: [0.88888889 0.87719298 0.875 0.94339623 0.92592593 0.89795918
0.81481481 0.90196078 0.87719298 0.86206897]
mean value: 0.8864400754461441
key: train_fscore
value: [0.94957983 0.94736842 0.95597484 0.9535865 0.95983087 0.95578947
0.95578947 0.95780591 0.95798319 0.95338983]
mean value: 0.9547098338777809
key: test_precision
value: [0.85714286 0.83333333 0.95454545 0.92592593 0.89285714 0.95652174
0.78571429 0.92 0.80645161 0.78125 ]
mean value: 0.871374235155266
key: train_precision
value: [0.93775934 0.93360996 0.94214876 0.94560669 0.95378151 0.94583333
0.94583333 0.94979079 0.94605809 0.94936709]
mean value: 0.9449788903641747
key: test_recall
value: [0.92307692 0.92592593 0.80769231 0.96153846 0.96153846 0.84615385
0.84615385 0.88461538 0.96153846 0.96153846]
mean value: 0.9079772079772079
key: train_recall
value: [0.96170213 0.96153846 0.97021277 0.96170213 0.96595745 0.96595745
0.96595745 0.96595745 0.97021277 0.95744681]
mean value: 0.9646644844517185
key: test_roc_auc
value: [0.88746439 0.86680912 0.88461538 0.94230769 0.92307692 0.90384615
0.80769231 0.90384615 0.86538462 0.84615385]
mean value: 0.8831196581196582
key: train_roc_auc
value: [0.94879978 0.94672668 0.95531915 0.95319149 0.95957447 0.95531915
0.95531915 0.95744681 0.95744681 0.95319149]
mean value: 0.9542334969994545
key: test_jcc
value: [0.8 0.78125 0.77777778 0.89285714 0.86206897 0.81481481
0.6875 0.82142857 0.78125 0.75757576]
mean value: 0.7976523029971305
key: train_jcc
value: [0.904 0.9 0.91566265 0.91129032 0.92276423 0.91532258
0.91532258 0.91902834 0.91935484 0.91093117]
mean value: 0.9133676714995371
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01811981 0.0114007 0.00968623 0.00974464 0.01074147 0.00977039
0.01065302 0.01098585 0.00996947 0.00996566]
mean value: 0.01110372543334961
key: score_time
value: [0.0099473 0.00906754 0.00856543 0.00886488 0.00860167 0.00880289
0.00895095 0.00944519 0.00875688 0.00867391]
mean value: 0.00896766185760498
key: test_mcc
value: [0.6980057 0.51359557 0.81312325 0.84615385 0.84615385 0.70064905
0.65824263 0.65433031 0.73568294 0.6172134 ]
mean value: 0.7083150533360429
key: train_mcc
value: [0.68508531 0.71894691 0.70253486 0.69424587 0.74910575 0.74478875
0.69401929 0.67747959 0.69041892 0.71541847]
mean value: 0.7072043720898497
key: test_accuracy
value: [0.8490566 0.75471698 0.90384615 0.92307692 0.92307692 0.84615385
0.82692308 0.82692308 0.86538462 0.80769231]
mean value: 0.8526850507982584
key: train_accuracy
value: [0.84221748 0.85927505 0.85106383 0.84680851 0.87446809 0.87234043
0.84680851 0.83829787 0.84468085 0.85744681]
mean value: 0.8533407430930454
key: test_fscore
value: [0.84615385 0.74509804 0.89795918 0.92307692 0.92307692 0.83333333
0.81632653 0.82352941 0.87272727 0.8 ]
mean value: 0.8481281463634405
key: train_fscore
value: [0.83913043 0.85652174 0.84848485 0.84347826 0.87311828 0.87124464
0.84415584 0.83406114 0.84026258 0.85466377]
mean value: 0.850512153401787
key: test_precision
value: [0.84615385 0.79166667 0.95652174 0.92307692 0.92307692 0.90909091
0.86956522 0.84 0.82758621 0.83333333]
mean value: 0.8720071764816892
key: train_precision
value: [0.85777778 0.87168142 0.86343612 0.86222222 0.8826087 0.87878788
0.85903084 0.85650224 0.86486486 0.87168142]
mean value: 0.8668593473668214
key: test_recall
value: [0.84615385 0.7037037 0.84615385 0.92307692 0.92307692 0.76923077
0.76923077 0.80769231 0.92307692 0.76923077]
mean value: 0.8280626780626781
key: train_recall
value: [0.8212766 0.84188034 0.83404255 0.82553191 0.86382979 0.86382979
0.82978723 0.81276596 0.81702128 0.83829787]
mean value: 0.8348263320603746
key: test_roc_auc
value: [0.84900285 0.75569801 0.90384615 0.92307692 0.92307692 0.84615385
0.82692308 0.82692308 0.86538462 0.80769231]
mean value: 0.8527777777777779
key: train_roc_auc
value: [0.84226223 0.85923804 0.85106383 0.84680851 0.87446809 0.87234043
0.84680851 0.83829787 0.84468085 0.85744681]
mean value: 0.853341516639389
key: test_jcc
value: [0.73333333 0.59375 0.81481481 0.85714286 0.85714286 0.71428571
0.68965517 0.7 0.77419355 0.66666667]
mean value: 0.7400984964187133
key: train_jcc
value: [0.72284644 0.74904943 0.73684211 0.72932331 0.77480916 0.77186312
0.73033708 0.71535581 0.7245283 0.74621212]
mean value: 0.7401166870309306
MCC on Blind test: 0.76
Accuracy on Blind test: 0.88
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01385999 0.0178628 0.02119446 0.02347088 0.0253675 0.01830435
0.02012086 0.01882529 0.02110195 0.020648 ]
mean value: 0.020075607299804687
key: score_time
value: [0.00991964 0.01136136 0.01177001 0.01195335 0.01180339 0.01197219
0.01181316 0.01186776 0.01188731 0.01246667]
mean value: 0.011681485176086425
key: test_mcc
value: [0.73609205 0.70527596 0.74466871 0.88527041 0.82305489 0.64676167
0.74466871 0.82305489 0.77151675 0.66666667]
mean value: 0.7547030706307085
key: train_mcc
value: [0.86611567 0.88176453 0.88164966 0.91163756 0.87947498 0.84270412
0.86448019 0.76515574 0.91064654 0.74380085]
mean value: 0.8547429840717615
key: test_accuracy
value: [0.86792453 0.8490566 0.86538462 0.94230769 0.90384615 0.80769231
0.86538462 0.90384615 0.88461538 0.80769231]
mean value: 0.8697750362844703
key: train_accuracy
value: [0.93176972 0.94029851 0.94042553 0.95531915 0.93829787 0.91914894
0.92978723 0.87234043 0.95531915 0.85957447]
mean value: 0.9242280996234632
key: test_fscore
value: [0.8627451 0.86206897 0.85106383 0.94117647 0.9122807 0.77272727
0.85106383 0.89361702 0.88888889 0.83870968]
mean value: 0.8674341755785658
key: train_fscore
value: [0.92920354 0.94166667 0.93913043 0.95424837 0.9406953 0.91479821
0.9258427 0.85576923 0.95541401 0.8754717 ]
mean value: 0.9232240148337406
key: test_precision
value: [0.88 0.80645161 0.95238095 0.96 0.83870968 0.94444444
0.95238095 1. 0.85714286 0.72222222]
mean value: 0.8913732718894009
key: train_precision
value: [0.96774194 0.91869919 0.96 0.97767857 0.90551181 0.96682464
0.98095238 0.98342541 0.95338983 0.78644068]
mean value: 0.9400664453269295
key: test_recall
value: [0.84615385 0.92592593 0.76923077 0.92307692 1. 0.65384615
0.76923077 0.80769231 0.92307692 1. ]
mean value: 0.8618233618233618
key: train_recall
value: [0.89361702 0.96581197 0.91914894 0.93191489 0.9787234 0.86808511
0.87659574 0.75744681 0.95744681 0.98723404]
mean value: 0.9136024731769412
key: test_roc_auc
value: [0.86752137 0.84757835 0.86538462 0.94230769 0.90384615 0.80769231
0.86538462 0.90384615 0.88461538 0.80769231]
mean value: 0.8695868945868945
key: train_roc_auc
value: [0.93185125 0.94035279 0.94042553 0.95531915 0.93829787 0.91914894
0.92978723 0.87234043 0.95531915 0.85957447]
mean value: 0.9242416803055101
key: test_jcc
value: [0.75862069 0.75757576 0.74074074 0.88888889 0.83870968 0.62962963
0.74074074 0.80769231 0.8 0.72222222]
mean value: 0.7684820654564815
key: train_jcc
value: [0.8677686 0.88976378 0.8852459 0.9125 0.88803089 0.84297521
0.86192469 0.74789916 0.91463415 0.77852349]
mean value: 0.8589265852981367
MCC on Blind test: 0.71
Accuracy on Blind test: 0.83
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01879811 0.01746774 0.02222395 0.02218533 0.02078748 0.01897097
0.02105546 0.02173853 0.02300715 0.02159905]
mean value: 0.020783376693725587
key: score_time
value: [0.01111412 0.01320291 0.01229048 0.01212955 0.0149734 0.01178837
0.01732373 0.01178002 0.01172805 0.0117414 ]
mean value: 0.012807202339172364
key: test_mcc
value: [0.85164138 0.65110205 0.60697698 0.85634884 0.80829038 0.61494005
0.77849894 0.84866842 0.79056942 0.73131034]
mean value: 0.7538346799690747
key: train_mcc
value: [0.88621044 0.79855158 0.65963501 0.84046667 0.82974725 0.83758899
0.87436938 0.91519196 0.87093638 0.88085106]
mean value: 0.8393548746757886
key: test_accuracy
value: [0.9245283 0.81132075 0.76923077 0.92307692 0.90384615 0.78846154
0.88461538 0.92307692 0.88461538 0.86538462]
mean value: 0.8678156748911466
key: train_accuracy
value: [0.9424307 0.89339019 0.80638298 0.91702128 0.90851064 0.91489362
0.93617021 0.95744681 0.93404255 0.94042553]
mean value: 0.9150714512543665
key: test_fscore
value: [0.92592593 0.83870968 0.7 0.92857143 0.90196078 0.74418605
0.875 0.92 0.89655172 0.86792453]
mean value: 0.8598830115181881
key: train_fscore
value: [0.94409938 0.9015748 0.76240209 0.92184369 0.8997669 0.9086758
0.9339207 0.95689655 0.93660532 0.94042553]
mean value: 0.9106210762491109
key: test_precision
value: [0.89285714 0.74285714 1. 0.86666667 0.92 0.94117647
0.95454545 0.95833333 0.8125 0.85185185]
mean value: 0.8940788062699827
key: train_precision
value: [0.91935484 0.83576642 0.98648649 0.87121212 0.99484536 0.98029557
0.96803653 0.96943231 0.9015748 0.94042553]
mean value: 0.93674299762485
key: test_recall
value: [0.96153846 0.96296296 0.53846154 1. 0.88461538 0.61538462
0.80769231 0.88461538 1. 0.88461538]
mean value: 0.853988603988604
key: train_recall
value: [0.97021277 0.97863248 0.6212766 0.9787234 0.8212766 0.84680851
0.90212766 0.94468085 0.97446809 0.94042553]
mean value: 0.8978632478632479
key: test_roc_auc
value: [0.92521368 0.80840456 0.76923077 0.92307692 0.90384615 0.78846154
0.88461538 0.92307692 0.88461538 0.86538462]
mean value: 0.8675925925925926
key: train_roc_auc
value: [0.94237134 0.89357156 0.80638298 0.91702128 0.90851064 0.91489362
0.93617021 0.95744681 0.93404255 0.94042553]
mean value: 0.9150836515730132
key: test_jcc
value: [0.86206897 0.72222222 0.53846154 0.86666667 0.82142857 0.59259259
0.77777778 0.85185185 0.8125 0.76666667]
mean value: 0.7612236853185129
key: train_jcc
value: [0.89411765 0.82078853 0.61603376 0.85501859 0.81779661 0.83263598
0.87603306 0.91735537 0.88076923 0.8875502 ]
mean value: 0.839809897491723
MCC on Blind test: 0.73
Accuracy on Blind test: 0.86
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.18865156 0.17809844 0.17845082 0.18266034 0.1782515 0.17729211
0.17885733 0.17905545 0.18050551 0.18259931]
mean value: 0.1804422378540039
key: score_time
value: [0.01532364 0.01549387 0.01535821 0.01534915 0.01540208 0.01566911
0.01548195 0.01556087 0.01541352 0.01596475]
mean value: 0.015501713752746582
key: test_mcc
value: [0.8116984 0.92450142 0.96225045 0.96225045 0.9258201 0.96225045
0.81312325 0.9258201 0.96225045 0.92307692]
mean value: 0.9173041989812138
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.90566038 0.96226415 0.98076923 0.98076923 0.96153846 0.98076923
0.90384615 0.96153846 0.98076923 0.96153846]
mean value: 0.9579462989840348
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.90196078 0.96296296 0.98039216 0.98113208 0.96296296 0.98039216
0.89795918 0.96 0.98113208 0.96153846]
mean value: 0.9570432820120469
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.92 0.96296296 1. 0.96296296 0.92857143 1.
0.95652174 1. 0.96296296 0.96153846]
mean value: 0.9655520518129214
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.88461538 0.96296296 0.96153846 1. 1. 0.96153846
0.84615385 0.92307692 1. 0.96153846]
mean value: 0.9501424501424501
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.90527066 0.96225071 0.98076923 0.98076923 0.96153846 0.98076923
0.90384615 0.96153846 0.98076923 0.96153846]
mean value: 0.957905982905983
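For AdaBoost fold 1, every logged test metric can be reproduced from a single confusion matrix. The counts used below (TP=23, FP=2, FN=3, TN=25 over 53 test rows) are inferred from accuracy 48/53, recall 23/26 and precision 23/25; they are not printed in the log. The same counts reproduce test_roc_auc (0.90527066) as balanced accuracy, which is what ROC AUC reduces to when it is computed from hard 0/1 predictions instead of scores. That the scoring code does this is an inference from these numbers, not something the log states.

```python
from math import sqrt


def metrics_from_counts(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # true-positive rate (sensitivity)
    specificity = tn / (tn + fp)     # true-negative rate
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "fscore": 2 * tp / (2 * tp + fp + fn),
        "jcc": tp / (tp + fp + fn),
        "mcc": (tp * tn - fp * fn)
        / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        # AUC of a hard 0/1 prediction = area under the single-point ROC curve
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Counts inferred (not logged) for AdaBoost fold 1: 53 test rows, 26 positives
fold1 = metrics_from_counts(tp=23, fp=2, fn=3, tn=25)
```

If AUC is meant to measure ranking quality, it would need predicted probabilities or decision scores; as logged it appears to add little beyond balanced accuracy.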
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.82142857 0.92857143 0.96153846 0.96296296 0.92857143 0.96153846
0.81481481 0.92307692 0.96296296 0.92592593]
mean value: 0.9191391941391941
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.95
Accuracy on Blind test: 0.98
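Each metric in this log uses the same pattern: a key: line, a value: array of 10 fold scores (wrapped over several lines), and a mean value: line. A minimal sketch for reading those blocks back into Python, assuming exactly that layout, is below. train_test_gap is a hypothetical helper that makes overfitting visible, for example AdaBoost's perfect train scores against a mean test MCC of about 0.917.

```python
import re

# Matches plain floats as printed by NumPy, including forms like "1." and "0.8116984"
NUMBER = re.compile(r"[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?")


def parse_cv_scores(log_text: str) -> dict:
    """Collect {key: {"folds": [...], "mean": float}} from key/value/mean value triples."""
    scores, key, chunks = {}, None, []
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith("key:"):
            key, chunks = line[len("key:"):].strip(), []
        elif line.startswith("value:") and key is not None:
            chunks = [line[len("value:"):]]
        elif line.startswith("mean value:") and key is not None:
            folds = [float(x) for x in NUMBER.findall(" ".join(chunks))]
            scores[key] = {"folds": folds, "mean": float(line[len("mean value:"):])}
            key, chunks = None, []
        elif key is not None and chunks:
            chunks.append(line)  # wrapped continuation of the value array
    return scores


def train_test_gap(scores: dict, metric: str) -> float:
    """Mean train score minus mean test score; a large gap suggests overfitting."""
    return scores["train_" + metric]["mean"] - scores["test_" + metric]["mean"]


# Excerpt copied from the AdaBoost Classifier block above
SAMPLE = """key: test_mcc
value: [0.8116984 0.92450142 0.96225045 0.96225045 0.9258201 0.96225045
0.81312325 0.9258201 0.96225045 0.92307692]
mean value: 0.9173041989812138
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0"""

ada = parse_cv_scores(SAMPLE)
```

The layout assumption is brittle: stderr warnings interleaved inside a value: array would be parsed as numbers, so such lines should be filtered out first.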
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Running model pipeline:
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.06805992 0.06910896 0.06634355 0.0792799 0.0696907 0.07058072
0.088202 0.07411814 0.07483792 0.07089567]
mean value: 0.07311174869537354
key: score_time
value: [0.02007461 0.0277791 0.03617716 0.03125453 0.03761816 0.03880787
0.03588963 0.03868651 0.02243757 0.03866386]
mean value: 0.032738900184631346
key: test_mcc
value: [0.85164138 0.96291111 0.9258201 0.96225045 0.9258201 0.96225045
0.84866842 0.88527041 0.89056356 0.96225045]
mean value: 0.9177446427850156
key: train_mcc
value: [0.98721586 0.98721563 0.9873145 0.97873227 0.9957537 0.9873145
0.98724298 0.99152527 0.97478586 0.98297872]
mean value: 0.9860079271688947
key: test_accuracy
value: [0.9245283 0.98113208 0.96153846 0.98076923 0.96153846 0.98076923
0.92307692 0.94230769 0.94230769 0.98076923]
mean value: 0.9578737300435414
key: train_accuracy
value: [0.99360341 0.99360341 0.99361702 0.9893617 0.99787234 0.99361702
0.99361702 0.99574468 0.98723404 0.99148936]
mean value: 0.9929760014517081
key: test_fscore
value: [0.92592593 0.98181818 0.96 0.98113208 0.96296296 0.98039216
0.92 0.94339623 0.94545455 0.98113208]
mean value: 0.9582214150382852
key: train_fscore
value: [0.99360341 0.99357602 0.99357602 0.98933902 0.9978678 0.99357602
0.99363057 0.9957265 0.98739496 0.99148936]
mean value: 0.9929779674593665
key: test_precision
value: [0.89285714 0.96428571 1. 0.96296296 0.92857143 1.
0.95833333 0.92592593 0.89655172 0.96296296]
mean value: 0.9492451195037402
key: train_precision
value: [0.9957265 0.99570815 1. 0.99145299 1. 1.
0.99152542 1. 0.97510373 0.99148936]
mean value: 0.99410061615567
key: test_recall
value: [0.96153846 1. 0.92307692 1. 1. 0.96153846
0.88461538 0.96153846 1. 1. ]
mean value: 0.9692307692307692
key: train_recall
value: [0.99148936 0.99145299 0.98723404 0.98723404 0.99574468 0.98723404
0.99574468 0.99148936 1. 0.99148936]
mean value: 0.9919112565921077
key: test_roc_auc
value: [0.92521368 0.98076923 0.96153846 0.98076923 0.96153846 0.98076923
0.92307692 0.94230769 0.94230769 0.98076923]
mean value: 0.957905982905983
key: train_roc_auc
value: [0.99360793 0.99359884 0.99361702 0.9893617 0.99787234 0.99361702
0.99361702 0.99574468 0.98723404 0.99148936]
mean value: 0.9929759956355702
key: test_jcc
value: [0.86206897 0.96428571 0.92307692 0.96296296 0.92857143 0.96153846
0.85185185 0.89285714 0.89655172 0.96296296]
mean value: 0.9206728137762621
key: train_jcc
value: [0.98728814 0.98723404 0.98723404 0.97890295 0.99574468 0.98723404
0.98734177 0.99148936 0.97510373 0.98312236]
mean value: 0.9860695128853415
MCC on Blind test: 0.9
Accuracy on Blind test: 0.95
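The OOB warnings logged for the Bagging Classifier are expected with its settings. BaggingClassifier defaults to n_estimators=10, and a training row that is drawn into all 10 bootstrap samples never receives an out-of-bag prediction, so its row of predictions sums to 0 and the division at _bagging.py:753 produces NaN. The sketch below estimates how many such rows to expect per fit; the training-fold size of about 469 rows is inferred from train_accuracy (0.99360341 = 466/469) and is not printed in the log.

```python
def prob_never_out_of_bag(n_samples: int, n_estimators: int) -> float:
    """Probability that one training row is drawn into every bootstrap sample,
    so no estimator can give it an out-of-bag prediction."""
    # P(row left out of one bootstrap of size n_samples drawn with replacement), ~exp(-1)
    p_out = (1.0 - 1.0 / n_samples) ** n_samples
    p_in = 1.0 - p_out
    return p_in ** n_estimators


def expected_rows_without_oob(n_samples: int, n_estimators: int) -> float:
    """Expected number of training rows with no OOB prediction at all."""
    return n_samples * prob_never_out_of_bag(n_samples, n_estimators)


# ~469 training rows per CV fold in this run (inferred, see above)
for n_est in (10, 50, 100):
    print(n_est, expected_rows_without_oob(469, n_est))
```

With 469 rows and 10 estimators this gives roughly 4.8 rows per fit, consistent with the warning appearing on every fit. Raising n_estimators (for example to 100) should make such rows vanishingly rare, though that has not been tested on this dataset. The CV metrics above appear to come from held-out folds rather than from oob_score_, so the warnings probably do not affect them.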
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.16664958 0.13899064 0.1713109 0.12175751 0.12432766 0.15624762
0.1744194 0.12778425 0.15947342 0.16238427]
mean value: 0.1503345251083374
key: score_time
value: [0.02422047 0.01502085 0.02464485 0.02465963 0.01550269 0.01529956
0.02474427 0.02651215 0.02499962 0.02835417]
mean value: 0.022395825386047362
key: test_mcc
value: [0.6980057 0.54700855 0.50336201 0.77151675 0.63245553 0.65433031
0.76923077 0.81312325 0.73131034 0.50037023]
mean value: 0.662071343296795
key: train_mcc
value: [0.98728791 0.99150708 0.9873145 0.9873145 0.9873145 0.99152527
0.9873145 0.9873145 0.9873145 0.9873145 ]
mean value: 0.9881521740698564
key: test_accuracy
value: [0.8490566 0.77358491 0.75 0.88461538 0.80769231 0.82692308
0.88461538 0.90384615 0.86538462 0.75 ]
mean value: 0.8295718432510886
key: train_accuracy
value: [0.99360341 0.99573561 0.99361702 0.99361702 0.99361702 0.99574468
0.99361702 0.99361702 0.99361702 0.99361702]
mean value: 0.9940402848977
key: test_fscore
value: [0.84615385 0.77777778 0.73469388 0.88 0.82758621 0.82352941
0.88461538 0.89795918 0.86792453 0.74509804]
mean value: 0.8285338255950329
key: train_fscore
value: [0.99357602 0.99570815 0.99357602 0.99357602 0.99357602 0.9957265
0.99357602 0.99357602 0.99357602 0.99357602]
mean value: 0.9940042787277901
key: test_precision
value: [0.84615385 0.77777778 0.7826087 0.91666667 0.75 0.84
0.88461538 0.95652174 0.85185185 0.76 ]
mean value: 0.8366195961848135
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.84615385 0.77777778 0.69230769 0.84615385 0.92307692 0.80769231
0.88461538 0.84615385 0.88461538 0.73076923]
mean value: 0.8239316239316239
key: train_recall
value: [0.98723404 0.99145299 0.98723404 0.98723404 0.98723404 0.99148936
0.98723404 0.98723404 0.98723404 0.98723404]
mean value: 0.9880814693580651
key: test_roc_auc
value: [0.84900285 0.77350427 0.75 0.88461538 0.80769231 0.82692308
0.88461538 0.90384615 0.86538462 0.75 ]
mean value: 0.8295584045584046
key: train_roc_auc
value: [0.99361702 0.9957265 0.99361702 0.99361702 0.99361702 0.99574468
0.99361702 0.99361702 0.99361702 0.99361702]
mean value: 0.9940407346790325
key: test_jcc
value: [0.73333333 0.63636364 0.58064516 0.78571429 0.70588235 0.7
0.79310345 0.81481481 0.76666667 0.59375 ]
mean value: 0.7110273699400098
key: train_jcc
value: [0.98723404 0.99145299 0.98723404 0.98723404 0.98723404 0.99148936
0.98723404 0.98723404 0.98723404 0.98723404]
mean value: 0.9880814693580651
MCC on Blind test: 0.58
Accuracy on Blind test: 0.79
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.7296083 0.72827125 0.7328434 0.73026657 0.72733808 0.73085546
0.72294044 0.72570992 0.74692559 0.73820591]
mean value: 0.7312964916229248
key: score_time
value: [0.00976491 0.00955319 0.00944519 0.00955868 0.00969505 0.00937343
0.00935888 0.00958729 0.01031232 0.00962877]
mean value: 0.009627771377563477
key: test_mcc
value: [0.85164138 0.92704716 0.96225045 0.96225045 0.9258201 0.96225045
0.88527041 0.92307692 0.9258201 0.96225045]
mean value: 0.9287677874797963
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.9245283 0.96226415 0.98076923 0.98076923 0.96153846 0.98076923
0.94230769 0.96153846 0.96153846 0.98076923]
mean value: 0.9636792452830188
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.92592593 0.96428571 0.98039216 0.98113208 0.96296296 0.98039216
0.94117647 0.96153846 0.96296296 0.98113208]
mean value: 0.964190096293315
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.89285714 0.93103448 1. 0.96296296 0.92857143 1.
0.96 0.96153846 0.92857143 0.96296296]
mean value: 0.9528498870223008
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96153846 1. 0.96153846 1. 1. 0.96153846
0.92307692 0.96153846 1. 1. ]
mean value: 0.9769230769230769
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.92521368 0.96153846 0.98076923 0.98076923 0.96153846 0.98076923
0.94230769 0.96153846 0.96153846 0.98076923]
mean value: 0.9636752136752137
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.86206897 0.93103448 0.96153846 0.96296296 0.92857143 0.96153846
0.88888889 0.92592593 0.92857143 0.96296296]
mean value: 0.9314063969236382
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.86
Accuracy on Blind test: 0.93
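test_jcc carries no information beyond test_fscore, fold by fold: for one binary split, J = TP/(TP+FP+FN) and F1 = 2TP/(2TP+FP+FN), so J = F1/(2 - F1). The check below uses the Gradient Boosting fold values from this block. The relation does not carry over to the means because it is convex in F1, so the mean Jaccard (0.9314) sits slightly above F1/(2 - F1) evaluated at the mean F1 (about 0.9309).

```python
def jaccard_from_f1(f1: float) -> float:
    """Jaccard index of the positive class implied by an F1 score on the same split."""
    return f1 / (2.0 - f1)


# Per-fold values copied from the Gradient Boosting block above
gb_test_fscore = [0.92592593, 0.96428571, 0.98039216, 0.98113208, 0.96296296,
                  0.98039216, 0.94117647, 0.96153846, 0.96296296, 0.98113208]
gb_test_jcc = [0.86206897, 0.93103448, 0.96153846, 0.96296296, 0.92857143,
               0.96153846, 0.88888889, 0.92592593, 0.92857143, 0.96296296]

# Largest per-fold disagreement; only print rounding should remain
max_err = max(abs(jaccard_from_f1(f) - j) for f, j in zip(gb_test_fscore, gb_test_jcc))
print(max_err)
```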
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
List of models:
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.03424716 0.03178167 0.03212857 0.05195475 0.05340314 0.04832959
0.04696679 0.03173852 0.0314827 0.03206182]
mean value: 0.0394094705581665
key: score_time
value: [0.01278758 0.0395515 0.03736591 0.0196743 0.02260709 0.01882935
0.01456594 0.01464891 0.01470041 0.02665329]
mean value: 0.022138428688049317
key: test_mcc
value: [0.54700855 0.25905207 0.62279916 0.34684399 0.33333333 0.53846154
0.51916999 0.18257419 0.63245553 0.31622777]
mean value: 0.4297926104204046
key: train_mcc
value: [0.95006652 0.73515544 0.88334763 0.5920935 0.83105203 0.97880317
0.93009643 0.6846532 0.96191988 0.85319469]
mean value: 0.840038248069776
key: test_accuracy
value: [0.77358491 0.62264151 0.80769231 0.65384615 0.65384615 0.76923077
0.75 0.57692308 0.80769231 0.65384615]
mean value: 0.7069303338171262
key: train_accuracy
value: [0.97441365 0.85074627 0.93829787 0.75957447 0.90851064 0.9893617
0.96382979 0.81914894 0.98085106 0.9212766 ]
mean value: 0.9106010978541941
key: test_fscore
value: [0.76923077 0.6875 0.82142857 0.71875 0.70967742 0.76923077
0.77966102 0.66666667 0.82758621 0.68965517]
mean value: 0.7439386592171113
key: train_fscore
value: [0.97510373 0.86988848 0.94188377 0.80617496 0.91617934 0.98929336
0.9650924 0.84684685 0.98105263 0.9270217 ]
mean value: 0.9218537211188351
key: test_precision
value: [0.76923077 0.59459459 0.76666667 0.60526316 0.61111111 0.76923077
0.6969697 0.55 0.75 0.625 ]
mean value: 0.6738066765698345
key: train_precision
value: [0.951417 0.76973684 0.89015152 0.67528736 0.84532374 0.99568966
0.93253968 0.734375 0.97083333 0.86397059]
mean value: 0.8629324717915119
key: test_recall
value: [0.76923077 0.81481481 0.88461538 0.88461538 0.84615385 0.76923077
0.88461538 0.84615385 0.92307692 0.76923077]
mean value: 0.8391737891737892
key: train_recall
value: [1. 1. 1. 1. 1. 0.98297872
1. 1. 0.99148936 1. ]
mean value: 0.9974468085106383
key: test_roc_auc
value: [0.77350427 0.61894587 0.80769231 0.65384615 0.65384615 0.76923077
0.75 0.57692308 0.80769231 0.65384615]
mean value: 0.7065527065527065
key: train_roc_auc
value: [0.97435897 0.85106383 0.93829787 0.75957447 0.90851064 0.9893617
0.96382979 0.81914894 0.98085106 0.9212766 ]
mean value: 0.9106273867975996
key: test_jcc
value: [0.625 0.52380952 0.6969697 0.56097561 0.55 0.625
0.63888889 0.5 0.70588235 0.52631579]
mean value: 0.5952841861839068
key: train_jcc
value: [0.951417 0.76973684 0.89015152 0.67528736 0.84532374 0.97881356
0.93253968 0.734375 0.96280992 0.86397059]
mean value: 0.8604425206086777
MCC on Blind test: 0.41
Accuracy on Blind test: 0.67
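A likely contributor to the "Variables are collinear" warnings logged for QDA: the pipeline one-hot encodes seven categorical columns with a plain OneHotEncoder(), so within each encoded group the dummy columns sum to 1 and become linearly dependent once centred. QDA fits a separate covariance matrix per class, and scikit-learn warns when a class's centred data is rank-deficient. The toy sketch below uses hypothetical ss_class labels to show the rank drop, and shows that dropping one dummy per group restores full rank.

```python
import numpy as np


def one_hot(labels):
    """Dense one-hot matrix with one column per category (sorted)."""
    categories = sorted(set(labels))
    return np.array([[1.0 if lab == cat else 0.0 for cat in categories] for lab in labels])


# Hypothetical ss_class values for the rows of a single class
labels = ["helix", "sheet", "coil", "helix", "coil", "sheet", "coil"]

X_full = one_hot(labels)  # all dummy columns, as with OneHotEncoder()
X_drop = X_full[:, 1:]    # first category dropped, as with OneHotEncoder(drop="first")

# Every row of X_full sums to 1, so after centring its columns are linearly dependent
rank_full = np.linalg.matrix_rank(np.cov(X_full, rowvar=False))
rank_drop = np.linalg.matrix_rank(np.cov(X_drop, rowvar=False))
print(X_full.shape[1], rank_full)  # 3 columns, rank 2: singular covariance
print(X_drop.shape[1], rank_drop)  # 2 columns, rank 2: full rank
```

OneHotEncoder(drop="first") and QuadraticDiscriminantAnalysis(reg_param=...) are both existing scikit-learn options here. With 169 scaled numeric features and roughly 235 training rows per class, other near-collinearities may remain, so neither option is guaranteed to silence the warning or to improve the blind-test MCC of 0.41.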
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02965641 0.03908777 0.03907251 0.03891897 0.03882527 0.03877044
0.03892875 0.03883457 0.03862453 0.0391221 ]
mean value: 0.03798413276672363
key: score_time
value: [0.01900005 0.01908803 0.01899123 0.01887655 0.01898837 0.01889229
0.01906919 0.0189023 0.0188899 0.01907778]
mean value: 0.01897757053375244
key: test_mcc
value: [0.85164138 0.73997003 0.77849894 0.92307692 0.89056356 0.74466871
0.73131034 0.88527041 0.81312325 0.77849894]
mean value: 0.8136622485831022
key: train_mcc
value: [0.85528213 0.86366944 0.85168866 0.85544308 0.85535013 0.85581519
0.8769849 0.86847048 0.86411148 0.85107154]
mean value: 0.8597887013936425
key: test_accuracy
value: [0.9245283 0.86792453 0.88461538 0.96153846 0.94230769 0.86538462
0.86538462 0.94230769 0.90384615 0.88461538]
mean value: 0.904245283018868
key: train_accuracy
value: [0.92750533 0.93176972 0.92553191 0.92765957 0.92765957 0.92765957
0.93829787 0.93404255 0.93191489 0.92553191]
mean value: 0.929757292564533
key: test_fscore
value: [0.92592593 0.87719298 0.875 0.96153846 0.94545455 0.85106383
0.86792453 0.94117647 0.90909091 0.89285714]
mean value: 0.9047224796000481
key: train_fscore
value: [0.92857143 0.93220339 0.92693111 0.92827004 0.9279661 0.92887029
0.93920335 0.93501048 0.93277311 0.92569002]
mean value: 0.9305489328602898
key: test_precision
value: [0.89285714 0.83333333 0.95454545 0.96153846 0.89655172 0.95238095
0.85185185 0.96 0.86206897 0.83333333]
mean value: 0.8998461219495703
key: train_precision
value: [0.91701245 0.92436975 0.90983607 0.92050209 0.92405063 0.91358025
0.92561983 0.9214876 0.92116183 0.92372881]
mean value: 0.9201349310782884
key: test_recall
value: [0.96153846 0.92592593 0.80769231 0.96153846 1. 0.76923077
0.88461538 0.92307692 0.96153846 0.96153846]
mean value: 0.9156695156695157
key: train_recall
value: [0.94042553 0.94017094 0.94468085 0.93617021 0.93191489 0.94468085
0.95319149 0.94893617 0.94468085 0.92765957]
mean value: 0.9412511365702856
key: test_roc_auc
value: [0.92521368 0.86680912 0.88461538 0.96153846 0.94230769 0.86538462
0.86538462 0.94230769 0.90384615 0.88461538]
mean value: 0.9042022792022792
key: train_roc_auc
value: [0.92747772 0.9317876 0.92553191 0.92765957 0.92765957 0.92765957
0.93829787 0.93404255 0.93191489 0.92553191]
mean value: 0.9297563193307874
key: test_jcc
value: [0.86206897 0.78125 0.77777778 0.92592593 0.89655172 0.74074074
0.76666667 0.88888889 0.83333333 0.80645161]
mean value: 0.8279655635891732
key: train_jcc
value: [0.86666667 0.87301587 0.86381323 0.86614173 0.86561265 0.8671875
0.88537549 0.87795276 0.87401575 0.86166008]
mean value: 0.870144172681887
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.27485728 0.28193116 0.35548592 0.28141141 0.28373504 0.31686974
0.28387547 0.28604722 0.28356791 0.33537459]
mean value: 0.29831557273864745
key: score_time
value: [0.01901937 0.01924253 0.01896763 0.01907969 0.01897073 0.01907015
0.01907754 0.01901555 0.01911354 0.01900244]
mean value: 0.019055914878845216
key: test_mcc
value: [0.85164138 0.62867836 0.77849894 0.92307692 0.89056356 0.74466871
0.73131034 0.88527041 0.81312325 0.77849894]
mean value: 0.8025330823545165
key: train_mcc
value: [0.85528213 0.80817284 0.80498447 0.85544308 0.85535013 0.85581519
0.8769849 0.86847048 0.86411148 0.85107154]
mean value: 0.8495686235274289
key: test_accuracy
value: [0.9245283 0.81132075 0.88461538 0.96153846 0.94230769 0.86538462
0.86538462 0.94230769 0.90384615 0.88461538]
mean value: 0.8985849056603774
key: train_accuracy
value: [0.92750533 0.90405117 0.90212766 0.92765957 0.92765957 0.92765957
0.93829787 0.93404255 0.93191489 0.92553191]
mean value: 0.924645012021957
key: test_fscore
value: [0.92592593 0.82758621 0.875 0.96153846 0.94545455 0.85106383
0.86792453 0.94117647 0.90909091 0.89285714]
mean value: 0.8997618020440893
key: train_fscore
value: [0.92857143 0.9044586 0.90416667 0.92827004 0.9279661 0.92887029
0.93920335 0.93501048 0.93277311 0.92569002]
mean value: 0.9254980097693355
key: test_precision
value: [0.89285714 0.77419355 0.95454545 0.96153846 0.89655172 0.95238095
0.85185185 0.96 0.86206897 0.83333333]
mean value: 0.8939321434549465
key: train_precision
value: [0.91701245 0.89873418 0.88571429 0.92050209 0.92405063 0.91358025
0.92561983 0.9214876 0.92116183 0.92372881]
mean value: 0.9151591960239429
key: test_recall
value: [0.96153846 0.88888889 0.80769231 0.96153846 1. 0.76923077
0.88461538 0.92307692 0.96153846 0.96153846]
mean value: 0.911965811965812
key: train_recall
value: [0.94042553 0.91025641 0.92340426 0.93617021 0.93191489 0.94468085
0.95319149 0.94893617 0.94468085 0.92765957]
mean value: 0.9361320240043645
key: test_roc_auc
value: [0.92521368 0.80982906 0.88461538 0.96153846 0.94230769 0.86538462
0.86538462 0.94230769 0.90384615 0.88461538]
mean value: 0.8985042735042735
key: train_roc_auc
value: [0.92747772 0.90406438 0.90212766 0.92765957 0.92765957 0.92765957
0.93829787 0.93404255 0.93191489 0.92553191]
mean value: 0.9246435715584652
key: test_jcc
value: [0.86206897 0.70588235 0.77777778 0.92592593 0.89655172 0.74074074
0.76666667 0.88888889 0.83333333 0.80645161]
mean value: 0.8204287988832908
key: train_jcc
value: [0.86666667 0.8255814 0.82509506 0.86614173 0.86561265 0.8671875
0.88537549 0.87795276 0.87401575 0.86166008]
mean value: 0.861528907661407
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_sl.py:148: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_sl.py:151: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.03136301 0.03624725 0.03802299 0.0339272 0.03786993 0.03451753
0.03579712 0.0273068 0.03700662 0.03485203]
mean value: 0.03469104766845703
key: score_time
value: [0.01213574 0.01400828 0.01402617 0.01210189 0.01214027 0.01211405
0.0120852 0.01208687 0.01415348 0.01223993]
mean value: 0.01270918846130371
key: test_mcc
value: [0.8459178 0.92427578 0.85407434 0.84544958 0.80461538 0.76662339
0.80431528 0.88289781 0.72057669 0.68 ]
mean value: 0.8128746057870775
key: train_mcc
value: [0.85120279 0.85123255 0.86870834 0.86453248 0.85558875 0.86874413
0.85565707 0.8690155 0.86912823 0.87776273]
mean value: 0.863157258117954
key: test_accuracy
value: [0.92156863 0.96078431 0.92156863 0.92156863 0.90196078 0.88235294
0.90196078 0.94117647 0.86 0.84 ]
mean value: 0.9052941176470588
key: train_accuracy
value: [0.92560175 0.92560175 0.93435449 0.9321663 0.92778993 0.93435449
0.92778993 0.93435449 0.93449782 0.93886463]
mean value: 0.9315375574517691
key: test_fscore
value: [0.92307692 0.95833333 0.92592593 0.91666667 0.90196078 0.88888889
0.90566038 0.94339623 0.85714286 0.84 ]
mean value: 0.9061051983121906
key: train_fscore
value: [0.92576419 0.92608696 0.93449782 0.93304536 0.92778993 0.93449782
0.92810458 0.93506494 0.93506494 0.93913043]
mean value: 0.9319046952651103
key: test_precision
value: [0.88888889 1. 0.86206897 0.95652174 0.92 0.85714286
0.88888889 0.92592593 0.875 0.84 ]
mean value: 0.9014437265494237
key: train_precision
value: [0.92576419 0.92207792 0.93449782 0.92307692 0.92576419 0.93043478
0.92207792 0.92307692 0.92703863 0.93506494]
mean value: 0.9268874235466126
key: test_recall
value: [0.96 0.92 1. 0.88 0.88461538 0.92307692
0.92307692 0.96153846 0.84 0.84 ]
mean value: 0.9132307692307693
key: train_recall
value: [0.92576419 0.930131 0.93449782 0.94323144 0.92982456 0.93859649
0.93421053 0.94736842 0.94323144 0.94323144]
mean value: 0.9370087336244541
key: test_roc_auc
value: [0.92230769 0.96 0.92307692 0.92076923 0.90230769 0.88153846
0.90153846 0.94076923 0.86 0.84 ]
mean value: 0.9052307692307693
key: train_roc_auc
value: [0.92560139 0.92559182 0.93435417 0.93214204 0.92779438 0.93436375
0.92780395 0.9343829 0.93449782 0.93886463]
mean value: 0.9315396843637478
key: test_jcc
value: [0.85714286 0.92 0.86206897 0.84615385 0.82142857 0.8
0.82758621 0.89285714 0.75 0.72413793]
mean value: 0.8301375521030694
key: train_jcc
value: [0.86178862 0.86234818 0.87704918 0.87449393 0.86530612 0.87704918
0.86585366 0.87804878 0.87804878 0.8852459 ]
mean value: 0.8725232327405593
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [1.08443904 0.96518111 0.84693241 0.97842479 0.87160254 0.96375942
0.91595888 0.93315125 0.96591616 0.96043491]
mean value: 0.9485800504684448
key: score_time
value: [0.01489997 0.01467752 0.01478934 0.02118349 0.01491809 0.01476455
0.01487756 0.01552033 0.01489472 0.01519156]
mean value: 0.0155717134475708
key: test_mcc
value: [0.8459178 0.88289781 0.88872671 0.80904133 0.88307692 0.76461538
0.80431528 0.92153846 0.76 0.6821865 ]
mean value: 0.8242316201242978
key: train_mcc
value: [0.90375223 0.89066391 0.91247223 0.95627191 0.88184708 0.90375591
0.89059986 0.89956325 0.90406806 0.96510231]
mean value: 0.9108096761378437
key: test_accuracy
value: [0.92156863 0.94117647 0.94117647 0.90196078 0.94117647 0.88235294
0.90196078 0.96078431 0.88 0.84 ]
mean value: 0.9112156862745098
key: train_accuracy
value: [0.95185996 0.9452954 0.95623632 0.97811816 0.94091904 0.95185996
0.9452954 0.94967177 0.95196507 0.98253275]
mean value: 0.9553753834099357
key: test_fscore
value: [0.92307692 0.93877551 0.94339623 0.89361702 0.94117647 0.88461538
0.90566038 0.96153846 0.88 0.83333333]
mean value: 0.91051897084066
key: train_fscore
value: [0.95217391 0.94577007 0.95633188 0.97826087 0.94091904 0.95196507
0.9452954 0.95010846 0.95238095 0.98245614]
mean value: 0.9555661785530866
key: test_precision
value: [0.88888889 0.95833333 0.89285714 0.95454545 0.96 0.88461538
0.88888889 0.96153846 0.88 0.86956522]
mean value: 0.9139232772058858
key: train_precision
value: [0.94805195 0.93965517 0.95633188 0.97402597 0.93886463 0.94782609
0.94323144 0.93991416 0.94420601 0.98678414]
mean value: 0.9518891441689473
key: test_recall
value: [0.96 0.92 1. 0.84 0.92307692 0.88461538
0.92307692 0.96153846 0.88 0.8 ]
mean value: 0.9092307692307693
key: train_recall
value: [0.95633188 0.95196507 0.95633188 0.98253275 0.94298246 0.95614035
0.94736842 0.96052632 0.96069869 0.97816594]
mean value: 0.9593043744733012
key: test_roc_auc
value: [0.92230769 0.94076923 0.94230769 0.90076923 0.94153846 0.88230769
0.90153846 0.96076923 0.88 0.84 ]
mean value: 0.9112307692307692
key: train_roc_auc
value: [0.95185015 0.94528078 0.95623611 0.97810848 0.94092354 0.9518693
0.94529993 0.94969547 0.95196507 0.98253275]
mean value: 0.955376158737455
key: test_jcc
value: [0.85714286 0.88461538 0.89285714 0.80769231 0.88888889 0.79310345
0.82758621 0.92592593 0.78571429 0.71428571]
mean value: 0.8377812162294921
key: train_jcc
value: [0.90871369 0.89711934 0.91631799 0.95744681 0.88842975 0.90833333
0.89626556 0.90495868 0.90909091 0.96551724]
mean value: 0.9152193308373876
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
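The per-fold scores above can be cross-checked from confusion-matrix counts. Below is a minimal sketch in plain Python, assuming the standard definitions of each metric; the helper `binary_metrics` is hypothetical and not part of the original script. The fold 10 counts (TP=20, FP=3, FN=5, TN=22) are back-calculated from that fold's reported recall (0.8), precision (0.86956522) and accuracy (0.84), assuming a 50-sample test fold with 25 positives.

```python
from math import sqrt


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Binary-classification metrics computed from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # sensitivity / true-positive rate
    specificity = tn / (tn + fp)     # true-negative rate
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        # Matthews correlation coefficient (0 by convention if a margin is empty)
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        # F1 written as 2TP / (2TP + FP + FN), equivalent to 2PR / (P + R)
        "fscore": 2 * tp / (2 * tp + fp + fn),
        # Jaccard index of the positive class
        "jcc": tp / (tp + fp + fn),
        # ROC AUC of hard 0/1 predictions reduces to balanced accuracy
        "roc_auc_hard": (recall + specificity) / 2,
    }


# Counts back-calculated (assumption) from fold 10 of LogisticRegressionCV
fold10 = binary_metrics(tp=20, fp=3, fn=5, tn=22)
for name, value in fold10.items():
    print(f"{name}: {value:.8f}")
```

With these counts the sketch reproduces fold 10's test_mcc (0.6821865), test_fscore (0.83333333), test_jcc (0.71428571) and test_roc_auc (0.84). That test_roc_auc equals the balanced accuracy here suggests, though the log does not confirm it, that AUC was scored on hard class labels rather than on predicted probabilities.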
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01459789 0.01174212 0.01132417 0.01122999 0.0112493 0.00993657
0.00992107 0.01049495 0.00995612 0.01000881]
mean value: 0.011046099662780761
key: score_time
value: [0.01251364 0.01037288 0.00981545 0.0099647 0.0097928 0.00900888
0.00895166 0.00888586 0.0090785 0.00888777]
mean value: 0.009727215766906739
key: test_mcc
value: [0.7531751 0.6938347 0.64715023 0.64769231 0.70728397 0.60769231
0.77353193 0.5372904 0.68 0.61806423]
mean value: 0.6665715181657266
key: train_mcc
value: [0.7051679 0.68663317 0.70105568 0.71111913 0.69491764 0.71441791
0.68598516 0.69626736 0.71979689 0.71353415]
mean value: 0.7028894982851523
key: test_accuracy
value: [0.8627451 0.84313725 0.82352941 0.82352941 0.84313725 0.80392157
0.88235294 0.76470588 0.84 0.8 ]
mean value: 0.8287058823529412
key: train_accuracy
value: [0.8512035 0.84245077 0.84901532 0.85339168 0.84682713 0.85557987
0.84026258 0.84682713 0.8580786 0.8558952 ]
mean value: 0.8499531785997535
key: test_fscore
value: [0.8372093 0.82608696 0.81632653 0.82352941 0.82608696 0.80769231
0.89285714 0.75 0.84 0.77272727]
mean value: 0.8192515881022734
key: train_fscore
value: [0.84474886 0.83710407 0.84210526 0.84526559 0.84162896 0.84792627
0.82903981 0.83944954 0.85057471 0.85067873]
mean value: 0.8428521809081373
key: test_precision
value: [1. 0.9047619 0.83333333 0.80769231 0.95 0.80769231
0.83333333 0.81818182 0.84 0.89473684]
mean value: 0.8689731847100268
key: train_precision
value: [0.88516746 0.8685446 0.88461538 0.89705882 0.86915888 0.89320388
0.88944724 0.87980769 0.89805825 0.88262911]
mean value: 0.8847691324095417
key: test_recall
value: [0.72 0.76 0.8 0.84 0.73076923 0.80769231
0.96153846 0.69230769 0.84 0.68 ]
mean value: 0.7832307692307692
key: train_recall
value: [0.80786026 0.80786026 0.80349345 0.79912664 0.81578947 0.80701754
0.77631579 0.80263158 0.80786026 0.8209607 ]
mean value: 0.8048915958017314
key: test_roc_auc
value: [0.86 0.84153846 0.82307692 0.82384615 0.84538462 0.80384615
0.88076923 0.76615385 0.84 0.8 ]
mean value: 0.8284615384615385
key: train_roc_auc
value: [0.85129855 0.84252662 0.84911515 0.85351069 0.84675937 0.85547384
0.84012296 0.84673064 0.8580786 0.8558952 ]
mean value: 0.8499511606527235
key: test_jcc
value: [0.72 0.7037037 0.68965517 0.7 0.7037037 0.67741935
0.80645161 0.6 0.72413793 0.62962963]
mean value: 0.6954701108227248
key: train_jcc
value: [0.7312253 0.71984436 0.72727273 0.732 0.7265625 0.736
0.708 0.72332016 0.74 0.74015748]
mean value: 0.7284382520109796
MCC on Blind test: 0.63
Accuracy on Blind test: 0.81
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01254296 0.01074314 0.0115149 0.01157999 0.01032472 0.01010489
0.01028371 0.01140451 0.01022911 0.01108527]
mean value: 0.010981321334838867
key: score_time
value: [0.00986099 0.00936627 0.00975776 0.00993228 0.00894356 0.0091629
0.00905657 0.00938177 0.00915146 0.01000309]
mean value: 0.009461665153503418
key: test_mcc
value: [0.88289781 0.62355907 0.77487835 0.68875274 0.72615385 0.72573276
0.68779719 0.61017022 0.60783067 0.72524067]
mean value: 0.705301332855258
key: train_mcc
value: [0.75071367 0.74619319 0.75930821 0.77253746 0.74212413 0.74619319
0.72014338 0.75504732 0.77747792 0.76862491]
mean value: 0.7538363372752251
key: test_accuracy
value: [0.94117647 0.80392157 0.88235294 0.84313725 0.8627451 0.8627451
0.84313725 0.80392157 0.8 0.86 ]
mean value: 0.8503137254901961
key: train_accuracy
value: [0.87527352 0.87308534 0.87964989 0.88621444 0.87089716 0.87308534
0.85995624 0.87746171 0.88864629 0.88427948]
mean value: 0.876854939657726
key: test_fscore
value: [0.93877551 0.77272727 0.88888889 0.84615385 0.8627451 0.86792453
0.85185185 0.8 0.81481481 0.85106383]
mean value: 0.8494945640769093
key: train_fscore
value: [0.87688985 0.87391304 0.87964989 0.88744589 0.86859688 0.8722467
0.85777778 0.87826087 0.88984881 0.88503254]
mean value: 0.8769662245721188
key: test_precision
value: [0.95833333 0.89473684 0.82758621 0.81481481 0.88 0.85185185
0.82142857 0.83333333 0.75862069 0.90909091]
mean value: 0.8549796552509801
key: train_precision
value: [0.86752137 0.87012987 0.88157895 0.87982833 0.88235294 0.87610619
0.86936937 0.87068966 0.88034188 0.87931034]
mean value: 0.8757228896777902
key: test_recall
value: [0.92 0.68 0.96 0.88 0.84615385 0.88461538
0.88461538 0.76923077 0.88 0.8 ]
mean value: 0.8504615384615385
key: train_recall
value: [0.88646288 0.87772926 0.87772926 0.89519651 0.85526316 0.86842105
0.84649123 0.88596491 0.89956332 0.89082969]
mean value: 0.878365126790776
key: test_roc_auc
value: [0.94076923 0.80153846 0.88384615 0.84384615 0.86307692 0.86230769
0.84230769 0.80461538 0.8 0.86 ]
mean value: 0.8502307692307692
key: train_roc_auc
value: [0.87524898 0.87307516 0.8796541 0.88619474 0.87086302 0.87307516
0.85992684 0.87748027 0.88864629 0.88427948]
mean value: 0.8768444035853827
key: test_jcc
value: [0.88461538 0.62962963 0.8 0.73333333 0.75862069 0.76666667
0.74193548 0.66666667 0.6875 0.74074074]
mean value: 0.7409708595178561
key: train_jcc
value: [0.78076923 0.77606178 0.78515625 0.79766537 0.76771654 0.7734375
0.75097276 0.78294574 0.80155642 0.79377432]
mean value: 0.7810055900293517
MCC on Blind test: 0.76
Accuracy on Blind test: 0.88
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.0107193 0.01060081 0.01052904 0.01056099 0.01059866 0.01077914
0.0097878 0.00977492 0.00972342 0.00962353]
mean value: 0.010269761085510254
key: score_time
value: [0.01285338 0.01341796 0.01517129 0.01273727 0.01308703 0.0128603
0.01233506 0.01229858 0.01202655 0.01252079]
mean value: 0.012930822372436524
key: test_mcc
value: [0.72984534 0.2668549 0.61017022 0.49076923 0.48998517 0.68875274
0.61017022 0.41140265 0.6 0.5 ]
mean value: 0.539795046786689
key: train_mcc
value: [0.69010909 0.70240558 0.68928004 0.69803298 0.72889968 0.71606598
0.68082181 0.72482631 0.70759226 0.72995395]
mean value: 0.7067987676078814
key: test_accuracy
value: [0.8627451 0.62745098 0.80392157 0.74509804 0.74509804 0.84313725
0.80392157 0.70588235 0.8 0.74 ]
mean value: 0.7677254901960785
key: train_accuracy
value: [0.84463895 0.8512035 0.84463895 0.84901532 0.8643326 0.85776805
0.84026258 0.86214442 0.85371179 0.86462882]
mean value: 0.8532344987721326
key: test_fscore
value: [0.85106383 0.53658537 0.80769231 0.74509804 0.75471698 0.84
0.8 0.71698113 0.8 0.69767442]
mean value: 0.7549812074361085
key: train_fscore
value: [0.84116331 0.85152838 0.8453159 0.8496732 0.86222222 0.85458613
0.83741648 0.8590604 0.85209713 0.86160714]
mean value: 0.8514670310824969
key: test_precision
value: [0.90909091 0.6875 0.77777778 0.73076923 0.74074074 0.875
0.83333333 0.7037037 0.8 0.83333333]
mean value: 0.7891249028749029
key: train_precision
value: [0.86238532 0.85152838 0.84347826 0.84782609 0.87387387 0.87214612
0.85067873 0.87671233 0.86160714 0.88127854]
mean value: 0.8621514789270541
key: test_recall
value: [0.8 0.44 0.84 0.76 0.76923077 0.80769231
0.76923077 0.73076923 0.8 0.6 ]
mean value: 0.7316923076923078
key: train_recall
value: [0.8209607 0.85152838 0.84716157 0.85152838 0.85087719 0.8377193
0.8245614 0.84210526 0.84279476 0.84279476]
mean value: 0.8412031716846702
key: test_roc_auc
value: [0.86153846 0.62384615 0.80461538 0.74538462 0.74461538 0.84384615
0.80461538 0.70538462 0.8 0.74 ]
mean value: 0.7673846153846153
key: train_roc_auc
value: [0.84469088 0.85120279 0.84463342 0.84900981 0.86430323 0.85772428
0.8402283 0.86210067 0.85371179 0.86462882]
mean value: 0.8532233969202482
key: test_jcc
value: [0.74074074 0.36666667 0.67741935 0.59375 0.60606061 0.72413793
0.66666667 0.55882353 0.66666667 0.53571429]
mean value: 0.613664644780059
key: train_jcc
value: [0.72586873 0.74144487 0.73207547 0.73863636 0.7578125 0.74609375
0.72030651 0.75294118 0.74230769 0.75686275]
mean value: 0.7414349805409636
MCC on Blind test: 0.38
Accuracy on Blind test: 0.69
Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.02395844 0.01976371 0.02095556 0.02216721 0.02027059 0.0211978
0.01987791 0.02031898 0.02200055 0.01978111]
mean value: 0.021029186248779298
key: score_time
value: [0.01175141 0.0115068 0.01230741 0.01178312 0.01150537 0.01177835
0.01278973 0.01174521 0.0115788 0.0126636 ]
mean value: 0.011940979957580566
key: test_mcc
value: [0.8459178 0.88823731 0.85407434 0.80431528 0.80461538 0.80431528
0.80461538 0.80431528 0.76 0.64051262]
mean value: 0.8010918682007854
key: train_mcc
value: [0.79431931 0.7943723 0.79881623 0.80306832 0.80307209 0.80746615
0.80307209 0.80306832 0.81223482 0.81662503]
mean value: 0.8036114663208501
key: test_accuracy
value: [0.92156863 0.94117647 0.92156863 0.90196078 0.90196078 0.90196078
0.90196078 0.90196078 0.88 0.82 ]
mean value: 0.8994117647058824
key: train_accuracy
value: [0.89715536 0.89715536 0.89934354 0.90153173 0.90153173 0.90371991
0.90153173 0.90153173 0.90611354 0.90829694]
mean value: 0.9017911574441249
key: test_fscore
value: [0.92307692 0.93617021 0.92592593 0.89795918 0.90196078 0.90566038
0.90196078 0.90566038 0.88 0.81632653]
mean value: 0.8994701099398953
key: train_fscore
value: [0.89715536 0.89804772 0.89867841 0.90196078 0.90153173 0.9030837
0.90153173 0.9010989 0.90631808 0.90869565]
mean value: 0.9018102075636133
key: test_precision
value: [0.88888889 1. 0.86206897 0.91666667 0.92 0.88888889
0.92 0.88888889 0.88 0.83333333]
mean value: 0.8998735632183907
key: train_precision
value: [0.89912281 0.89224138 0.90666667 0.9 0.89956332 0.90707965
0.89956332 0.9030837 0.90434783 0.9047619 ]
mean value: 0.901643056785623
key: test_recall
value: [0.96 0.88 1. 0.88 0.88461538 0.92307692
0.88461538 0.92307692 0.88 0.8 ]
mean value: 0.9015384615384615
key: train_recall
value: [0.89519651 0.90393013 0.89082969 0.90393013 0.90350877 0.89912281
0.90350877 0.89912281 0.90829694 0.91266376]
mean value: 0.902011031946679
key: test_roc_auc
value: [0.92230769 0.94 0.92307692 0.90153846 0.90230769 0.90153846
0.90230769 0.90153846 0.88 0.82 ]
mean value: 0.8994615384615384
key: train_roc_auc
value: [0.89715966 0.8971405 0.89936222 0.90152647 0.90153605 0.90370988
0.90153605 0.90152647 0.90611354 0.90829694]
mean value: 0.9017907760668046
key: test_jcc
value: [0.85714286 0.88 0.86206897 0.81481481 0.82142857 0.82758621
0.82142857 0.82758621 0.78571429 0.68965517]
mean value: 0.8187425652253238
key: train_jcc
value: [0.81349206 0.81496063 0.816 0.82142857 0.82071713 0.82329317
0.82071713 0.82 0.82868526 0.83266932]
mean value: 0.8211963282154172
MCC on Blind test: 0.72
Accuracy on Blind test: 0.86
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.95274234 1.77280283 1.89924169 1.68007755 2.21786213 1.90000033
1.98987079 1.87301683 1.61986661 2.00075674]
mean value: 1.8906237840652467
key: score_time
value: [0.01277399 0.01594472 0.01440883 0.01712847 0.0163753 0.01307678
0.0148952 0.02330542 0.01241088 0.01312971]
mean value: 0.015344929695129395
key: test_mcc
value: [0.8459178 0.88289781 0.8459178 0.80904133 0.84307692 0.65224812
0.84307692 0.80431528 0.72057669 0.64051262]
mean value: 0.7887581298524704
key: train_mcc
value: [0.99124722 0.9956331 1. 0.98695627 0.99128503 0.99563319
0.98688041 0.99563319 0.96966347 1. ]
mean value: 0.9912931889984316
key: test_accuracy
value: [0.92156863 0.94117647 0.92156863 0.90196078 0.92156863 0.82352941
0.92156863 0.90196078 0.86 0.82 ]
mean value: 0.8934901960784314
key: train_accuracy
value: [0.99562363 0.99781182 1. 0.99343545 0.99562363 0.99781182
0.99343545 0.99781182 0.98471616 1. ]
mean value: 0.9956269767708522
key: test_fscore
value: [0.92307692 0.93877551 0.92307692 0.89361702 0.92307692 0.81632653
0.92307692 0.90566038 0.8627451 0.81632653]
mean value: 0.8925758760410566
key: train_fscore
value: [0.99563319 0.99782135 1. 0.99340659 0.99559471 0.99781182
0.99343545 0.99781182 0.98488121 1. ]
mean value: 0.9956396136064475
key: test_precision
value: [0.88888889 0.95833333 0.88888889 0.95454545 0.92307692 0.86956522
0.92307692 0.88888889 0.84615385 0.83333333]
mean value: 0.8974751697577784
key: train_precision
value: [0.99563319 0.99565217 1. 1. 1. 0.99563319
0.99126638 0.99563319 0.97435897 1. ]
mean value: 0.9948177087136647
key: test_recall
value: [0.96 0.92 0.96 0.84 0.92307692 0.76923077
0.92307692 0.92307692 0.88 0.8 ]
mean value: 0.8898461538461538
key: train_recall
value: [0.99563319 1. 1. 0.98689956 0.99122807 1.
0.99561404 1. 0.99563319 1. ]
mean value: 0.9965008044127787
key: test_roc_auc
value: [0.92230769 0.94076923 0.92230769 0.90076923 0.92153846 0.82461538
0.92153846 0.90153846 0.86 0.82 ]
mean value: 0.8935384615384615
key: train_roc_auc
value: [0.99562361 0.99780702 1. 0.99344978 0.99561404 0.99781659
0.99344021 0.99781659 0.98471616 1. ]
mean value: 0.9956283996016241
key: test_jcc
value: [0.85714286 0.88461538 0.85714286 0.80769231 0.85714286 0.68965517
0.85714286 0.82758621 0.75862069 0.68965517]
mean value: 0.8086396362258431
key: train_jcc
value: [0.99130435 0.99565217 1. 0.98689956 0.99122807 0.99563319
0.98695652 0.99563319 0.97021277 1. ]
mean value: 0.9913519818475776
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.04281473 0.02258086 0.01959491 0.02242303 0.01986456 0.02302599
0.02203774 0.0218904 0.02118134 0.02031946]
mean value: 0.02357330322265625
key: score_time
value: [0.01040149 0.00911903 0.00908256 0.0089097 0.00903273 0.00893998
0.00948405 0.00905752 0.00999212 0.00940347]
mean value: 0.009342265129089356
key: test_mcc
value: [0.96153846 0.76662339 0.92450033 0.64715023 0.96148034 0.92153846
0.96148034 0.96153846 0.88070485 0.76 ]
mean value: 0.8746554854753938
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98039216 0.88235294 0.96078431 0.82352941 0.98039216 0.96078431
0.98039216 0.98039216 0.94 0.88 ]
mean value: 0.9369019607843136
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98039216 0.875 0.96153846 0.81632653 0.98113208 0.96153846
0.98113208 0.98039216 0.93877551 0.88 ]
mean value: 0.9356227428562136
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96153846 0.91304348 0.92592593 0.83333333 0.96296296 0.96153846
0.96296296 1. 0.95833333 0.88 ]
mean value: 0.9359638919856311
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.84 1. 0.8 1. 0.96153846
1. 0.96153846 0.92 0.88 ]
mean value: 0.9363076923076923
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98076923 0.88153846 0.96153846 0.82307692 0.98 0.96076923
0.98 0.98076923 0.94 0.88 ]
mean value: 0.9368461538461539
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96153846 0.77777778 0.92592593 0.68965517 0.96296296 0.92592593
0.96296296 0.96153846 0.88461538 0.78571429]
mean value: 0.8838617321375942
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.82
Accuracy on Blind test: 0.9
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.12462068 0.12486506 0.12026739 0.12022614 0.12115216 0.11893344
0.11995935 0.11945629 0.11925888 0.11902761]
mean value: 0.12077670097351074
key: score_time
value: [0.01896501 0.01762033 0.01811218 0.01792049 0.01771355 0.01781607
0.01778412 0.01798749 0.01778889 0.01803756]
mean value: 0.01797456741333008
key: test_mcc
value: [0.92153846 0.78581168 0.82041265 0.88289781 0.76733527 0.68779719
0.80904133 0.73107432 0.6821865 0.72057669]
mean value: 0.7808671912348781
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96078431 0.88235294 0.90196078 0.94117647 0.88235294 0.84313725
0.90196078 0.8627451 0.84 0.86 ]
mean value: 0.8876470588235295
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96 0.86363636 0.90909091 0.93877551 0.88 0.85185185
0.90909091 0.85714286 0.84615385 0.8627451 ]
mean value: 0.8878487345210034
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96 1. 0.83333333 0.95833333 0.91666667 0.82142857
0.86206897 0.91304348 0.81481481 0.84615385]
mean value: 0.8925843009508676
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96 0.76 1. 0.92 0.84615385 0.88461538
0.96153846 0.80769231 0.88 0.88 ]
mean value: 0.89
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96076923 0.88 0.90384615 0.94076923 0.88307692 0.84230769
0.90076923 0.86384615 0.84 0.86 ]
mean value: 0.8875384615384615
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.92307692 0.76 0.83333333 0.88461538 0.78571429 0.74193548
0.83333333 0.75 0.73333333 0.75862069]
mean value: 0.8003962766932734
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.81
Accuracy on Blind test: 0.9
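Each `mean value` line appears to be the unweighted arithmetic mean of the ten per-fold scores printed above it, one score per cross-validation fold. Because the last two folds hold 50 samples and the others 51, this is not the same as pooling all predictions. The minimal check below uses the Extra Trees `test_mcc` values copied from this log. The variable names are illustrative only.

```python
# Per-fold test MCC values for Extra Trees, copied verbatim from the log above
fold_mcc = [0.92153846, 0.78581168, 0.82041265, 0.88289781, 0.76733527,
            0.68779719, 0.80904133, 0.73107432, 0.6821865, 0.72057669]

# Unweighted arithmetic mean over the 10 folds, as the "mean value" line reports
mean_mcc = sum(fold_mcc) / len(fold_mcc)
print(f"mean test_mcc: {mean_mcc:.8f}")
```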
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01009369 0.0099442 0.00995088 0.00998259 0.01005554 0.01011753
0.00999284 0.01003075 0.01016021 0.01010418]
mean value: 0.01004323959350586
key: score_time
value: [0.00866437 0.00865102 0.00874305 0.008708 0.00879359 0.00871396
0.00872636 0.00874496 0.00876403 0.00880837]
mean value: 0.008731770515441894
key: test_mcc
value: [0.72984534 0.53444024 0.61648638 0.41306141 0.61017022 0.30559708
0.5301448 0.29366622 0.52167203 0.24174689]
mean value: 0.47968306142454376
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.8627451 0.76470588 0.80392157 0.70588235 0.80392157 0.64705882
0.76470588 0.64705882 0.76 0.62 ]
mean value: 0.738
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.85106383 0.73913043 0.81481481 0.68085106 0.8 0.60869565
0.77777778 0.66666667 0.76923077 0.64150943]
mean value: 0.7349740443025836
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.90909091 0.80952381 0.75862069 0.72727273 0.83333333 0.7
0.75 0.64285714 0.74074074 0.60714286]
mean value: 0.7478582209616692
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.8 0.68 0.88 0.64 0.76923077 0.53846154
0.80769231 0.69230769 0.8 0.68 ]
mean value: 0.7287692307692308
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.86153846 0.76307692 0.80538462 0.70461538 0.80461538 0.64923077
0.76384615 0.64615385 0.76 0.62 ]
mean value: 0.7378461538461538
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.74074074 0.5862069 0.6875 0.51612903 0.66666667 0.4375
0.63636364 0.5 0.625 0.47222222]
mean value: 0.5868329194803055
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.58
Accuracy on Blind test: 0.79
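For binary labels, the Jaccard index and the F1 score are tied by the identity J = F1 / (2 - F1). This follows from J = TP/(TP+FP+FN) and F1 = 2TP/(2TP+FP+FN), so each `test_jcc` value can be recovered from the matching `test_fscore` value. The sketch below checks this against the single Extra Tree folds logged above. That the script scores both metrics on the same positive class is an assumption, although the numbers are consistent with it.

```python
# Per-fold test F1 and Jaccard scores for the single Extra Tree, copied from the log
test_fscore = [0.85106383, 0.73913043, 0.81481481, 0.68085106, 0.8,
               0.60869565, 0.77777778, 0.66666667, 0.76923077, 0.64150943]
test_jcc = [0.74074074, 0.5862069, 0.6875, 0.51612903, 0.66666667,
            0.4375, 0.63636364, 0.5, 0.625, 0.47222222]


def jaccard_from_f1(f1: float) -> float:
    """Binary Jaccard J = TP/(TP+FP+FN), rewritten in terms of F1 = 2TP/(2TP+FP+FN)."""
    return f1 / (2.0 - f1)


for f1, jcc in zip(test_fscore, test_jcc):
    print(f"F1={f1:.4f}  J from F1={jaccard_from_f1(f1):.4f}  logged J={jcc:.4f}")
```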
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.84472108 1.87050414 1.77609229 1.76353192 1.76929617 1.75256467
1.76388931 1.7507031 1.74432325 1.72300816]
mean value: 1.7758634090423584
key: score_time
value: [0.10199165 0.10157084 0.09312439 0.09370327 0.0956409 0.09323788
0.1439209 0.09496427 0.09200835 0.09272313]
mean value: 0.10028855800628662
key: test_mcc
value: [0.96148034 0.88823731 0.88872671 0.84307692 1. 0.88289781
0.92427578 0.92450033 0.84 0.88070485]
mean value: 0.9033900045709102
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98039216 0.94117647 0.94117647 0.92156863 1. 0.94117647
0.96078431 0.96078431 0.92 0.94 ]
mean value: 0.9507058823529412
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.97959184 0.93617021 0.94339623 0.92 1. 0.94339623
0.96296296 0.96 0.92 0.93877551]
mean value: 0.9504292975497886
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 0.89285714 0.92 1. 0.92592593
0.92857143 1. 0.92 0.95833333]
mean value: 0.9545687830687831
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96 0.88 1. 0.92 1. 0.96153846
1. 0.92307692 0.92 0.92 ]
mean value: 0.9484615384615385
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98 0.94 0.94230769 0.92153846 1. 0.94076923
0.96 0.96153846 0.92 0.94 ]
mean value: 0.9506153846153846
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96 0.88 0.89285714 0.85185185 1. 0.89285714
0.92857143 0.92307692 0.85185185 0.88461538]
mean value: 0.9065681725681726
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.81
Accuracy on Blind test: 0.9
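The `test_mcc` scores can be reproduced from confusion-matrix counts using the Matthews correlation coefficient formula. The log does not print the counts, so the ones below are inferred for Random Forest fold 1 and should be treated as a hypothetical reconstruction. The reasoning is as follows:

* Accuracy 0.98039216 = 50/51 means there is exactly one error.
* Precision 1.0 means FP = 0, so that error is a false negative.
* Recall 0.96 = 24/25 then gives TP = 24, FN = 1 and TN = 26.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0 when the coefficient is undefined (a common convention)
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Inferred (not logged) counts for Random Forest fold 1
tp, tn, fp, fn = 24, 26, 0, 1
print(f"MCC      = {mcc(tp, tn, fp, fn):.8f}")
print(f"accuracy = {(tp + tn) / (tp + tn + fp + fn):.8f}")
```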
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...05', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.96350503 0.96648908 0.9615953 0.95503259 0.93873477 0.94843411
0.95473123 0.98758698 0.94644499 0.94424415]
mean value: 0.9566798210144043
key: score_time
value: [0.16573906 0.28320813 0.27402806 0.27604485 0.13133049 0.31606078
0.21691132 0.13033724 0.27074957 0.25981092]
mean value: 0.23242204189300536
key: test_mcc
value: [0.96148034 0.96148034 0.88872671 0.84307692 0.96148034 0.88289781
0.88823731 0.92450033 0.84 0.88070485]
mean value: 0.903258494769246
key: train_mcc
value: [0.9518693 0.94748334 0.95194315 0.96062133 0.93873056 0.95186838
0.95186838 0.94751863 0.95633188 0.96070785]
mean value: 0.9518942794629011
key: test_accuracy
value: [0.98039216 0.98039216 0.94117647 0.92156863 0.98039216 0.94117647
0.94117647 0.96078431 0.92 0.94 ]
mean value: 0.9507058823529412
key: train_accuracy
value: [0.97592998 0.97374179 0.97592998 0.98030635 0.96936543 0.97592998
0.97592998 0.97374179 0.97816594 0.98034934]
mean value: 0.975939055736577
key: test_fscore
value: [0.97959184 0.97959184 0.94339623 0.92 0.98113208 0.94339623
0.94545455 0.96 0.92 0.93877551]
mean value: 0.9511338257429902
key: train_fscore
value: [0.97592998 0.97379913 0.97582418 0.98039216 0.96929825 0.97582418
0.97582418 0.97356828 0.97816594 0.98039216]
mean value: 0.9759018412370725
key: test_precision
value: [1. 1. 0.89285714 0.92 0.96296296 0.92592593
0.89655172 1. 0.92 0.95833333]
mean value: 0.9476631089217297
key: train_precision
value: [0.97807018 0.97379913 0.98230088 0.97826087 0.96929825 0.97797357
0.97797357 0.97787611 0.97816594 0.97826087]
mean value: 0.9771979353399569
key: test_recall
value: [0.96 0.96 1. 0.92 1. 0.96153846
1. 0.92307692 0.92 0.92 ]
mean value: 0.9564615384615385
key: train_recall
value: [0.97379913 0.97379913 0.96943231 0.98253275 0.96929825 0.97368421
0.97368421 0.96929825 0.97816594 0.98253275]
mean value: 0.9746226921014326
key: test_roc_auc
value: [0.98 0.98 0.94230769 0.92153846 0.98 0.94076923
0.94 0.96153846 0.92 0.94 ]
mean value: 0.9506153846153846
key: train_roc_auc
value: [0.97593465 0.97374167 0.97594423 0.98030146 0.96936528 0.97592507
0.97592507 0.97373209 0.97816594 0.98034934]
mean value: 0.9759384815751169
key: test_jcc
value: [0.96 0.96 0.89285714 0.85185185 0.96296296 0.89285714
0.89655172 0.92307692 0.85185185 0.88461538]
mean value: 0.9076624984211191
key: train_jcc
value: [0.95299145 0.94893617 0.9527897 0.96153846 0.94042553 0.9527897
0.9527897 0.94849785 0.95726496 0.96153846]
mean value: 0.9529561988250692
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
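The FutureWarning repeated during the Random Forest2 run states that `max_features='auto'` has been deprecated since scikit-learn 1.1 and will be removed in 1.3. According to the warning text, `'sqrt'` preserves the previous behaviour for classifiers. The sketch below shows the same estimator with that parameter set explicitly; the other arguments are copied from the log. It assumes a scikit-learn install and fits a small toy dataset, which is not the rpoB data, only to confirm that no `max_features` warning is emitted.

```python
import warnings

from sklearn.ensemble import RandomForestClassifier

# 'Random Forest2' settings from the log, with the non-deprecated max_features value
rf2 = RandomForestClassifier(
    max_features="sqrt",
    min_samples_leaf=5,
    n_estimators=1000,
    n_jobs=10,
    oob_score=True,
    random_state=42,
)

# Toy data (hypothetical), only used to trigger fit-time parameter validation
X = [[0.0, 1.0], [1.0, 0.0], [0.2, 0.9], [0.9, 0.1]] * 5
y = [0, 1, 0, 1] * 5

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    rf2.fit(X, y)

# Collect any warnings that mention max_features (expected: none)
max_features_warnings = [str(w.message) for w in caught if "max_features" in str(w.message)]
print(max_features_warnings)
```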
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01163793 0.01159668 0.01061368 0.01021719 0.01039076 0.01039505
0.01055193 0.01151991 0.01037455 0.01054192]
mean value: 0.010783958435058593
key: score_time
value: [0.01002407 0.00981021 0.00904036 0.00957513 0.00927067 0.00931168
0.00938725 0.00996947 0.00909567 0.0091548 ]
mean value: 0.009463930130004882
key: test_mcc
value: [0.88289781 0.62355907 0.77487835 0.68875274 0.72615385 0.72573276
0.68779719 0.61017022 0.60783067 0.72524067]
mean value: 0.705301332855258
key: train_mcc
value: [0.75071367 0.74619319 0.75930821 0.77253746 0.74212413 0.74619319
0.72014338 0.75504732 0.77747792 0.76862491]
mean value: 0.7538363372752251
key: test_accuracy
value: [0.94117647 0.80392157 0.88235294 0.84313725 0.8627451 0.8627451
0.84313725 0.80392157 0.8 0.86 ]
mean value: 0.8503137254901961
key: train_accuracy
value: [0.87527352 0.87308534 0.87964989 0.88621444 0.87089716 0.87308534
0.85995624 0.87746171 0.88864629 0.88427948]
mean value: 0.876854939657726
key: test_fscore
value: [0.93877551 0.77272727 0.88888889 0.84615385 0.8627451 0.86792453
0.85185185 0.8 0.81481481 0.85106383]
mean value: 0.8494945640769093
key: train_fscore
value: [0.87688985 0.87391304 0.87964989 0.88744589 0.86859688 0.8722467
0.85777778 0.87826087 0.88984881 0.88503254]
mean value: 0.8769662245721188
key: test_precision
value: [0.95833333 0.89473684 0.82758621 0.81481481 0.88 0.85185185
0.82142857 0.83333333 0.75862069 0.90909091]
mean value: 0.8549796552509801
key: train_precision
value: [0.86752137 0.87012987 0.88157895 0.87982833 0.88235294 0.87610619
0.86936937 0.87068966 0.88034188 0.87931034]
mean value: 0.8757228896777902
key: test_recall
value: [0.92 0.68 0.96 0.88 0.84615385 0.88461538
0.88461538 0.76923077 0.88 0.8 ]
mean value: 0.8504615384615385
key: train_recall
value: [0.88646288 0.87772926 0.87772926 0.89519651 0.85526316 0.86842105
0.84649123 0.88596491 0.89956332 0.89082969]
mean value: 0.878365126790776
key: test_roc_auc
value: [0.94076923 0.80153846 0.88384615 0.84384615 0.86307692 0.86230769
0.84230769 0.80461538 0.8 0.86 ]
mean value: 0.8502307692307692
key: train_roc_auc
value: [0.87524898 0.87307516 0.8796541 0.88619474 0.87086302 0.87307516
0.85992684 0.87748027 0.88864629 0.88427948]
mean value: 0.8768444035853827
key: test_jcc
value: [0.88461538 0.62962963 0.8 0.73333333 0.75862069 0.76666667
0.74193548 0.66666667 0.6875 0.74074074]
mean value: 0.7409708595178561
key: train_jcc
value: [0.78076923 0.77606178 0.78515625 0.79766537 0.76771654 0.7734375
0.75097276 0.78294574 0.80155642 0.79377432]
mean value: 0.7810055900293517
MCC on Blind test: 0.76
Accuracy on Blind test: 0.88
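The logged `test_roc_auc` values match the mean of sensitivity and specificity. This is what ROC AUC reduces to when it is computed from hard 0/1 predictions rather than from predicted probabilities, because the ROC curve then has a single interior point. The log does not say that the script scores `roc_auc` this way; that is an inference from the numbers. The check below uses Naive Bayes (BernoulliNB) fold 2. Its counts are a hypothetical reconstruction from the logged values: 51 test samples, recall 0.68 = 17/25, precision 0.89473684 = 17/19, and therefore FP = 2 and TN = 24.

```python
def balanced_accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Mean of sensitivity (TPR) and specificity (TNR); equals ROC AUC for hard labels."""
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return (tpr + tnr) / 2.0


# Inferred counts for Naive Bayes fold 2 (hypothetical reconstruction, not logged)
tp, fn, fp, tn = 17, 8, 2, 24
print(f"precision   = {tp / (tp + fp):.8f}")
print(f"recall      = {tp / (tp + fn):.8f}")
print(f"accuracy    = {(tp + tn) / (tp + tn + fp + fn):.8f}")
print(f"(TPR+TNR)/2 = {balanced_accuracy(tp, tn, fp, fn):.8f}")
```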
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.09083033 0.08632159 0.07055283 0.07104707 0.07519531 0.07939005
0.07854795 0.09603834 0.08671165 0.07549286]
mean value: 0.08101279735565185
key: score_time
value: [0.01125097 0.01128316 0.01079941 0.01076365 0.01111865 0.01134443
0.01275802 0.01227999 0.01084495 0.01092196]
mean value: 0.011336517333984376
key: test_mcc
value: [1. 1. 0.84307692 0.84307692 0.92153846 0.96153846
0.96148034 1. 0.88070485 0.76 ]
mean value: 0.9171415955282479
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 1. 0.92156863 0.92156863 0.96078431 0.98039216
0.98039216 1. 0.94 0.88 ]
mean value: 0.9584705882352941
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 1. 0.92 0.92 0.96153846 0.98039216
0.98113208 1. 0.93877551 0.88 ]
mean value: 0.9581838204076987
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 0.92 0.92 0.96153846 1.
0.96296296 1. 0.95833333 0.88 ]
mean value: 0.9602834757834758
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 0.92 0.92 0.96153846 0.96153846
1. 1. 0.92 0.88 ]
mean value: 0.9563076923076923
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 1. 0.92153846 0.92153846 0.96076923 0.98076923
0.98 1. 0.94 0.88 ]
mean value: 0.9584615384615385
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 1. 0.85185185 0.85185185 0.92592593 0.96153846
0.96296296 1. 0.88461538 0.78571429]
mean value: 0.9224460724460725
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.86
Accuracy on Blind test: 0.93
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
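The pipeline printed above scales the 169 numeric columns with MinMaxScaler, one-hot encodes the 7 categorical columns, passes any remaining columns through unchanged, and then fits the classifier. Below is a minimal sketch of how such a pipeline can be assembled. It assumes a toy DataFrame that uses a handful of the real column names. The data values and the helper name `build_pipeline` are hypothetical and do not come from the script.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder


def build_pipeline(num_cols, cat_cols, model):
    """Hypothetical helper: scale numeric columns, one-hot encode categoricals."""
    prep = ColumnTransformer(
        transformers=[
            ("num", MinMaxScaler(), num_cols),
            ("cat", OneHotEncoder(), cat_cols),
        ],
        remainder="passthrough",  # columns not listed above are kept as-is
    )
    return Pipeline(steps=[("prep", prep), ("model", model)])


# Toy data reusing a few column names from the log (values are made up).
rng = np.random.default_rng(42)
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, 60),
    "ddg_foldx": rng.normal(0, 2, 60),
    "ss_class": rng.choice(["H", "E", "L"], 60),
    "active_site": rng.choice(["0", "1"], 60),
})
y = (X["ligand_distance"] < 20).astype(int)

pipe = build_pipeline(["ligand_distance", "ddg_foldx"],
                      ["ss_class", "active_site"],
                      LinearDiscriminantAnalysis())
pipe.fit(X, y)
preds = pipe.predict(X)
```

Swapping the last argument of `build_pipeline` for any other estimator from the model list gives the corresponding pipeline shown in the later blocks of this log.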
key: fit_time
value: [0.06496072 0.06503797 0.07527375 0.07898951 0.1082499 0.08013272
0.07932186 0.04319263 0.06920362 0.0507133 ]
mean value: 0.07150759696960449
key: score_time
value: [0.01696396 0.01866412 0.01862216 0.01909184 0.03613448 0.01893544
0.01222348 0.01733971 0.01222539 0.01219845]
mean value: 0.018239903450012206
key: test_mcc
value: [0.88307692 0.80904133 0.80990051 0.76662339 0.80461538 0.72573276
0.72984534 0.76662339 0.76 0.60192927]
mean value: 0.765738828717061
key: train_mcc
value: [0.90817148 0.90830894 0.92560955 0.90426654 0.89956325 0.91693003
0.90386163 0.91250886 0.91710927 0.91703931]
mean value: 0.9113368869349576
key: test_accuracy
value: [0.94117647 0.90196078 0.90196078 0.88235294 0.90196078 0.8627451
0.8627451 0.88235294 0.88 0.8 ]
mean value: 0.8817254901960785
key: train_accuracy
value: [0.95404814 0.95404814 0.96280088 0.95185996 0.94967177 0.95842451
0.95185996 0.95623632 0.95851528 0.95851528]
mean value: 0.9555980239458018
key: test_fscore
value: [0.94117647 0.89361702 0.90566038 0.875 0.90196078 0.86792453
0.87272727 0.88888889 0.88 0.80769231]
mean value: 0.8834647651147403
key: train_fscore
value: [0.95444685 0.95464363 0.96296296 0.9527897 0.95010846 0.95860566
0.95217391 0.95633188 0.95878525 0.95860566]
mean value: 0.9559453974783592
key: test_precision
value: [0.92307692 0.95454545 0.85714286 0.91304348 0.92 0.85185185
0.82758621 0.85714286 0.88 0.77777778]
mean value: 0.8762167406695143
key: train_precision
value: [0.94827586 0.94444444 0.96086957 0.93670886 0.93991416 0.95238095
0.94396552 0.95217391 0.95258621 0.95652174]
mean value: 0.948784122427322
key: test_recall
value: [0.96 0.84 0.96 0.84 0.88461538 0.88461538
0.92307692 0.92307692 0.88 0.84 ]
mean value: 0.8935384615384615
key: train_recall
value: [0.96069869 0.9650655 0.9650655 0.96943231 0.96052632 0.96491228
0.96052632 0.96052632 0.9650655 0.96069869]
mean value: 0.9632517428943538
key: test_roc_auc
value: [0.94153846 0.90076923 0.90307692 0.88153846 0.90230769 0.86230769
0.86153846 0.88153846 0.88 0.8 ]
mean value: 0.8814615384615384
key: train_roc_auc
value: [0.95403356 0.95402398 0.96279591 0.95182142 0.94969547 0.95843867
0.95187888 0.95624569 0.95851528 0.95851528]
mean value: 0.9555964146173294
key: test_jcc
value: [0.88888889 0.80769231 0.82758621 0.77777778 0.82142857 0.76666667
0.77419355 0.8 0.78571429 0.67741935]
mean value: 0.7927367608290856
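For one binary fold, the Jaccard score and the F1 score are linked. J = TP/(TP+FP+FN) and F1 = 2TP/(2TP+FP+FN), so J = F1 / (2 - F1). The sketch below applies this identity to the per-fold LDA values copied from the log, as a consistency check on the test_fscore and test_jcc rows. The identity holds fold by fold. It does not hold for the two "mean value" lines.

```python
# Per-fold test F1 and Jaccard values copied from the LDA block of the log.
test_fscore = [0.94117647, 0.89361702, 0.90566038, 0.875, 0.90196078,
               0.86792453, 0.87272727, 0.88888889, 0.88, 0.80769231]
test_jcc = [0.88888889, 0.80769231, 0.82758621, 0.77777778, 0.82142857,
            0.76666667, 0.77419355, 0.8, 0.78571429, 0.67741935]

# Binary case: J = F1 / (2 - F1) holds exactly for each fold.
predicted_jcc = [f / (2 - f) for f in test_fscore]
max_err = max(abs(p - j) for p, j in zip(predicted_jcc, test_jcc))
print(f"largest deviation: {max_err:.1e}")
```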
key: train_jcc
value: [0.91286307 0.91322314 0.92857143 0.90983607 0.90495868 0.92050209
0.90871369 0.91631799 0.92083333 0.92050209]
mean value: 0.9156321584878045
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
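The keys above are fit_time, score_time, and test_/train_ pairs for each metric. They match the dictionary that `sklearn.model_selection.cross_validate` returns when it is given a dict of named scorers and `return_train_score=True`. This log does not show the script's actual scorer definitions, CV splitter or data split. The sketch therefore rests on several assumptions: 10-fold StratifiedKFold, scorers built with `make_scorer`, a synthetic dataset, and an illustrative 15% blind split. The two blind-test lines appear to be rounded to two decimals.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score, make_scorer,
                             matthews_corrcoef, precision_score, recall_score,
                             roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_validate, train_test_split

# Assumed scorer set; the names mirror the keys printed in the log.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": make_scorer(accuracy_score),
    "fscore": make_scorer(f1_score),
    "precision": make_scorer(precision_score),
    "recall": make_scorer(recall_score),
    "roc_auc": make_scorer(roc_auc_score),  # scored on predicted labels
    "jcc": make_scorer(jaccard_score),
}

# Synthetic stand-in for the real training set and blind test set.
X, y = make_classification(n_samples=600, n_features=20, random_state=42)
X_train, X_blind, y_train, y_blind = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=42)

model = LinearDiscriminantAnalysis()
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(model, X_train, y_train, cv=cv,
                        scoring=scoring, return_train_score=True)
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", value.mean())

# Refit on all training data, then score the held-out blind set.
model.fit(X_train, y_train)
y_pred = model.predict(X_blind)
print("MCC on Blind test:", round(matthews_corrcoef(y_blind, y_pred), 2))
print("Accuracy on Blind test:", round(accuracy_score(y_blind, y_pred), 2))
```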
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
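MultinomialNB is designed for non-negative, count-like features, and its `fit` rejects negative input. MinMaxScaler maps values into [0, 1] only for the data it was fitted on. A held-out value below the training minimum becomes negative, and one above the training maximum exceeds 1. The `clip=True` option (available in scikit-learn 0.24 and later) bounds the transformed output. The log does not show whether any fold actually hit this case. The sketch uses made-up numbers.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

train = np.array([[2.0], [5.0], [10.0]])  # fitted range is [2, 10]
unseen = np.array([[0.0], [12.0]])        # values outside that range

plain = MinMaxScaler().fit(train)
clipped = MinMaxScaler(clip=True).fit(train)

print(plain.transform(unseen).ravel())    # [-0.25  1.25]
print(clipped.transform(unseen).ravel())  # [0. 1.]
```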
key: fit_time
value: [0.01775408 0.01069021 0.01090527 0.01043916 0.01051974 0.01007056
0.00997186 0.01019859 0.01107121 0.01059556]
mean value: 0.011221623420715332
key: score_time
value: [0.01215219 0.00924611 0.00937891 0.00882244 0.00891948 0.0093646
0.00921845 0.00940609 0.00914979 0.00949597]
mean value: 0.00951540470123291
key: test_mcc
value: [0.80431528 0.85322916 0.76733527 0.68615385 0.72615385 0.72573276
0.72573276 0.61648638 0.72057669 0.68887476]
mean value: 0.73145907619674
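As a worked check, take fold 2 of MultinomialNB. It has precision 1.0, recall 0.84 and accuracy 0.92156863 on 51 test rows. These figures imply TP = 21, FN = 4, FP = 0 and TN = 26. The counts are inferred from the rounded metrics and are not printed in the log. Substituting them into MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)) reproduces the logged 0.85322916.

```python
import math

from sklearn.metrics import matthews_corrcoef

# Confusion counts inferred (not logged) for MultinomialNB test fold 2.
tp, fn, fp, tn = 21, 4, 0, 26

# MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

# Same value via scikit-learn on label vectors rebuilt from the counts.
y_true = [1] * (tp + fn) + [0] * (fp + tn)
y_pred = [1] * tp + [0] * fn + [1] * fp + [0] * tn
mcc_sklearn = matthews_corrcoef(y_true, y_pred)
print(round(mcc, 8), round(mcc_sklearn, 8))
```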
key: train_mcc
value: [0.72878597 0.70713347 0.73311115 0.77687755 0.68978499 0.7418496
0.7166604 0.71585171 0.76445458 0.74270515]
mean value: 0.7317214561206101
key: test_accuracy
value: [0.90196078 0.92156863 0.88235294 0.84313725 0.8627451 0.8627451
0.8627451 0.80392157 0.86 0.84 ]
mean value: 0.8641176470588235
key: train_accuracy
value: [0.8643326 0.85339168 0.86652079 0.88840263 0.84463895 0.87089716
0.85776805 0.85776805 0.88209607 0.87117904]
mean value: 0.8656995021642954
key: test_fscore
value: [0.89795918 0.91304348 0.88461538 0.84 0.8627451 0.86792453
0.86792453 0.79166667 0.8627451 0.82608696]
mean value: 0.8614710922420334
key: train_fscore
value: [0.86343612 0.85144124 0.86593407 0.88791209 0.84116331 0.86975717
0.85327314 0.85523385 0.88053097 0.8691796 ]
mean value: 0.8637861569276664
key: test_precision
value: [0.91666667 1. 0.85185185 0.84 0.88 0.85185185
0.85185185 0.86363636 0.84615385 0.9047619 ]
mean value: 0.8806774336774337
key: train_precision
value: [0.87111111 0.86486486 0.87168142 0.89380531 0.85844749 0.87555556
0.87906977 0.86877828 0.89237668 0.88288288]
mean value: 0.8758573358261803
key: test_recall
value: [0.88 0.84 0.92 0.84 0.84615385 0.88461538
0.88461538 0.73076923 0.88 0.76 ]
mean value: 0.8466153846153845
key: train_recall
value: [0.8558952 0.83842795 0.86026201 0.88209607 0.8245614 0.86403509
0.82894737 0.84210526 0.86899563 0.8558952 ]
mean value: 0.8521221175208764
key: test_roc_auc
value: [0.90153846 0.92 0.88307692 0.84307692 0.86307692 0.86230769
0.86230769 0.80538462 0.86 0.84 ]
mean value: 0.8640769230769231
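In each fold, test_roc_auc equals the mean of recall and specificity. For MultinomialNB fold 2 that is (0.84 + 26/26) / 2 = 0.92. This is exactly what `roc_auc_score` returns when it receives hard 0/1 predictions instead of scores. That suggests, though the log does not confirm it, that this column is balanced accuracy rather than a threshold-free AUC. The sketch rebuilds the fold from the inferred counts (TP = 21, FN = 4, FP = 0, TN = 26), which are not printed in the log.

```python
from sklearn.metrics import balanced_accuracy_score, roc_auc_score

# Label vectors rebuilt from the inferred fold-2 counts (TP=21, FN=4, FP=0, TN=26).
y_true = [1] * 25 + [0] * 26
y_pred = [1] * 21 + [0] * 4 + [0] * 26

auc_on_labels = roc_auc_score(y_true, y_pred)    # AUC computed from hard labels
bal_acc = balanced_accuracy_score(y_true, y_pred)
print(auc_on_labels, bal_acc)
```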
key: train_roc_auc
value: [0.86435111 0.8534245 0.86653451 0.88841646 0.84459511 0.87088217
0.85770513 0.85773385 0.88209607 0.87117904]
mean value: 0.8656917949896575
key: test_jcc
value: [0.81481481 0.84 0.79310345 0.72413793 0.75862069 0.76666667
0.76666667 0.65517241 0.75862069 0.7037037 ]
mean value: 0.7581507024265645
key: train_jcc
value: [0.75968992 0.74131274 0.76356589 0.79841897 0.72586873 0.76953125
0.74409449 0.74708171 0.78656126 0.76862745]
mean value: 0.7604752419520732
MCC on Blind test: 0.76
Accuracy on Blind test: 0.88
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01461196 0.01674128 0.02004933 0.01781726 0.01789832 0.02177954
0.01803398 0.0175209 0.01656842 0.02079844]
mean value: 0.01818194389343262
key: score_time
value: [0.01018262 0.01185966 0.01840734 0.01195478 0.01202202 0.01203704
0.01195407 0.01193404 0.01192832 0.01798701]
mean value: 0.013026690483093262
key: test_mcc
value: [0.80990051 0.81912621 0.82041265 0.76733527 0.78581168 0.73107432
0.80431528 0.78762135 0.76 0.6821865 ]
mean value: 0.7767783786155215
key: train_mcc
value: [0.80373177 0.8497961 0.8591878 0.81151328 0.70283343 0.88786716
0.85514592 0.74595689 0.84755764 0.83552208]
mean value: 0.8199112066271259
key: test_accuracy
value: [0.90196078 0.90196078 0.90196078 0.88235294 0.88235294 0.8627451
0.90196078 0.88235294 0.88 0.84 ]
mean value: 0.8837647058823529
key: train_accuracy
value: [0.89715536 0.92341357 0.92778993 0.90153173 0.83588621 0.94310722
0.92560175 0.85995624 0.92358079 0.91484716]
mean value: 0.9052869960727357
key: test_fscore
value: [0.90566038 0.88888889 0.90909091 0.88461538 0.89655172 0.85714286
0.90566038 0.86956522 0.88 0.84615385]
mean value: 0.8843329582138102
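Each per-fold F1 above is the harmonic mean 2PR/(P+R) of that fold's precision and recall. The "mean value" line, however, averages the ten per-fold F1 values. It is not the F1 of the mean precision and mean recall, and those two numbers differ. The sketch uses the Passive Aggressive values copied from the log.

```python
import numpy as np

# Per-fold test precision and recall for the Passive Aggressive model (from the log).
precision = np.array([0.85714286, 1.0, 0.83333333, 0.85185185, 0.8125,
                      0.91304348, 0.88888889, 1.0, 0.88, 0.81481481])
recall = np.array([0.96, 0.8, 1.0, 0.92, 1.0,
                   0.80769231, 0.92307692, 0.76923077, 0.88, 0.88])

f1_per_fold = 2 * precision * recall / (precision + recall)
mean_f1 = f1_per_fold.mean()  # what the log reports as the fscore mean
f1_of_means = 2 * precision.mean() * recall.mean() / (precision.mean() + recall.mean())
print(f"mean of fold F1: {mean_f1:.4f}, F1 of mean P/R: {f1_of_means:.4f}")
```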
key: train_fscore
value: [0.90466531 0.92027335 0.93110647 0.90835031 0.85659656 0.94117647
0.92165899 0.83838384 0.92239468 0.91958763]
mean value: 0.9064193601059057
key: test_precision
value: [0.85714286 1. 0.83333333 0.85185185 0.8125 0.91304348
0.88888889 1. 0.88 0.81481481]
mean value: 0.8851575224292616
key: train_precision
value: [0.84469697 0.96190476 0.892 0.85114504 0.75932203 0.97196262
0.97087379 0.98809524 0.93693694 0.87109375]
mean value: 0.9048031131930347
key: test_recall
value: [0.96 0.8 1. 0.92 1. 0.80769231
0.92307692 0.76923077 0.88 0.88 ]
mean value: 0.894
key: train_recall
value: [0.97379913 0.88209607 0.97379913 0.97379913 0.98245614 0.9122807
0.87719298 0.72807018 0.90829694 0.97379913]
mean value: 0.9185589519650655
key: test_roc_auc
value: [0.90307692 0.9 0.90384615 0.88307692 0.88 0.86384615
0.90153846 0.88461538 0.88 0.84 ]
mean value: 0.884
key: train_roc_auc
value: [0.89698728 0.92350418 0.92768904 0.90137325 0.83620624 0.94303991
0.92549605 0.85966828 0.92358079 0.91484716]
mean value: 0.9052392170382287
key: test_jcc
value: [0.82758621 0.8 0.83333333 0.79310345 0.8125 0.75
0.82758621 0.76923077 0.78571429 0.73333333]
mean value: 0.7932387583680687
key: train_jcc
value: [0.82592593 0.85232068 0.87109375 0.83208955 0.74916388 0.88888889
0.85470085 0.72173913 0.85596708 0.85114504]
mean value: 0.8303034773250645
MCC on Blind test: 0.76
Accuracy on Blind test: 0.88
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.0145123 0.02034068 0.02262974 0.02118397 0.01858521 0.01969814
0.02044988 0.02166414 0.02026725 0.01864958]
mean value: 0.019798088073730468
key: score_time
value: [0.01100039 0.01206136 0.01206374 0.01205349 0.01194215 0.0121634
0.01195788 0.01196051 0.0119977 0.01203704]
mean value: 0.011923766136169434
key: test_mcc
value: [0.80990051 0.84544958 0.85407434 0.73878883 0.80461538 0.61413747
0.84544958 0.88872671 0.76244374 0.64465837]
mean value: 0.7808244519509732
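The log prints only the mean of each metric across folds. The spread matters too: for this model, test MCC ranges from about 0.61 to 0.89. Below is a minimal sketch that reports mean and sample standard deviation from the values copied from the log. The output format is an illustrative choice and is not part of the script.

```python
import numpy as np

# Test-fold MCC values for the Stochastic GDescent model, copied from the log.
test_mcc = np.array([0.80990051, 0.84544958, 0.85407434, 0.73878883, 0.80461538,
                     0.61413747, 0.84544958, 0.88872671, 0.76244374, 0.64465837])

mean = test_mcc.mean()        # matches the logged "mean value" up to rounding
std = test_mcc.std(ddof=1)    # sample standard deviation across the 10 folds
print(f"test MCC: {mean:.3f} +/- {std:.3f} "
      f"(min {test_mcc.min():.2f}, max {test_mcc.max():.2f})")
```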
key: train_mcc
value: [0.87909672 0.86883646 0.90426654 0.76612696 0.85658732 0.8392754
0.89935264 0.87549121 0.88352087 0.82858789]
mean value: 0.8601142007805195
key: test_accuracy
value: [0.90196078 0.92156863 0.92156863 0.8627451 0.90196078 0.80392157
0.92156863 0.94117647 0.88 0.82 ]
mean value: 0.8876470588235295
key: train_accuracy
value: [0.93873085 0.93435449 0.95185996 0.87089716 0.92778993 0.91466083
0.94967177 0.93654267 0.94104803 0.91048035]
mean value: 0.9276036042922802
key: test_fscore
value: [0.90566038 0.91666667 0.92592593 0.84444444 0.90196078 0.82142857
0.92592593 0.93877551 0.88461538 0.83018868]
mean value: 0.88955922701285
key: train_fscore
value: [0.94067797 0.93506494 0.9527897 0.85286783 0.92933619 0.92057026
0.94967177 0.93394077 0.94267516 0.91615542]
mean value: 0.9273750009738929
key: test_precision
value: [0.85714286 0.95652174 0.86206897 0.95 0.92 0.76666667
0.89285714 1. 0.85185185 0.78571429]
mean value: 0.8842823508880481
key: train_precision
value: [0.91358025 0.92703863 0.93670886 0.99418605 0.90794979 0.85931559
0.94759825 0.97156398 0.91735537 0.86153846]
mean value: 0.9236835228699787
key: test_recall
value: [0.96 0.88 1. 0.76 0.88461538 0.88461538
0.96153846 0.88461538 0.92 0.88 ]
mean value: 0.9015384615384615
key: train_recall
value: [0.96943231 0.94323144 0.96943231 0.74672489 0.95175439 0.99122807
0.95175439 0.89912281 0.96943231 0.97816594]
mean value: 0.9370278863096606
key: test_roc_auc
value: [0.90307692 0.92076923 0.92307692 0.86076923 0.90230769 0.80230769
0.92076923 0.94230769 0.88 0.82 ]
mean value: 0.8875384615384615
key: train_roc_auc
value: [0.93866353 0.93433502 0.95182142 0.87116946 0.92784226 0.91482801
0.94967632 0.93646097 0.94104803 0.91048035]
mean value: 0.9276325365816287
key: test_jcc
value: [0.82758621 0.84615385 0.86206897 0.73076923 0.82142857 0.6969697
0.86206897 0.88461538 0.79310345 0.70967742]
mean value: 0.8034441735498465
key: train_jcc
value: [0.888 0.87804878 0.90983607 0.74347826 0.868 0.85283019
0.90416667 0.87606838 0.89156627 0.84528302]
mean value: 0.8657277622273594
MCC on Blind test: 0.75
Accuracy on Blind test: 0.86
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.20065451 0.18761659 0.18804526 0.18726611 0.18589187 0.18081141
0.18000293 0.18104124 0.18397188 0.17888808]
mean value: 0.18541898727416992
key: score_time
value: [0.01699638 0.0164814 0.01690269 0.01632905 0.01682019 0.01586986
0.01592231 0.0159359 0.01681352 0.01568103]
mean value: 0.016375231742858886
key: test_mcc
value: [0.96153846 1. 0.88307692 0.84544958 0.96148034 0.92450033
0.96148034 1. 0.92 0.80064077]
mean value: 0.9258166744058197
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98039216 1. 0.94117647 0.92156863 0.98039216 0.96078431
0.98039216 1. 0.96 0.9 ]
mean value: 0.9624705882352941
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98039216 1. 0.94117647 0.91666667 0.98113208 0.96
0.98113208 1. 0.96 0.90196078]
mean value: 0.9622460229374769
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96153846 1. 0.92307692 0.95652174 0.96296296 1.
0.96296296 1. 0.96 0.88461538]
mean value: 0.961167843428713
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 0.96 0.88 1. 0.92307692
1. 1. 0.96 0.92 ]
mean value: 0.9643076923076923
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98076923 1. 0.94153846 0.92076923 0.98 0.96153846
0.98 1. 0.96 0.9 ]
mean value: 0.9624615384615385
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96153846 1. 0.88888889 0.84615385 0.96296296 0.92307692
0.96296296 1. 0.92307692 0.82142857]
mean value: 0.929008954008954
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.95
Accuracy on Blind test: 0.98
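AdaBoost scores 1.0 on every training metric, while its cross-validated test MCC is about 0.93. The sketch below compares three things for the models in this section: the train-test gap in mean MCC, and how far the blind-test MCC sits from the CV test mean. All values are copied from the log. The blind MCC is rounded to two decimals there, so small differences are not meaningful. A perfect training score is common for boosted tree ensembles and is not by itself evidence of poor generalisation. The held-out and blind-test figures are the better guide.

```python
# (train mean MCC, test mean MCC, blind-test MCC) copied from the log.
results = {
    "LDA": (0.9113368869349576, 0.765738828717061, 0.77),
    "Multinomial": (0.7317214561206101, 0.73145907619674, 0.76),
    "Passive Aggresive": (0.8199112066271259, 0.7767783786155215, 0.76),
    "Stochastic GDescent": (0.8601142007805195, 0.7808244519509732, 0.75),
    "AdaBoost Classifier": (1.0, 0.9258166744058197, 0.95),
}

print(f"{'model':<22}{'train-test gap':>16}{'blind-CV diff':>15}")
for name, (train_mcc, test_mcc, blind_mcc) in results.items():
    gap = train_mcc - test_mcc      # a large gap can hint at overfitting
    drift = blind_mcc - test_mcc    # blind MCC is only given to 2 d.p.
    print(f"{name:<22}{gap:>16.3f}{drift:>15.3f}")
```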
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.06497097 0.07499814 0.08114505 0.0727787 0.07150054 0.08404374
0.08080506 0.05995893 0.06555724 0.06107831]
mean value: 0.07168366909027099
key: score_time
value: [0.01859593 0.03967738 0.02885842 0.04054928 0.02583647 0.03430438
0.03502369 0.02353096 0.0248487 0.02320862]
mean value: 0.02944338321685791
key: test_mcc
value: [1. 0.80904133 0.88872671 0.76662339 1. 0.96148034
0.96148034 1. 0.92 0.80064077]
mean value: 0.9107992872362165
key: train_mcc
value: [0.99124722 0.98688016 0.9956331 0.99128503 0.97812763 0.97812763
0.98249445 0.98695553 0.99126638 0.99126638]
mean value: 0.9873283508508414
key: test_accuracy
value: [1. 0.90196078 0.94117647 0.88235294 1. 0.98039216
0.98039216 1. 0.96 0.9 ]
mean value: 0.9546274509803921
key: train_accuracy
value: [0.99562363 0.99343545 0.99781182 0.99562363 0.98905908 0.98905908
0.99124726 0.99343545 0.99563319 0.99563319]
mean value: 0.9936561780359856
key: test_fscore
value: [1. 0.89361702 0.94339623 0.875 1. 0.98113208
0.98113208 1. 0.96 0.90196078]
mean value: 0.9536238182948812
key: train_fscore
value: [0.99563319 0.99346405 0.99782135 0.99565217 0.98905908 0.98905908
0.99122807 0.99337748 0.99563319 0.99563319]
mean value: 0.9936560855826678
key: test_precision
value: [1. 0.95454545 0.89285714 0.91304348 1. 0.96296296
0.96296296 1. 0.96 0.88461538]
mean value: 0.9530987386204778
key: train_precision
value: [0.99563319 0.99130435 0.99565217 0.99134199 0.98689956 0.98689956
0.99122807 1. 0.99563319 0.99563319]
mean value: 0.9930225273212893
key: test_recall
value: [1. 0.84 1. 0.84 1. 1. 1. 1. 0.96 0.92]
mean value: 0.956
key: train_recall
value: [0.99563319 0.99563319 1. 1. 0.99122807 0.99122807
0.99122807 0.98684211 0.99563319 0.99563319]
mean value: 0.9943059066881177
key: test_roc_auc
value: [1. 0.90076923 0.94230769 0.88153846 1. 0.98
0.98 1. 0.96 0.9 ]
mean value: 0.9544615384615385
key: train_roc_auc
value: [0.99562361 0.99343063 0.99780702 0.99561404 0.98906382 0.98906382
0.99124722 0.99342105 0.99563319 0.99563319]
mean value: 0.9936537577568375
key: test_jcc
value: [1. 0.80769231 0.89285714 0.77777778 1. 0.96296296
0.96296296 1. 0.92307692 0.82142857]
mean value: 0.9148758648758649
key: train_jcc
value: [0.99130435 0.98701299 0.99565217 0.99134199 0.97835498 0.97835498
0.9826087 0.98684211 0.99130435 0.99130435]
mean value: 0.9874080953371571
MCC on Blind test: 0.9
Accuracy on Blind test: 0.95
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.1736908 0.16033888 0.1746788 0.15446329 0.17124557 0.15994787
0.13885593 0.10975528 0.1811161 0.09505463]
mean value: 0.15191471576690674
key: score_time
value: [0.02411532 0.02435613 0.01529169 0.02446175 0.02488351 0.02440786
0.01523852 0.02894855 0.02414536 0.0149796 ]
mean value: 0.022082829475402833
key: test_mcc
value: [0.80904133 0.60498161 0.75558816 0.76662339 0.69568237 0.52923077
0.76662339 0.52923077 0.56044854 0.60192927]
mean value: 0.6619379576424396
key: train_mcc
value: [0.99128536 0.98695627 0.98695627 0.98695627 0.98695553 0.99128503
0.98695553 0.98695553 0.98698426 0.99130418]
mean value: 0.9882594241358311
key: test_accuracy
value: [0.90196078 0.78431373 0.8627451 0.88235294 0.84313725 0.76470588
0.88235294 0.76470588 0.78 0.8 ]
mean value: 0.8266274509803921
key: train_accuracy
value: [0.99562363 0.99343545 0.99343545 0.99343545 0.99343545 0.99562363
0.99343545 0.99343545 0.99344978 0.99563319]
mean value: 0.9940942925668638
key: test_fscore
value: [0.89361702 0.73170732 0.87719298 0.875 0.83333333 0.76923077
0.88888889 0.76923077 0.78431373 0.79166667]
mean value: 0.821418147364653
key: train_fscore
value: [0.99561404 0.99340659 0.99340659 0.99340659 0.99337748 0.99559471
0.99337748 0.99337748 0.99340659 0.99561404]
mean value: 0.9940581607789326
key: test_precision
value: [0.95454545 0.9375 0.78125 0.91304348 0.90909091 0.76923077
0.85714286 0.76923077 0.76923077 0.82608696]
mean value: 0.8486351963254137
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.84 0.6 1. 0.84 0.76923077 0.76923077
0.92307692 0.76923077 0.8 0.76 ]
mean value: 0.8070769230769231
key: train_recall
value: [0.99126638 0.98689956 0.98689956 0.98689956 0.98684211 0.99122807
0.98684211 0.98684211 0.98689956 0.99126638]
mean value: 0.9881885390331724
key: test_roc_auc
value: [0.90076923 0.78076923 0.86538462 0.88153846 0.84461538 0.76461538
0.88153846 0.76461538 0.78 0.8 ]
mean value: 0.8263846153846154
key: train_roc_auc
value: [0.99563319 0.99344978 0.99344978 0.99344978 0.99342105 0.99561404
0.99342105 0.99342105 0.99344978 0.99563319]
mean value: 0.9940942695165862
key: test_jcc
value: [0.80769231 0.57692308 0.78125 0.77777778 0.71428571 0.625
0.8 0.625 0.64516129 0.65517241]
mean value: 0.7008262580794561
key: train_jcc
value: [0.99126638 0.98689956 0.98689956 0.98689956 0.98684211 0.99122807
0.98684211 0.98684211 0.98689956 0.99126638]
mean value: 0.9881885390331724
MCC on Blind test: 0.58
Accuracy on Blind test: 0.79
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.72235799 0.70944214 0.70883203 0.71152782 0.71902704 0.71618891
0.7211957 0.71571803 0.71437287 0.70587111]
mean value: 0.7144533634185791
key: score_time
value: [0.00964093 0.00959468 0.00950909 0.00943208 0.01016808 0.00956535
0.01000834 0.00979233 0.00962782 0.00996852]
mean value: 0.009730720520019531
key: test_mcc
value: [1. 0.85322916 0.88872671 0.84307692 0.92427578 0.96148034
1. 1. 0.88070485 0.80064077]
mean value: 0.9152134526595563
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.92156863 0.94117647 0.92156863 0.96078431 0.98039216
1. 1. 0.94 0.9 ]
mean value: 0.9565490196078431
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.91304348 0.94339623 0.92 0.96296296 0.98113208
1. 1. 0.93877551 0.90196078]
mean value: 0.9561271037628433
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 0.89285714 0.92 0.92857143 0.96296296
1. 1. 0.95833333 0.88461538]
mean value: 0.9547340252340253
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.84 1. 0.92 1. 1. 1. 1. 0.92 0.92]
mean value: 0.96
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.92 0.94230769 0.92153846 0.96 0.98
1. 1. 0.94 0.9 ]
mean value: 0.9563846153846154
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.84 0.89285714 0.85185185 0.92857143 0.96296296
1. 1. 0.88461538 0.82142857]
mean value: 0.9182287342287342
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.86
Accuracy on Blind test: 0.93
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.03209758 0.04663944 0.04156089 0.03064513 0.03051949 0.03812504
0.03040957 0.03081679 0.03012967 0.03015852]
mean value: 0.034110212326049806
key: score_time
value: [0.01268816 0.01741695 0.01369047 0.01662827 0.01425338 0.01279664
0.01463175 0.01472878 0.01473212 0.0146606 ]
mean value: 0.014622712135314941
key: test_mcc
value: [ 0.50162374 0.33282012 0.4779765 0.38593446 0.38074981 -0.08910647
0.67109832 0.41306141 0.52678658 0.42874646]
mean value: 0.40296909364640227
key: train_mcc
value: [0.89198163 0.97407901 0.94806064 0.63869807 0.73237152 0.51794578
0.96944796 0.93638281 0.93650904 0.94475499]
mean value: 0.8490231449196334
key: test_accuracy
value: [0.74509804 0.66666667 0.7254902 0.68627451 0.66666667 0.47058824
0.82352941 0.70588235 0.76 0.7 ]
mean value: 0.6950196078431372
key: train_accuracy
value: [0.94310722 0.9868709 0.97374179 0.78993435 0.84901532 0.71115974
0.98468271 0.96717724 0.96724891 0.97161572]
mean value: 0.9144553906720304
key: test_fscore
value: [0.76363636 0.65306122 0.75862069 0.71428571 0.73846154 0.59701493
0.84745763 0.72727273 0.77777778 0.74576271]
mean value: 0.7323351299935275
key: train_fscore
value: [0.94628099 0.98672566 0.97424893 0.8267148 0.86857143 0.7755102
0.98454746 0.96815287 0.96828753 0.97239915]
mean value: 0.9271439021368936
key: test_precision
value: [0.7 0.66666667 0.66666667 0.64516129 0.61538462 0.48780488
0.75757576 0.68965517 0.72413793 0.64705882]
mean value: 0.6600111801642755
key: train_precision
value: [0.89803922 1. 0.95780591 0.70461538 0.76767677 0.63333333
0.99111111 0.9382716 0.93852459 0.94628099]
mean value: 0.877565890643361
key: test_recall
value: [0.84 0.64 0.88 0.8 0.92307692 0.76923077
0.96153846 0.76923077 0.84 0.88 ]
mean value: 0.8303076923076923
key: train_recall
value: [1. 0.97379913 0.99126638 1. 1. 1.
0.97807018 1. 1. 1. ]
mean value: 0.9943135677622003
key: test_roc_auc
value: [0.74692308 0.66615385 0.72846154 0.68846154 0.66153846 0.46461538
0.82076923 0.70461538 0.76 0.7 ]
mean value: 0.6941538461538461
key: train_roc_auc
value: [0.94298246 0.98689956 0.97370336 0.78947368 0.84934498 0.71179039
0.98466828 0.96724891 0.96724891 0.97161572]
mean value: 0.9144976250670344
key: test_jcc
value: [0.61764706 0.48484848 0.61111111 0.55555556 0.58536585 0.42553191
0.73529412 0.57142857 0.63636364 0.59459459]
mean value: 0.5817740898924696
key: train_jcc
value: [0.89803922 0.97379913 0.94979079 0.70461538 0.76767677 0.63333333
0.96956522 0.9382716 0.93852459 0.94628099]
mean value: 0.8719897027157442
MCC on Blind test: 0.45
Accuracy on Blind test: 0.69
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02209425 0.01628399 0.02629995 0.04093146 0.03840542 0.03341603
0.021245 0.02528167 0.02654409 0.03870654]
mean value: 0.028920841217041016
key: score_time
value: [0.02987242 0.01222944 0.01866579 0.01888323 0.01911664 0.01218963
0.01224184 0.01220989 0.01884508 0.01884842]
mean value: 0.017310237884521483
key: test_mcc
value: [0.8459178 0.92427578 0.85407434 0.84544958 0.80461538 0.80431528
0.84307692 0.80904133 0.76 0.68 ]
mean value: 0.8170766413736327
key: train_mcc
value: [0.86024417 0.8425731 0.86433893 0.8559713 0.84690379 0.87309431
0.84274962 0.86454544 0.8735707 0.86470302]
mean value: 0.8588694380310397
key: test_accuracy
value: [0.92156863 0.96078431 0.92156863 0.92156863 0.90196078 0.90196078
0.92156863 0.90196078 0.88 0.84 ]
mean value: 0.9072941176470588
key: train_accuracy
value: [0.92997812 0.92122538 0.9321663 0.92778993 0.92341357 0.93654267
0.92122538 0.9321663 0.93668122 0.93231441]
mean value: 0.9293503291831099
key: test_fscore
value: [0.92307692 0.95833333 0.92592593 0.91666667 0.90196078 0.90566038
0.92307692 0.90909091 0.88 0.84 ]
mean value: 0.9083791842842898
key: train_fscore
value: [0.93103448 0.92207792 0.93246187 0.92903226 0.92374728 0.93654267
0.92207792 0.93275488 0.93736501 0.93275488]
mean value: 0.9299849177077446
key: test_precision
value: [0.88888889 1. 0.86206897 0.95652174 0.92 0.88888889
0.92307692 0.86206897 0.88 0.84 ]
mean value: 0.9021514371019619
key: train_precision
value: [0.91914894 0.91416309 0.93043478 0.91525424 0.91774892 0.93449782
0.91025641 0.92274678 0.92735043 0.92672414]
mean value: 0.9218325537192356
key: test_recall
value: [0.96 0.92 1. 0.88 0.88461538 0.92307692
0.92307692 0.96153846 0.88 0.84 ]
mean value: 0.9172307692307693
key: train_recall
value: [0.94323144 0.930131 0.93449782 0.94323144 0.92982456 0.93859649
0.93421053 0.94298246 0.94759825 0.93886463]
mean value: 0.9383168620240557
key: test_roc_auc
value: [0.92230769 0.96 0.92307692 0.92076923 0.90230769 0.90153846
0.92153846 0.90076923 0.88 0.84 ]
mean value: 0.9072307692307692
key: train_roc_auc
value: [0.92994905 0.92120585 0.93216119 0.92775607 0.92342756 0.93654715
0.92125373 0.93218992 0.93668122 0.93231441]
mean value: 0.929348617176128
key: test_jcc
value: [0.85714286 0.92 0.86206897 0.84615385 0.82142857 0.82758621
0.85714286 0.83333333 0.78571429 0.72413793]
mean value: 0.8334708854364027
key: train_jcc
value: [0.87096774 0.85542169 0.87346939 0.86746988 0.8582996 0.88065844
0.85542169 0.87398374 0.88211382 0.87398374]
mean value: 0.8691789714871334
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.21736836 0.28004289 0.27122521 0.16093874 0.27278447 0.15021992
0.3257041 0.41062498 0.28041005 0.27491283]
mean value: 0.26442315578460696
key: score_time
value: [0.01900768 0.01911998 0.01891971 0.01225471 0.01899242 0.01237059
0.02265048 0.02372003 0.01925325 0.02483606]
mean value: 0.019112491607666017
key: test_mcc
value: [0.8459178 0.92427578 0.80990051 0.84544958 0.80461538 0.80431528
0.84307692 0.80904133 0.76 0.68 ]
mean value: 0.8126592592815737
key: train_mcc
value: [0.86024417 0.8425731 0.91250886 0.8559713 0.84690379 0.87309431
0.84274962 0.86454544 0.8735707 0.86470302]
mean value: 0.8636864306196921
key: test_accuracy
value: [0.92156863 0.96078431 0.90196078 0.92156863 0.90196078 0.90196078
0.92156863 0.90196078 0.88 0.84 ]
mean value: 0.9053333333333333
key: train_accuracy
value: [0.92997812 0.92122538 0.95623632 0.92778993 0.92341357 0.93654267
0.92122538 0.9321663 0.93668122 0.93231441]
mean value: 0.9317573313712937
key: test_fscore
value: [0.92307692 0.95833333 0.90566038 0.91666667 0.90196078 0.90566038
0.92307692 0.90909091 0.88 0.84 ]
mean value: 0.9063526294275461
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_sl.py:168: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_sl.py:171: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
key: train_fscore
value: [0.93103448 0.92207792 0.95614035 0.92903226 0.92374728 0.93654267
0.92207792 0.93275488 0.93736501 0.93275488]
mean value: 0.9323527654316295
key: test_precision
value: [0.88888889 1. 0.85714286 0.95652174 0.92 0.88888889
0.92307692 0.86206897 0.88 0.84 ]
mean value: 0.9016588262645234
key: train_precision
value: [0.91914894 0.91416309 0.96035242 0.91525424 0.91774892 0.93449782
0.91025641 0.92274678 0.92735043 0.92672414]
mean value: 0.9248243177491149
key: test_recall
value: [0.96 0.92 0.96 0.88 0.88461538 0.92307692
0.92307692 0.96153846 0.88 0.84 ]
mean value: 0.9132307692307693
key: train_recall
value: [0.94323144 0.930131 0.95196507 0.94323144 0.92982456 0.93859649
0.93421053 0.94298246 0.94759825 0.93886463]
mean value: 0.9400635869148855
key: test_roc_auc
value: [0.92230769 0.96 0.90307692 0.92076923 0.90230769 0.90153846
0.92153846 0.90076923 0.88 0.84 ]
mean value: 0.9052307692307692
key: train_roc_auc
value: [0.92994905 0.92120585 0.95624569 0.92775607 0.92342756 0.93654715
0.92125373 0.93218992 0.93668122 0.93231441]
mean value: 0.9317570673408412
key: test_jcc
value: [0.85714286 0.92 0.82758621 0.84615385 0.82142857 0.82758621
0.85714286 0.83333333 0.78571429 0.72413793]
mean value: 0.8300226095743337
key: train_jcc
value: [0.87096774 0.85542169 0.91596639 0.86746988 0.8582996 0.88065844
0.85542169 0.87398374 0.88211382 0.87398374]
mean value: 0.8734286713670855
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.03605151 0.03672171 0.03721905 0.03759956 0.03047872 0.03578138
0.03592539 0.036762 0.03830004 0.03602743]
mean value: 0.03608667850494385
key: score_time
value: [0.0120666 0.01405454 0.01406908 0.01425576 0.01207781 0.01208544
0.01441956 0.01217318 0.01456451 0.01270866]
mean value: 0.013247513771057129
key: test_mcc
value: [0.92704716 0.88746439 0.57735027 0.84866842 0.79056942 0.74466871
0.84615385 0.84615385 0.77849894 0.84866842]
mean value: 0.8095243434058762
key: train_mcc
value: [0.86365953 0.86787786 0.88530679 0.85535013 0.86386107 0.87262489
0.87284634 0.86395495 0.86815585 0.86386107]
mean value: 0.867749849139245
key: test_accuracy
value: [0.96226415 0.94339623 0.78846154 0.92307692 0.88461538 0.86538462
0.92307692 0.92307692 0.88461538 0.92307692]
mean value: 0.9021044992743106
key: train_accuracy
value: [0.93176972 0.93390192 0.94255319 0.92765957 0.93191489 0.93617021
0.93617021 0.93191489 0.93404255 0.93191489]
mean value: 0.9338012067322959
key: test_fscore
value: [0.96 0.94339623 0.79245283 0.92592593 0.89655172 0.85106383
0.92307692 0.92307692 0.89285714 0.92592593]
mean value: 0.9034327451391779
key: train_fscore
value: [0.93248945 0.93418259 0.94315789 0.9279661 0.93220339 0.93697479
0.93723849 0.93248945 0.93446089 0.93220339]
mean value: 0.9343366440868982
key: test_precision
value: [1. 0.96153846 0.77777778 0.89285714 0.8125 0.95238095
0.92307692 0.92307692 0.83333333 0.89285714]
mean value: 0.8969398656898657
key: train_precision
value: [0.92468619 0.92827004 0.93333333 0.92405063 0.92827004 0.9253112
0.9218107 0.92468619 0.92857143 0.92827004]
mean value: 0.9267259809243651
key: test_recall
value: [0.92307692 0.92592593 0.80769231 0.96153846 1. 0.76923077
0.92307692 0.92307692 0.96153846 0.96153846]
mean value: 0.9156695156695157
key: train_recall
value: [0.94042553 0.94017094 0.95319149 0.93191489 0.93617021 0.94893617
0.95319149 0.94042553 0.94042553 0.93617021]
mean value: 0.9421022004000728
key: test_roc_auc
value: [0.96153846 0.94373219 0.78846154 0.92307692 0.88461538 0.86538462
0.92307692 0.92307692 0.88461538 0.92307692]
mean value: 0.9020655270655271
key: train_roc_auc
value: [0.93175123 0.93391526 0.94255319 0.92765957 0.93191489 0.93617021
0.93617021 0.93191489 0.93404255 0.93191489]
mean value: 0.9338006910347336
key: test_jcc
value: [0.92307692 0.89285714 0.65625 0.86206897 0.8125 0.74074074
0.85714286 0.85714286 0.80645161 0.86206897]
mean value: 0.8270300064898229
key: train_jcc
value: [0.87351779 0.87649402 0.89243028 0.86561265 0.87301587 0.88142292
0.88188976 0.87351779 0.87698413 0.87301587]
mean value: 0.8767901085829304
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
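The pipeline printed above scales the 169 numeric columns with MinMaxScaler. It one-hot encodes the 7 categorical columns and passes any other column through unchanged before the model step. The following is a minimal sketch of the same structure. It uses a toy DataFrame whose column names are borrowed from the log and whose values are invented. It also swaps in LogisticRegression to keep the example small. One deliberate deviation is `handle_unknown="ignore"`. The logged `OneHotEncoder()` uses the default, which raises an error if a validation fold contains a category not seen during fitting.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Toy frame: column names taken from the log, values and labels invented.
df = pd.DataFrame({
    "ligand_distance": [3.1, 12.4, 7.8, 20.0, 5.5, 15.2, 9.9, 2.2],
    "deepddg": [-1.2, 0.3, -0.5, 0.8, -2.0, 0.1, -0.7, 1.1],
    "ss_class": ["H", "E", "L", "H", "E", "L", "H", "E"],
    "active_site": ["yes", "no", "no", "no", "yes", "no", "yes", "no"],
})
y = [1, 0, 1, 0, 1, 0, 1, 0]

num_cols = df.select_dtypes(include="number").columns
cat_cols = df.select_dtypes(exclude="number").columns

prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), num_cols),  # rescale each numeric column to [0, 1]
        # Unseen categories become all-zero rows instead of raising an error.
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
    ],
    remainder="passthrough",  # columns not listed above pass through unchanged
)

pipe = Pipeline(steps=[("prep", prep), ("model", LogisticRegression(random_state=42))])
pipe.fit(df, y)
print(pipe.named_steps["prep"].transform(df).shape)  # 2 scaled + 3 + 2 one-hot columns
```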
key: fit_time
value: [0.94098043 0.98036623 0.98186111 0.90004635 1.07260489 1.11815834
1.04107022 0.89132261 0.89728475 1.02238417]
mean value: 0.9846079111099243
key: score_time
value: [0.0144639 0.01220942 0.01224375 0.0165391 0.01481962 0.0146389
0.01722312 0.01506114 0.01477838 0.02237964]
mean value: 0.01543569564819336
key: test_mcc
value: [0.92704716 0.85164138 0.61538462 0.88527041 0.89056356 0.77849894
0.84615385 0.80829038 0.77849894 0.80829038]
mean value: 0.8189639617732613
key: train_mcc
value: [0.89778103 0.82535469 0.84262186 0.89790486 0.90641581 0.90233192
0.90252815 0.90651431 0.90220118 0.82571883]
mean value: 0.8809372623148111
key: test_accuracy
value: [0.96226415 0.9245283 0.80769231 0.94230769 0.94230769 0.88461538
0.92307692 0.90384615 0.88461538 0.90384615]
mean value: 0.907910014513788
key: train_accuracy
value: [0.94882729 0.91257996 0.9212766 0.94893617 0.95319149 0.95106383
0.95106383 0.95319149 0.95106383 0.91276596]
mean value: 0.9403960440956313
key: test_fscore
value: [0.96 0.92307692 0.80769231 0.94339623 0.94545455 0.875
0.92307692 0.90196078 0.89285714 0.90566038]
mean value: 0.9078175230245152
key: train_fscore
value: [0.94936709 0.91331924 0.9217759 0.94915254 0.95338983 0.95157895
0.95178197 0.9535865 0.95137421 0.91368421]
mean value: 0.9409010432532758
key: test_precision
value: [1. 0.96 0.80769231 0.92592593 0.89655172 0.95454545
0.92307692 0.92 0.83333333 0.88888889]
mean value: 0.9110014557600765
key: train_precision
value: [0.94142259 0.90376569 0.91596639 0.94514768 0.94936709 0.94166667
0.93801653 0.94560669 0.94537815 0.90416667]
mean value: 0.9330504147086066
key: test_recall
value: [0.92307692 0.88888889 0.80769231 0.96153846 1. 0.80769231
0.92307692 0.88461538 0.96153846 0.92307692]
mean value: 0.9081196581196581
key: train_recall
value: [0.95744681 0.92307692 0.92765957 0.95319149 0.95744681 0.96170213
0.96595745 0.96170213 0.95744681 0.92340426]
mean value: 0.9489034369885434
key: test_roc_auc
value: [0.96153846 0.92521368 0.80769231 0.94230769 0.94230769 0.88461538
0.92307692 0.90384615 0.88461538 0.90384615]
mean value: 0.9079059829059829
key: train_roc_auc
value: [0.94880887 0.91260229 0.9212766 0.94893617 0.95319149 0.95106383
0.95106383 0.95319149 0.95106383 0.91276596]
mean value: 0.9403964357155847
key: test_jcc
value: [0.92307692 0.85714286 0.67741935 0.89285714 0.89655172 0.77777778
0.85714286 0.82142857 0.80645161 0.82758621]
mean value: 0.8337435028202548
key: train_jcc
value: [0.90361446 0.84046693 0.85490196 0.90322581 0.91093117 0.90763052
0.908 0.91129032 0.90725806 0.84108527]
mean value: 0.8888404505729317
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
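The key/value blocks in each model section have three features worth noting. They contain fit_time, score_time and paired test_/train_ entries, with ten values per key. They also follow the key order that scikit-learn's `cross_validate` returns when given a scoring dict and `return_train_score=True`. The script that produced them is not shown in this excerpt. The following is a minimal sketch that reproduces the same layout. It assumes synthetic data, 10-fold stratified CV, and a hypothetical scoring dict whose keys were chosen to match the log:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_validate

# Hypothetical scoring dict: keys chosen so cross_validate yields test_mcc, train_jcc, ...
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": make_scorer(roc_auc_score),  # scores hard predictions, not probabilities
    "jcc": make_scorer(jaccard_score),
}

# Synthetic data stands in for the real training set (assumption).
X, y = make_classification(n_samples=300, n_features=20, random_state=42)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

results = cross_validate(
    LogisticRegression(max_iter=1000, random_state=42),
    X, y, cv=cv, scoring=scoring, return_train_score=True,
)

# Print in the same key / value / mean value layout as the log.
for key, value in results.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```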
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01424456 0.01084495 0.01039648 0.00999761 0.01003218 0.01008916
0.01021504 0.01027536 0.01029849 0.00998974]
mean value: 0.01063835620880127
key: score_time
value: [0.01386261 0.00955462 0.00925779 0.00907087 0.00900865 0.00900984
0.00913215 0.00919223 0.00887728 0.00897932]
mean value: 0.009594535827636719
key: test_mcc
value: [0.82552431 0.46464327 0.54494926 0.69230769 0.73131034 0.58789635
0.57735027 0.77151675 0.54006172 0.61538462]
mean value: 0.6350944580966691
key: train_mcc
value: [0.66639366 0.66929675 0.7289762 0.68358593 0.69117257 0.68473679
0.69424587 0.68473679 0.70419643 0.65795145]
mean value: 0.6865292430336383
key: test_accuracy
value: [0.90566038 0.71698113 0.76923077 0.84615385 0.86538462 0.78846154
0.78846154 0.88461538 0.76923077 0.80769231]
mean value: 0.8141872278664731
key: train_accuracy
value: [0.8315565 0.8336887 0.86170213 0.84042553 0.84468085 0.84042553
0.84680851 0.84042553 0.85106383 0.82765957]
mean value: 0.8418436691920338
key: test_fscore
value: [0.89361702 0.66666667 0.78571429 0.84615385 0.86792453 0.76595745
0.78431373 0.88 0.76 0.80769231]
mean value: 0.8058039828104295
key: train_fscore
value: [0.82326622 0.82666667 0.85260771 0.83296214 0.8388521 0.83146067
0.85 0.83146067 0.84513274 0.81959911]
mean value: 0.8352008031680325
key: test_precision
value: [1. 0.83333333 0.73333333 0.84615385 0.85185185 0.85714286
0.8 0.91666667 0.79166667 0.80769231]
mean value: 0.8437840862840863
key: train_precision
value: [0.86792453 0.86111111 0.91262136 0.87383178 0.87155963 0.88095238
0.83265306 0.88095238 0.88018433 0.85981308]
mean value: 0.8721603646403393
key: test_recall
value: [0.80769231 0.55555556 0.84615385 0.84615385 0.88461538 0.69230769
0.76923077 0.84615385 0.73076923 0.80769231]
mean value: 0.7786324786324786
key: train_recall
value: [0.78297872 0.79487179 0.8 0.79574468 0.80851064 0.78723404
0.86808511 0.78723404 0.81276596 0.78297872]
mean value: 0.8020403709765412
key: test_roc_auc
value: [0.90384615 0.72008547 0.76923077 0.84615385 0.86538462 0.78846154
0.78846154 0.88461538 0.76923077 0.80769231]
mean value: 0.8143162393162393
key: train_roc_auc
value: [0.8316603 0.83360611 0.86170213 0.84042553 0.84468085 0.84042553
0.84680851 0.84042553 0.85106383 0.82765957]
mean value: 0.8418457901436625
key: test_jcc
value: [0.80769231 0.5 0.64705882 0.73333333 0.76666667 0.62068966
0.64516129 0.78571429 0.61290323 0.67741935]
mean value: 0.6796638943076161
key: train_jcc
value: [0.69961977 0.70454545 0.743083 0.71374046 0.72243346 0.71153846
0.73913043 0.71153846 0.73180077 0.69433962]
mean value: 0.717176989523702
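Within each fold, test_jcc is fully determined by test_fscore. For a single binary split, the Jaccard index J = TP/(TP+FP+FN) and F1 F = 2TP/(2TP+FP+FN) satisfy J = F/(2 - F). The following check applies that identity to the Gaussian NB fold values copied from the log above. The identity holds fold by fold, not for the mean values:

```python
import numpy as np

# Per-fold test_fscore and test_jcc for Gaussian NB, copied from the log.
test_fscore = np.array([0.89361702, 0.66666667, 0.78571429, 0.84615385, 0.86792453,
                        0.76595745, 0.78431373, 0.88, 0.76, 0.80769231])
test_jcc = np.array([0.80769231, 0.5, 0.64705882, 0.73333333, 0.76666667,
                     0.62068966, 0.64516129, 0.78571429, 0.61290323, 0.67741935])

# For one binary split, Jaccard J and F1 F satisfy J = F / (2 - F).
predicted_jcc = test_fscore / (2 - test_fscore)
print(np.max(np.abs(predicted_jcc - test_jcc)))  # differences come only from 8-decimal rounding
```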
MCC on Blind test: 0.63
Accuracy on Blind test: 0.81
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01102686 0.01034999 0.01119971 0.01043487 0.0116086 0.0105567
0.01021433 0.01092291 0.01056623 0.01027584]
mean value: 0.010715603828430176
key: score_time
value: [0.00961232 0.00919366 0.00915337 0.00907326 0.00980663 0.00913072
0.00897837 0.0090673 0.00913048 0.00903487]
mean value: 0.009218096733093262
key: test_mcc
value: [0.92704716 0.59688314 0.50336201 0.73131034 0.70064905 0.69436507
0.57735027 0.76923077 0.66628253 0.65433031]
mean value: 0.6820810650750807
key: train_mcc
value: [0.73987525 0.70625194 0.78298581 0.71521098 0.7745312 0.74043224
0.69894261 0.74470782 0.76195052 0.74048587]
mean value: 0.7405374249725031
key: test_accuracy
value: [0.96226415 0.79245283 0.75 0.86538462 0.84615385 0.84615385
0.78846154 0.88461538 0.82692308 0.82692308]
mean value: 0.8389332365747459
key: train_accuracy
value: [0.86993603 0.85287846 0.89148936 0.85744681 0.88723404 0.87021277
0.84893617 0.87234043 0.88085106 0.87021277]
mean value: 0.8701537903189221
key: test_fscore
value: [0.96 0.7755102 0.76363636 0.86792453 0.85714286 0.84
0.78431373 0.88461538 0.84210526 0.82352941]
mean value: 0.8398777738190921
key: train_fscore
value: [0.87048832 0.8496732 0.89171975 0.85529158 0.88794926 0.87048832
0.84463895 0.87179487 0.87931034 0.86937901]
mean value: 0.8690733611272227
key: test_precision
value: [1. 0.86363636 0.72413793 0.85185185 0.8 0.875
0.8 0.88461538 0.77419355 0.84 ]
mean value: 0.841343507952518
key: train_precision
value: [0.86864407 0.86666667 0.88983051 0.86842105 0.88235294 0.86864407
0.86936937 0.87553648 0.89082969 0.875 ]
mean value: 0.8755294848921722
key: test_recall
value: [0.92307692 0.7037037 0.80769231 0.88461538 0.92307692 0.80769231
0.76923077 0.88461538 0.92307692 0.80769231]
mean value: 0.8434472934472934
key: train_recall
value: [0.87234043 0.83333333 0.89361702 0.84255319 0.89361702 0.87234043
0.8212766 0.86808511 0.86808511 0.86382979]
mean value: 0.8629078014184397
key: test_roc_auc
value: [0.96153846 0.79415954 0.75 0.86538462 0.84615385 0.84615385
0.78846154 0.88461538 0.82692308 0.82692308]
mean value: 0.8390313390313391
key: train_roc_auc
value: [0.8699309 0.85283688 0.89148936 0.85744681 0.88723404 0.87021277
0.84893617 0.87234043 0.88085106 0.87021277]
mean value: 0.8701491180214584
key: test_jcc
value: [0.92307692 0.63333333 0.61764706 0.76666667 0.75 0.72413793
0.64516129 0.79310345 0.72727273 0.7 ]
mean value: 0.7280399378806105
key: train_jcc
value: [0.77067669 0.73863636 0.8045977 0.74716981 0.79847909 0.77067669
0.73106061 0.77272727 0.78461538 0.76893939]
mean value: 0.7687579004360319
MCC on Blind test: 0.76
Accuracy on Blind test: 0.88
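Each model section ends with two single numbers, MCC and accuracy on a blind test set, rounded to two decimals. This excerpt does not show how the script builds or scores that set. The following is a minimal sketch that fits once on the training portion and scores an untouched hold-out. It assumes synthetic data and a stratified 80/20 split, neither of which comes from the source:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split

# Synthetic data; the 80/20 stratified split is an assumption, not the script's.
X, y = make_classification(n_samples=600, n_features=20, random_state=42)
X_train, X_blind, y_train, y_blind = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit once on the training portion, then score the untouched blind set.
model = LogisticRegression(max_iter=1000, random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_blind)

mcc = round(matthews_corrcoef(y_blind, y_pred), 2)
acc = round(accuracy_score(y_blind, y_pred), 2)
print("MCC on Blind test:", mcc)
print("Accuracy on Blind test:", acc)
```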
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00984716 0.01065707 0.01068068 0.01061821 0.01073027 0.0107739
0.01069117 0.01094604 0.01067162 0.01062965]
mean value: 0.010624575614929199
key: score_time
value: [0.01390767 0.01305842 0.01274514 0.01643538 0.01310968 0.01333404
0.01317954 0.01302528 0.01261353 0.01284766]
mean value: 0.013425636291503906
key: test_mcc
value: [0.63760132 0.52028554 0.23145502 0.38575837 0.70064905 0.54006172
0.31139958 0.77151675 0.73568294 0.46291005]
mean value: 0.5297320348876765
key: train_mcc
value: [0.72710906 0.7106402 0.76601987 0.71495188 0.73197454 0.71066404
0.70669657 0.66043608 0.71980093 0.71490009]
mean value: 0.7163193248281741
key: test_accuracy
value: [0.81132075 0.75471698 0.61538462 0.69230769 0.84615385 0.76923077
0.65384615 0.88461538 0.86538462 0.73076923]
mean value: 0.7623730043541365
key: train_accuracy
value: [0.86353945 0.85501066 0.88297872 0.85744681 0.86595745 0.85531915
0.85319149 0.82978723 0.85957447 0.85744681]
mean value: 0.8580252234269382
key: test_fscore
value: [0.7826087 0.73469388 0.6 0.68 0.85714286 0.76
0.625 0.88888889 0.85714286 0.72 ]
mean value: 0.7505477176377797
key: train_fscore
value: [0.86324786 0.85152838 0.88222698 0.85653105 0.86509636 0.8559322
0.85097192 0.82532751 0.85652174 0.85714286]
mean value: 0.856452687007534
key: test_precision
value: [0.9 0.81818182 0.625 0.70833333 0.8 0.79166667
0.68181818 0.85714286 0.91304348 0.75 ]
mean value: 0.7845186335403727
key: train_precision
value: [0.86695279 0.87053571 0.88793103 0.86206897 0.87068966 0.85232068
0.86403509 0.84753363 0.87555556 0.85897436]
mean value: 0.8656597468799392
key: test_recall
value: [0.69230769 0.66666667 0.57692308 0.65384615 0.92307692 0.73076923
0.57692308 0.92307692 0.80769231 0.69230769]
mean value: 0.7243589743589743
key: train_recall
value: [0.85957447 0.83333333 0.87659574 0.85106383 0.85957447 0.85957447
0.83829787 0.80425532 0.83829787 0.85531915]
mean value: 0.8475886524822696
key: test_roc_auc
value: [0.80911681 0.75641026 0.61538462 0.69230769 0.84615385 0.76923077
0.65384615 0.88461538 0.86538462 0.73076923]
mean value: 0.7623219373219373
key: train_roc_auc
value: [0.86354792 0.85496454 0.88297872 0.85744681 0.86595745 0.85531915
0.85319149 0.82978723 0.85957447 0.85744681]
mean value: 0.8580214584469904
key: test_jcc
value: [0.64285714 0.58064516 0.42857143 0.51515152 0.75 0.61290323
0.45454545 0.8 0.75 0.5625 ]
mean value: 0.6097173928222316
key: train_jcc
value: [0.7593985 0.74144487 0.78927203 0.74906367 0.76226415 0.74814815
0.7406015 0.70260223 0.74904943 0.75 ]
mean value: 0.7491844527216088
MCC on Blind test: 0.33
Accuracy on Blind test: 0.67
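The test_roc_auc values sit very close to test_accuracy. That pattern suggests, as an inference the source does not confirm, that ROC AUC is computed from hard 0/1 predictions. In that case it equals balanced accuracy, (TPR + TNR)/2. The first K-Nearest Neighbors fold supports this:

- Accuracy 0.81132075 = 43/53.
- Recall 0.69230769 = 18/26.
- Precision 0.9 = 18/20.

Together these imply 26 positives, 27 negatives and 2 false positives. Then (18/26 + 25/27)/2 = 0.80911681, exactly the logged test_roc_auc. The sketch below rebuilds that fold, treating the inferred fold composition as an assumption:

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, roc_auc_score

# Reconstructed first KNN test fold (assumption inferred from the logged metrics):
# 26 positives (18 predicted positive), 27 negatives (2 predicted positive).
y_true = np.array([1] * 26 + [0] * 27)
y_pred = np.array([1] * 18 + [0] * 8 + [1] * 2 + [0] * 25)

auc_hard = roc_auc_score(y_true, y_pred)           # AUC from 0/1 labels, not scores
bal_acc = balanced_accuracy_score(y_true, y_pred)  # (TPR + TNR) / 2
print(round(auc_hard, 8), round(bal_acc, 8))
```

If this inference is right, the logged test_roc_auc is not a ranking-based AUC. Using predicted probabilities or decision_function scores instead would make it informative beyond accuracy.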
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.02203298 0.02313685 0.02299261 0.02083182 0.02209663 0.01996708
0.02413058 0.02186084 0.02108955 0.0205934 ]
mean value: 0.02187323570251465
key: score_time
value: [0.01181936 0.01257968 0.01236892 0.01224875 0.01212788 0.01159978
0.01305223 0.0125947 0.01200342 0.01176119]
mean value: 0.012215590476989746
key: test_mcc
value: [0.92704716 0.81688878 0.61538462 0.88527041 0.74466871 0.77849894
0.80829038 0.84615385 0.74466871 0.80829038]
mean value: 0.7975161942581178
key: train_mcc
value: [0.78688615 0.79976356 0.82130634 0.79149653 0.80857653 0.80035515
0.8000652 0.79155386 0.80857653 0.80000724]
mean value: 0.8008587091662202
key: test_accuracy
value: [0.96226415 0.90566038 0.80769231 0.94230769 0.86538462 0.88461538
0.90384615 0.92307692 0.86538462 0.90384615]
mean value: 0.8964078374455733
key: train_accuracy
value: [0.89339019 0.89978678 0.9106383 0.89574468 0.90425532 0.9
0.9 0.89574468 0.90425532 0.9 ]
mean value: 0.900381527015379
key: test_fscore
value: [0.96 0.90196078 0.80769231 0.94339623 0.87719298 0.875
0.90196078 0.92307692 0.87719298 0.90566038]
mean value: 0.8973133368082548
key: train_fscore
value: [0.89451477 0.90063425 0.91101695 0.89596603 0.90364026 0.90146751
0.90063425 0.89640592 0.90364026 0.90021231]
mean value: 0.9008132498798447
key: test_precision
value: [1. 0.95833333 0.80769231 0.92592593 0.80645161 0.95454545
0.92 0.92307692 0.80645161 0.88888889]
mean value: 0.8991366059269286
key: train_precision
value: [0.88702929 0.89121339 0.907173 0.8940678 0.90948276 0.88842975
0.89495798 0.8907563 0.90948276 0.89830508]
mean value: 0.8970898109982571
key: test_recall
value: [0.92307692 0.85185185 0.80769231 0.96153846 0.96153846 0.80769231
0.88461538 0.92307692 0.96153846 0.92307692]
mean value: 0.9005698005698006
key: train_recall
value: [0.90212766 0.91025641 0.91489362 0.89787234 0.89787234 0.91489362
0.90638298 0.90212766 0.89787234 0.90212766]
mean value: 0.9046426623022368
key: test_roc_auc
value: [0.96153846 0.90669516 0.80769231 0.94230769 0.86538462 0.88461538
0.90384615 0.92307692 0.86538462 0.90384615]
mean value: 0.8964387464387464
key: train_roc_auc
value: [0.89337152 0.89980906 0.9106383 0.89574468 0.90425532 0.9
0.9 0.89574468 0.90425532 0.9 ]
mean value: 0.9003818876159301
key: test_jcc
value: [0.92307692 0.82142857 0.67741935 0.89285714 0.78125 0.77777778
0.82142857 0.85714286 0.78125 0.82758621]
mean value: 0.8161217405447105
key: train_jcc
value: [0.80916031 0.81923077 0.83657588 0.81153846 0.82421875 0.82061069
0.81923077 0.81226054 0.82421875 0.81853282]
mean value: 0.819557772278408
MCC on Blind test: 0.72
Accuracy on Blind test: 0.86
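The `test_roc_auc` values track `test_accuracy` very closely. That pattern is consistent with the AUC being computed from hard 0/1 predictions rather than scores, which is an assumption about the script. Under that assumption, ROC AUC reduces to balanced accuracy, (TPR + TNR) / 2. The sketch below checks this for SVM fold 2. Its confusion matrix (TP=23, FN=4, TN=25, FP=1) is reconstructed from the logged recall 23/27 and accuracy 48/53. It also reproduces the logged precision 23/24 and F1 46/51.

```python
from sklearn.metrics import balanced_accuracy_score, roc_auc_score

# Reconstructed (not logged) labels for SVM fold 2: 27 positives, 26 negatives.
# TP=23, FN=4 among positives; TN=25, FP=1 among negatives.
y_true = [1] * 27 + [0] * 26
y_pred = [1] * 23 + [0] * 4 + [0] * 25 + [1] * 1

# With hard labels the ROC curve has a single interior point,
# so its area equals the mean of sensitivity and specificity.
auc_hard = roc_auc_score(y_true, y_pred)
bal_acc = balanced_accuracy_score(y_true, y_pred)
print(round(auc_hard, 8), round(bal_acc, 8))  # → 0.90669516 0.90669516
```

If the script instead passed probability or decision scores, this equality would not hold in general.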
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [2.02500224 2.07535386 2.03847885 2.11725354 2.08513308 2.02912474
1.15581083 2.13639879 2.00898528 2.09049392]
mean value: 1.9762035131454467
key: score_time
value: [0.01247501 0.01451373 0.01420856 0.01249385 0.01248074 0.02131391
0.01252007 0.02289152 0.01492286 0.01478481]
mean value: 0.01526050567626953
key: test_mcc
value: [0.92704716 0.77350427 0.54006172 0.82305489 0.9258201 0.77849894
0.80829038 0.88527041 0.77151675 0.73131034]
mean value: 0.7964374979189948
key: train_mcc
value: [1. 0.99150739 1. 1. 0.99148936 0.9957537
0.95320012 1. 0.9957537 1. ]
mean value: 0.9927704266438734
key: test_accuracy
value: [0.96226415 0.88679245 0.76923077 0.90384615 0.96153846 0.88461538
0.90384615 0.94230769 0.88461538 0.86538462]
mean value: 0.89644412191582
key: train_accuracy
value: [1. 0.99573561 1. 1. 0.99574468 0.99787234
0.97659574 1. 0.99787234 1. ]
mean value: 0.9963820714058885
key: test_fscore
value: [0.96 0.88888889 0.77777778 0.9122807 0.96296296 0.875
0.90196078 0.94117647 0.88888889 0.86792453]
mean value: 0.8976861003476753
key: train_fscore
value: [1. 0.99574468 1. 1. 0.99574468 0.9978678
0.97664544 1. 0.99787686 1. ]
mean value: 0.9963879458533711
key: test_precision
value: [1. 0.88888889 0.75 0.83870968 0.92857143 0.95454545
0.92 0.96 0.85714286 0.85185185]
mean value: 0.8949710158419836
key: train_precision
value: [1. 0.99152542 1. 1. 0.99574468 1.
0.97457627 1. 0.99576271 1. ]
mean value: 0.9957609087630724
key: test_recall
value: [0.92307692 0.88888889 0.80769231 1. 1. 0.80769231
0.88461538 0.92307692 0.92307692 0.88461538]
mean value: 0.9042735042735043
key: train_recall
value: [1. 1. 1. 1. 0.99574468 0.99574468
0.9787234 1. 1. 1. ]
mean value: 0.9970212765957447
key: test_roc_auc
value: [0.96153846 0.88675214 0.76923077 0.90384615 0.96153846 0.88461538
0.90384615 0.94230769 0.88461538 0.86538462]
mean value: 0.8963675213675214
key: train_roc_auc
value: [1. 0.99574468 1. 1. 0.99574468 0.99787234
0.97659574 1. 0.99787234 1. ]
mean value: 0.9963829787234042
key: test_jcc
value: [0.92307692 0.8 0.63636364 0.83870968 0.92857143 0.77777778
0.82142857 0.88888889 0.8 0.76666667]
mean value: 0.8181483570193248
key: train_jcc
value: [1. 0.99152542 1. 1. 0.99152542 0.99574468
0.95435685 1. 0.99576271 1. ]
mean value: 0.9928915086646127
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
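Every MLP fold above raised "Maximum iterations (500) reached", so the MLP scores come from unconverged networks. The sketch below shows one way to address this with standard `MLPClassifier` parameters: a larger iteration budget plus early stopping on a held-out validation split. The values `max_iter=2000` and `early_stopping=True` are assumptions, not tuned settings. The synthetic data from `make_classification` only stands in for the real feature matrix.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data (assumption): the real feature matrix is not shown in this log
X, y = make_classification(n_samples=500, n_features=20, random_state=42)

# Hypothetical settings: larger iteration cap, and stop once the score on an
# internal validation split (validation_fraction=0.1 by default) stops improving
mlp = MLPClassifier(max_iter=2000, early_stopping=True, random_state=42)

# Record any ConvergenceWarning instead of letting it print to stderr
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", ConvergenceWarning)
    mlp.fit(X, y)

n_convergence_warnings = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
print(mlp.n_iter_, n_convergence_warnings)
```

Early stopping reserves part of each training fold for validation. This slightly reduces the data the network trains on, which is the trade-off against simply raising `max_iter`.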
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.02680755 0.02203512 0.02136636 0.02250838 0.02036333 0.02217436
0.02208591 0.02092457 0.02369666 0.02297091]
mean value: 0.022493314743041993
key: score_time
value: [0.01222968 0.00935984 0.00914311 0.00899458 0.0090909 0.00908637
0.00928926 0.00905418 0.0090971 0.00970721]
mean value: 0.009505224227905274
key: test_mcc
value: [0.85164138 0.92450142 0.77849894 0.88527041 0.88527041 0.88527041
0.84615385 0.88527041 0.96225045 0.84866842]
mean value: 0.8752796119384096
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.9245283 0.96226415 0.88461538 0.94230769 0.94230769 0.94230769
0.92307692 0.94230769 0.98076923 0.92307692]
mean value: 0.9367561683599419
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.92592593 0.96296296 0.89285714 0.94339623 0.94339623 0.94117647
0.92307692 0.94339623 0.98113208 0.92592593]
mean value: 0.9383246106054097
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.89285714 0.96296296 0.83333333 0.92592593 0.92592593 0.96
0.92307692 0.92592593 0.96296296 0.89285714]
mean value: 0.9205828245828246
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96153846 0.96296296 0.96153846 0.96153846 0.96153846 0.92307692
0.92307692 0.96153846 1. 0.96153846]
mean value: 0.9578347578347579
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.92521368 0.96225071 0.88461538 0.94230769 0.94230769 0.94230769
0.92307692 0.94230769 0.98076923 0.92307692]
mean value: 0.9368233618233619
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.86206897 0.92857143 0.80645161 0.89285714 0.89285714 0.88888889
0.85714286 0.89285714 0.96296296 0.86206897]
mean value: 0.8846727110075274
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.87
Accuracy on Blind test: 0.93
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.12310386 0.12195277 0.1237545 0.12339377 0.12358236 0.12080479
0.12089777 0.12545443 0.12270761 0.12189317]
mean value: 0.12275450229644776
key: score_time
value: [0.01917434 0.01886702 0.01860046 0.01904464 0.01814556 0.01823425
0.0187676 0.018188 0.01808381 0.01885533]
mean value: 0.01859610080718994
key: test_mcc
value: [0.85122386 0.70692282 0.50336201 0.88527041 0.85634884 0.81312325
0.84615385 0.88527041 0.82305489 0.89056356]
mean value: 0.8061293898911333
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.9245283 0.8490566 0.75 0.94230769 0.92307692 0.90384615
0.92307692 0.94230769 0.90384615 0.94230769]
mean value: 0.9004354136429609
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.92 0.84 0.76363636 0.94339623 0.92857143 0.89795918
0.92307692 0.94339623 0.9122807 0.94545455]
mean value: 0.9017771598997305
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.95833333 0.91304348 0.72413793 0.92592593 0.86666667 0.95652174
0.92307692 0.92592593 0.83870968 0.89655172]
mean value: 0.8928893324911849
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.88461538 0.77777778 0.80769231 0.96153846 1. 0.84615385
0.92307692 0.96153846 1. 1. ]
mean value: 0.9162393162393162
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.92378917 0.85042735 0.75 0.94230769 0.92307692 0.90384615
0.92307692 0.94230769 0.90384615 0.94230769]
mean value: 0.9004985754985755
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.85185185 0.72413793 0.61764706 0.89285714 0.86666667 0.81481481
0.85714286 0.89285714 0.83870968 0.89655172]
mean value: 0.8253236867605774
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.81
Accuracy on Blind test: 0.9
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.0104301 0.01049423 0.01046538 0.01045418 0.01051521 0.01134562
0.01081777 0.01171875 0.01168561 0.01107073]
mean value: 0.010899758338928223
key: score_time
value: [0.00904679 0.00917101 0.00891232 0.0090096 0.00909114 0.00902176
0.00980735 0.00962543 0.00923777 0.00924516]
mean value: 0.009216833114624023
key: test_mcc
value: [0.69957726 0.44368795 0.38575837 0.55339859 0.70064905 0.54494926
0.43112399 0.57735027 0.34641016 0.34848139]
mean value: 0.5031386296029412
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.8490566 0.71698113 0.69230769 0.76923077 0.84615385 0.76923077
0.71153846 0.78846154 0.67307692 0.67307692]
mean value: 0.7489114658925979
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.84 0.69387755 0.7037037 0.73913043 0.85714286 0.75
0.68085106 0.79245283 0.66666667 0.69090909]
mean value: 0.7414734198243802
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.875 0.77272727 0.67857143 0.85 0.8 0.81818182
0.76190476 0.77777778 0.68 0.65517241]
mean value: 0.7669335472956162
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.80769231 0.62962963 0.73076923 0.65384615 0.92307692 0.69230769
0.61538462 0.80769231 0.65384615 0.73076923]
mean value: 0.7245014245014245
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.8482906 0.71866097 0.69230769 0.76923077 0.84615385 0.76923077
0.71153846 0.78846154 0.67307692 0.67307692]
mean value: 0.749002849002849
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.72413793 0.53125 0.54285714 0.5862069 0.75 0.6
0.51612903 0.65625 0.5 0.52777778]
mean value: 0.5934608780479191
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.58
Accuracy on Blind test: 0.79
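Each `mean value` line is the arithmetic mean of the ten per-fold scores printed above it. The log reports no spread, which matters for a model like Extra Tree, whose test MCC ranges from about 0.35 to 0.70 across folds while train MCC is 1.0 in every fold. The sketch below recomputes the mean from the logged (8 d.p.) fold values and adds min, max, and population standard deviation. These extra summaries are an illustrative addition, not something the script outputs.

```python
from statistics import mean, pstdev

# Extra Tree per-fold test MCC, copied from the log (8 d.p.)
et_test_mcc = [0.69957726, 0.44368795, 0.38575837, 0.55339859, 0.70064905,
               0.54494926, 0.43112399, 0.57735027, 0.34641016, 0.34848139]

et_mean = mean(et_test_mcc)
print(round(et_mean, 6))  # → 0.503139

# Spread across folds, which the log itself does not report
print(min(et_test_mcc), max(et_test_mcc), round(pstdev(et_test_mcc), 3))
```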
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.85724831 1.80352688 1.76003718 1.77659369 1.78128886 1.81115103
1.7989316 1.79322004 1.84129739 1.78404427]
mean value: 1.8007339239120483
key: score_time
value: [0.09947538 0.09356856 0.09668612 0.09286475 0.10176277 0.10127997
0.10069108 0.09549236 0.09286833 0.10066128]
mean value: 0.09753506183624268
key: test_mcc
value: [0.92450142 0.88746439 0.84615385 0.88527041 0.96225045 0.9258201
0.88527041 0.92307692 0.9258201 1. ]
mean value: 0.9165628054905913
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96226415 0.94339623 0.92307692 0.94230769 0.98076923 0.96153846
0.94230769 0.96153846 0.96153846 1. ]
mean value: 0.9578737300435414
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96153846 0.94339623 0.92307692 0.94339623 0.98113208 0.96
0.94117647 0.96153846 0.96296296 1. ]
mean value: 0.9578217808006931
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96153846 0.96153846 0.92307692 0.92592593 0.96296296 1.
0.96 0.96153846 0.92857143 1. ]
mean value: 0.9585152625152625
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96153846 0.92592593 0.92307692 0.96153846 1. 0.92307692
0.92307692 0.96153846 1. 1. ]
mean value: 0.957977207977208
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96225071 0.94373219 0.92307692 0.94230769 0.98076923 0.96153846
0.94230769 0.96153846 0.96153846 1. ]
mean value: 0.957905982905983
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.92592593 0.89285714 0.85714286 0.89285714 0.96296296 0.92307692
0.88888889 0.92592593 0.92857143 1. ]
mean value: 0.9198209198209198
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.81
Accuracy on Blind test: 0.9
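The 'Random Forest2' entry in the model list passes `max_features='auto'`, which scikit-learn deprecated in 1.1 (removal announced for 1.3). According to the FutureWarning, `'sqrt'` preserves the old behaviour for `RandomForestClassifier`. A minimal sketch of the same configuration with only that parameter changed:

```python
from sklearn.ensemble import RandomForestClassifier

# Same hyperparameters as 'Random Forest2' in the model list, with the deprecated
# max_features='auto' replaced by its documented equivalent for classifiers, 'sqrt'
rf2 = RandomForestClassifier(max_features="sqrt", min_samples_leaf=5,
                             n_estimators=1000, n_jobs=10, oob_score=True,
                             random_state=42)
print(rf2.get_params()["max_features"])  # → sqrt
```

Since `'sqrt'` is also the default for `RandomForestClassifier` in recent releases, dropping the argument entirely would be equivalent.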
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...05', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.93789101 0.9519825 0.9610076 0.95415044 1.06602931 0.96349025
0.95963335 1.00183535 0.93617368 0.97349167]
mean value: 0.9705685138702392
key: score_time
value: [0.27528143 0.22235203 0.27172637 0.26692533 0.2285161 0.21771598
0.26816726 0.23777795 0.24989247 0.26056457]
mean value: 0.24989194869995118
key: test_mcc
value: [0.92450142 0.78307508 0.80829038 0.88527041 0.9258201 0.9258201
0.88527041 0.92307692 0.9258201 1. ]
mean value: 0.8986944924445692
key: train_mcc
value: [0.95309971 0.94457073 0.95744681 0.95326917 0.94893617 0.95320012
0.96171083 0.95748148 0.94893617 0.94897054]
mean value: 0.9527621739445413
key: test_accuracy
value: [0.96226415 0.88679245 0.90384615 0.94230769 0.96153846 0.96153846
0.94230769 0.96153846 0.96153846 1. ]
mean value: 0.948367198838897
key: train_accuracy
value: [0.97654584 0.97228145 0.9787234 0.97659574 0.97446809 0.97659574
0.98085106 0.9787234 0.97446809 0.97446809]
mean value: 0.9763720909132151
key: test_fscore
value: [0.96153846 0.88 0.90566038 0.94339623 0.96296296 0.96
0.94117647 0.96153846 0.96296296 1. ]
mean value: 0.947923592336467
key: train_fscore
value: [0.97664544 0.97216274 0.9787234 0.9764454 0.97446809 0.97654584
0.98081023 0.97863248 0.97446809 0.97435897]
mean value: 0.9763260676507729
key: test_precision
value: [0.96153846 0.95652174 0.88888889 0.92592593 0.92857143 1.
0.96 0.96153846 0.92857143 1. ]
mean value: 0.951155633416503
key: train_precision
value: [0.97457627 0.97424893 0.9787234 0.98275862 0.97446809 0.97863248
0.98290598 0.98283262 0.97446809 0.97854077]
mean value: 0.9782155245479209
key: test_recall
value: [0.96153846 0.81481481 0.92307692 0.96153846 1. 0.92307692
0.92307692 0.96153846 1. 1. ]
mean value: 0.9468660968660969
key: train_recall
value: [0.9787234 0.97008547 0.9787234 0.97021277 0.97446809 0.97446809
0.9787234 0.97446809 0.97446809 0.97021277]
mean value: 0.9744553555191853
key: test_roc_auc
value: [0.96225071 0.88817664 0.90384615 0.94230769 0.96153846 0.96153846
0.94230769 0.96153846 0.96153846 1. ]
mean value: 0.9485042735042736
key: train_roc_auc
value: [0.97654119 0.97227678 0.9787234 0.97659574 0.97446809 0.97659574
0.98085106 0.9787234 0.97446809 0.97446809]
mean value: 0.9763711583924349
key: test_jcc
value: [0.92592593 0.78571429 0.82758621 0.89285714 0.92857143 0.92307692
0.88888889 0.92592593 0.92857143 1. ]
mean value: 0.9027118156428502
key: train_jcc
value: [0.95435685 0.94583333 0.95833333 0.9539749 0.95020747 0.95416667
0.9623431 0.958159 0.95020747 0.95 ]
mean value: 0.9537582105013397
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02532554 0.01064587 0.01129389 0.0115211 0.01132441 0.0113101
0.01137424 0.0111444 0.01132274 0.01128101]
mean value: 0.012654328346252441
key: score_time
value: [0.01154637 0.00929785 0.00982523 0.00981331 0.00931716 0.00963163
0.00972509 0.00960636 0.00930762 0.00981116]
mean value: 0.009788179397583007
key: test_mcc
value: [0.92704716 0.59688314 0.50336201 0.73131034 0.70064905 0.69436507
0.57735027 0.76923077 0.66628253 0.65433031]
mean value: 0.6820810650750807
key: train_mcc
value: [0.73987525 0.70625194 0.78298581 0.71521098 0.7745312 0.74043224
0.69894261 0.74470782 0.76195052 0.74048587]
mean value: 0.7405374249725031
key: test_accuracy
value: [0.96226415 0.79245283 0.75 0.86538462 0.84615385 0.84615385
0.78846154 0.88461538 0.82692308 0.82692308]
mean value: 0.8389332365747459
key: train_accuracy
value: [0.86993603 0.85287846 0.89148936 0.85744681 0.88723404 0.87021277
0.84893617 0.87234043 0.88085106 0.87021277]
mean value: 0.8701537903189221
key: test_fscore
value: [0.96 0.7755102 0.76363636 0.86792453 0.85714286 0.84
0.78431373 0.88461538 0.84210526 0.82352941]
mean value: 0.8398777738190921
key: train_fscore
value: [0.87048832 0.8496732 0.89171975 0.85529158 0.88794926 0.87048832
0.84463895 0.87179487 0.87931034 0.86937901]
mean value: 0.8690733611272227
key: test_precision
value: [1. 0.86363636 0.72413793 0.85185185 0.8 0.875
0.8 0.88461538 0.77419355 0.84 ]
mean value: 0.841343507952518
key: train_precision
value: [0.86864407 0.86666667 0.88983051 0.86842105 0.88235294 0.86864407
0.86936937 0.87553648 0.89082969 0.875 ]
mean value: 0.8755294848921722
key: test_recall
value: [0.92307692 0.7037037 0.80769231 0.88461538 0.92307692 0.80769231
0.76923077 0.88461538 0.92307692 0.80769231]
mean value: 0.8434472934472934
key: train_recall
value: [0.87234043 0.83333333 0.89361702 0.84255319 0.89361702 0.87234043
0.8212766 0.86808511 0.86808511 0.86382979]
mean value: 0.8629078014184397
key: test_roc_auc
value: [0.96153846 0.79415954 0.75 0.86538462 0.84615385 0.84615385
0.78846154 0.88461538 0.82692308 0.82692308]
mean value: 0.8390313390313391
key: train_roc_auc
value: [0.8699309 0.85283688 0.89148936 0.85744681 0.88723404 0.87021277
0.84893617 0.87234043 0.88085106 0.87021277]
mean value: 0.8701491180214584
key: test_jcc
value: [0.92307692 0.63333333 0.61764706 0.76666667 0.75 0.72413793
0.64516129 0.79310345 0.72727273 0.7 ]
mean value: 0.7280399378806105
key: train_jcc
value: [0.77067669 0.73863636 0.8045977 0.74716981 0.79847909 0.77067669
0.73106061 0.77272727 0.78461538 0.76893939]
mean value: 0.7687579004360319
MCC on Blind test: 0.76
Accuracy on Blind test: 0.88
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.10077667 0.09779096 0.08207917 0.07680631 0.07786369 0.07240462
0.07249331 0.07102299 0.07772779 0.081285 ]
mean value: 0.08102505207061768
key: score_time
value: [0.0129652 0.01151061 0.01169777 0.01227379 0.01141953 0.01082182
0.01121521 0.01066947 0.01165438 0.01114535]
mean value: 0.011537313461303711
key: test_mcc
value: [0.85164138 1. 0.84866842 0.9258201 0.96225045 0.9258201
0.88527041 0.88527041 0.9258201 1. ]
mean value: 0.9210561378370106
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.9245283 1. 0.92307692 0.96153846 0.98076923 0.96153846
0.94230769 0.94230769 0.96153846 1. ]
mean value: 0.9597605224963716
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.92592593 1. 0.92592593 0.96296296 0.98113208 0.96
0.94117647 0.94117647 0.96296296 1. ]
mean value: 0.9601262794425947
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.89285714 1. 0.89285714 0.92857143 0.96296296 1.
0.96 0.96 0.92857143 1. ]
mean value: 0.9525820105820106
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96153846 1. 0.96153846 1. 1. 0.92307692
0.92307692 0.92307692 1. 1. ]
mean value: 0.9692307692307692
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.92521368 1. 0.92307692 0.96153846 0.98076923 0.96153846
0.94230769 0.94230769 0.96153846 1. ]
mean value: 0.9598290598290599
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.86206897 1. 0.86206897 0.92857143 0.96296296 0.92307692
0.88888889 0.88888889 0.92857143 1. ]
mean value: 0.9245098451995004
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.86
Accuracy on Blind test: 0.93
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.05597258 0.06626916 0.07366967 0.09542322 0.04672813 0.05279183
0.08359885 0.06806421 0.04606438 0.07742214]
mean value: 0.06660041809082032
key: score_time
value: [0.02560306 0.01884866 0.01223159 0.02448392 0.0155127 0.01884794
0.02686143 0.01222873 0.01251125 0.01234651]
mean value: 0.017947578430175783
key: test_mcc
value: [0.89227454 0.85164138 0.61538462 0.81312325 0.74466871 0.73568294
0.65433031 0.80829038 0.73568294 0.81312325]
mean value: 0.7664202294856968
key: train_mcc
value: [0.90647462 0.90621761 0.91955698 0.90233192 0.90233192 0.90667855
0.91502618 0.91071251 0.91492675 0.90220118]
mean value: 0.9086458224458123
key: test_accuracy
value: [0.94339623 0.9245283 0.80769231 0.90384615 0.86538462 0.86538462
0.82692308 0.90384615 0.86538462 0.90384615]
mean value: 0.881023222060958
key: train_accuracy
value: [0.95309168 0.95309168 0.95957447 0.95106383 0.95106383 0.95319149
0.95744681 0.95531915 0.95744681 0.95106383]
mean value: 0.9542353581635894
key: test_fscore
value: [0.93877551 0.92307692 0.80769231 0.90909091 0.87719298 0.85714286
0.83018868 0.90196078 0.87272727 0.90909091]
mean value: 0.8826939135040409
key: train_fscore
value: [0.95378151 0.95319149 0.96016771 0.95157895 0.95157895 0.95378151
0.95780591 0.95560254 0.95762712 0.95137421]
mean value: 0.9546489894196434
key: test_precision
value: [1. 0.96 0.80769231 0.86206897 0.80645161 0.91304348
0.81481481 0.92 0.82758621 0.86206897]
mean value: 0.8773726351602252
key: train_precision
value: [0.94190871 0.94915254 0.94628099 0.94166667 0.94166667 0.94190871
0.94979079 0.94957983 0.9535865 0.94537815]
mean value: 0.9460919570890296
key: test_recall
value: [0.88461538 0.88888889 0.80769231 0.96153846 0.96153846 0.80769231
0.84615385 0.88461538 0.92307692 0.96153846]
mean value: 0.8927350427350428
key: train_recall
value: [0.96595745 0.95726496 0.97446809 0.96170213 0.96170213 0.96595745
0.96595745 0.96170213 0.96170213 0.95744681]
mean value: 0.9633860701945808
key: test_roc_auc
value: [0.94230769 0.92521368 0.80769231 0.90384615 0.86538462 0.86538462
0.82692308 0.90384615 0.86538462 0.90384615]
mean value: 0.8809829059829061
key: train_roc_auc
value: [0.95306419 0.95310056 0.95957447 0.95106383 0.95106383 0.95319149
0.95744681 0.95531915 0.95744681 0.95106383]
mean value: 0.9542334969994545
key: test_jcc
value: [0.88461538 0.85714286 0.67741935 0.83333333 0.78125 0.75
0.70967742 0.82142857 0.77419355 0.83333333]
mean value: 0.7922393802434124
key: train_jcc
value: [0.91164659 0.91056911 0.9233871 0.90763052 0.90763052 0.91164659
0.91902834 0.91497976 0.91869919 0.90725806]
mean value: 0.9132475768006711
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.0344193 0.01032948 0.01007748 0.0096755 0.00967216 0.00974727
0.0099268 0.01015282 0.01100254 0.01081896]
mean value: 0.012582230567932128
key: score_time
value: [0.01461482 0.00918388 0.00888848 0.00873566 0.00871611 0.00898671
0.00880671 0.00905466 0.00932407 0.00937629]
mean value: 0.009568738937377929
key: test_mcc
value: [0.89227454 0.57616505 0.54494926 0.80829038 0.70064905 0.71151247
0.65824263 0.73568294 0.70064905 0.65824263]
mean value: 0.6986657991505085
key: train_mcc
value: [0.71462102 0.66795337 0.76214388 0.66895783 0.74910575 0.70654292
0.70690158 0.71128258 0.73659716 0.68550371]
mean value: 0.7109609798884714
key: test_accuracy
value: [0.94339623 0.77358491 0.76923077 0.90384615 0.84615385 0.84615385
0.82692308 0.86538462 0.84615385 0.82692308]
mean value: 0.8447750362844703
key: train_accuracy
value: [0.85714286 0.8336887 0.88085106 0.83404255 0.87446809 0.85319149
0.85319149 0.85531915 0.86808511 0.84255319]
mean value: 0.8552533684162773
key: test_fscore
value: [0.93877551 0.73913043 0.78571429 0.90566038 0.85714286 0.82608696
0.81632653 0.85714286 0.85714286 0.81632653]
mean value: 0.8399449197234267
key: train_fscore
value: [0.85529158 0.82969432 0.87878788 0.82969432 0.87311828 0.8516129
0.85032538 0.85217391 0.86580087 0.83982684]
mean value: 0.8526326282826382
key: test_precision
value: [1. 0.89473684 0.73333333 0.88888889 0.8 0.95
0.86956522 0.91304348 0.8 0.86956522]
mean value: 0.8719132977370964
key: train_precision
value: [0.86842105 0.84821429 0.89427313 0.85201794 0.8826087 0.86086957
0.86725664 0.87111111 0.88105727 0.85462555]
mean value: 0.8680455231850978
key: test_recall
value: [0.88461538 0.62962963 0.84615385 0.92307692 0.92307692 0.73076923
0.76923077 0.80769231 0.92307692 0.76923077]
mean value: 0.8206552706552707
key: train_recall
value: [0.84255319 0.81196581 0.86382979 0.80851064 0.86382979 0.84255319
0.83404255 0.83404255 0.85106383 0.82553191]
mean value: 0.8377923258774322
key: test_roc_auc
value: [0.94230769 0.77635328 0.76923077 0.90384615 0.84615385 0.84615385
0.82692308 0.86538462 0.84615385 0.82692308]
mean value: 0.8449430199430199
key: train_roc_auc
value: [0.85717403 0.83364248 0.88085106 0.83404255 0.87446809 0.85319149
0.85319149 0.85531915 0.86808511 0.84255319]
mean value: 0.8552518639752682
key: test_jcc
value: [0.88461538 0.5862069 0.64705882 0.82758621 0.75 0.7037037
0.68965517 0.75 0.75 0.68965517]
mean value: 0.7278481360124363
key: train_jcc
value: [0.74716981 0.70895522 0.78378378 0.70895522 0.77480916 0.74157303
0.73962264 0.74242424 0.76335878 0.7238806 ]
mean value: 0.7434532496453498
MCC on Blind test: 0.76
Accuracy on Blind test: 0.88
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01762056 0.01975036 0.01909256 0.01915002 0.0176785 0.02011013
0.02195501 0.01924419 0.02043986 0.01871943]
mean value: 0.019376063346862794
key: score_time
value: [0.01073146 0.01117897 0.01201153 0.0118525 0.01197219 0.01188111
0.01207972 0.01185107 0.01192999 0.01186323]
mean value: 0.011735177040100098
key: test_mcc
value: [0.81688878 0.18759297 0.65433031 0.6789146 0.88527041 0.76923077
0.80829038 0.74466871 0.72760688 0.80829038]
mean value: 0.7081084179489021
key: train_mcc
value: [0.86416967 0.43722856 0.90213583 0.72315664 0.83806613 0.83960257
0.90233192 0.76845352 0.80635665 0.85958225]
mean value: 0.7941083731200913
key: test_accuracy
value: [0.90566038 0.54716981 0.82692308 0.82692308 0.94230769 0.88461538
0.90384615 0.86538462 0.84615385 0.90384615]
mean value: 0.8452830188679246
key: train_accuracy
value: [0.92963753 0.66098081 0.95106383 0.84468085 0.91702128 0.91702128
0.95106383 0.87234043 0.89787234 0.92978723]
mean value: 0.887146940071678
key: test_fscore
value: [0.90909091 0.25 0.82352941 0.8 0.94339623 0.88461538
0.90196078 0.85106383 0.86666667 0.90566038]
mean value: 0.813598359001221
key: train_fscore
value: [0.93333333 0.48543689 0.95116773 0.81704261 0.91275168 0.92152918
0.95157895 0.85436893 0.90551181 0.92993631]
mean value: 0.8662657410357313
key: test_precision
value: [0.86206897 0.8 0.84 0.94736842 0.92592593 0.88461538
0.92 0.95238095 0.76470588 0.88888889]
mean value: 0.8785954420733966
key: train_precision
value: [0.88846154 1. 0.94915254 0.99390244 0.96226415 0.8740458
0.94166667 0.99435028 0.84249084 0.9279661 ]
mean value: 0.9374300365667224
key: test_recall
value: [0.96153846 0.14814815 0.80769231 0.69230769 0.96153846 0.88461538
0.88461538 0.76923077 1. 0.92307692]
mean value: 0.8032763532763533
key: train_recall
value: [0.98297872 0.32051282 0.95319149 0.69361702 0.86808511 0.97446809
0.96170213 0.74893617 0.9787234 0.93191489]
mean value: 0.8414129841789416
key: test_roc_auc
value: [0.90669516 0.5548433 0.82692308 0.82692308 0.94230769 0.88461538
0.90384615 0.86538462 0.84615385 0.90384615]
mean value: 0.8461538461538461
key: train_roc_auc
value: [0.92952355 0.66025641 0.95106383 0.84468085 0.91702128 0.91702128
0.95106383 0.87234043 0.89787234 0.92978723]
mean value: 0.8870631023822513
key: test_jcc
value: [0.83333333 0.14285714 0.7 0.66666667 0.89285714 0.79310345
0.82142857 0.74074074 0.76470588 0.82758621]
mean value: 0.7183279135408953
key: train_jcc
value: [0.875 0.32051282 0.90688259 0.69067797 0.83950617 0.85447761
0.90763052 0.74576271 0.82733813 0.86904762]
mean value: 0.7836836144984219
MCC on Blind test: 0.72
Accuracy on Blind test: 0.86
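Each model block prints fit_time and score_time, then a test_/train_ pair for every metric, each with ten per-fold values and their mean. That key layout is consistent with what scikit-learn's `cross_validate` returns when it is given a dict of scorers and `return_train_score=True`. The sketch below reproduces the layout on synthetic data. It assumes a few things that are not taken from ml_data_sl.py: the scorer names (`mcc`, `fscore`, `jcc`), a shuffled 10-fold `StratifiedKFold`, and `make_classification` standing in for the real features.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in for the real feature matrix (assumption: binary 0/1 labels).
X, y = make_classification(n_samples=520, n_features=20, random_state=42)

# Scorer names chosen so the result keys read test_mcc, train_mcc, ..., test_jcc.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    # make_scorer with default options scores the hard labels returned by predict().
    "roc_auc": make_scorer(roc_auc_score),
    "jcc": make_scorer(jaccard_score),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
results = cross_validate(
    PassiveAggressiveClassifier(random_state=42),
    X, y, cv=cv, scoring=scoring, return_train_score=True,
)

# Print in the same key / value / mean value layout as the log.
for key, value in results.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", value.mean())
```

Because the log's real scorers are not shown, numbers from this sketch are not comparable to the ones above; only the structure is meant to match.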
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01870584 0.01970434 0.01880717 0.01866412 0.02053785 0.02331161
0.02050233 0.0209446 0.02116704 0.02013636]
mean value: 0.02024812698364258
key: score_time
value: [0.01103997 0.01269507 0.0119617 0.01208353 0.01191449 0.01273751
0.01195216 0.0127418 0.01198077 0.01185441]
mean value: 0.012096142768859864
key: test_mcc
value: [0.75007832 0.59347897 0.65824263 0.66666667 0.9258201 0.80829038
0.80829038 0.84866842 0.74466871 0.75878691]
mean value: 0.7562991488419109
key: train_mcc
value: [0.82318874 0.79500161 0.85288412 0.78776807 0.89946992 0.88344643
0.89198214 0.86302723 0.90351119 0.77446957]
mean value: 0.8474749027324484
key: test_accuracy
value: [0.86792453 0.77358491 0.82692308 0.80769231 0.96153846 0.90384615
0.90384615 0.92307692 0.86538462 0.86538462]
mean value: 0.8699201741654572
key: train_accuracy
value: [0.90618337 0.8891258 0.92340426 0.88723404 0.94893617 0.94042553
0.94468085 0.92978723 0.95106383 0.8787234 ]
mean value: 0.9199564487592433
key: test_fscore
value: [0.87719298 0.72727273 0.81632653 0.83870968 0.96296296 0.90196078
0.90196078 0.92 0.87719298 0.88135593]
mean value: 0.8704935364010411
key: train_fscore
value: [0.91338583 0.87619048 0.91855204 0.89668616 0.94736842 0.94262295
0.94672131 0.92650334 0.95238095 0.89017341]
mean value: 0.9210584885895807
key: test_precision
value: [0.80645161 0.94117647 0.86956522 0.72222222 0.92857143 0.92
0.92 0.95833333 0.80645161 0.78787879]
mean value: 0.8660650685791763
key: train_precision
value: [0.84981685 0.98924731 0.98067633 0.82733813 0.97737557 0.90909091
0.91304348 0.97196262 0.92741935 0.81338028]
mean value: 0.9159350825957544
key: test_recall
value: [0.96153846 0.59259259 0.76923077 1. 1. 0.88461538
0.88461538 0.88461538 0.96153846 1. ]
mean value: 0.8938746438746439
key: train_recall
value: [0.98723404 0.78632479 0.86382979 0.9787234 0.91914894 0.9787234
0.98297872 0.88510638 0.9787234 0.98297872]
mean value: 0.9343771594835424
key: test_roc_auc
value: [0.86965812 0.77706553 0.82692308 0.80769231 0.96153846 0.90384615
0.90384615 0.92307692 0.86538462 0.86538462]
mean value: 0.8704415954415955
key: train_roc_auc
value: [0.90601018 0.88890707 0.92340426 0.88723404 0.94893617 0.94042553
0.94468085 0.92978723 0.95106383 0.8787234 ]
mean value: 0.9199172576832151
key: test_jcc
value: [0.78125 0.57142857 0.68965517 0.72222222 0.92857143 0.82142857
0.82142857 0.85185185 0.78125 0.78787879]
mean value: 0.7756965177223798
key: train_jcc
value: [0.84057971 0.77966102 0.84937238 0.81272085 0.9 0.89147287
0.89883268 0.86307054 0.90909091 0.80208333]
mean value: 0.8546884294973143
MCC on Blind test: 0.82
Accuracy on Blind test: 0.9
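In the Passive Aggressive and SGD blocks, test_roc_auc equals test_accuracy in folds 3 to 10 (for example 0.82692308 in SGD fold 3) and differs only slightly in folds 1 and 2. That pattern would be expected if ROC AUC were computed from hard class predictions rather than from scores. With a single threshold the ROC curve has one interior point, so its area reduces to balanced accuracy, (TPR + TNR) / 2. This equals plain accuracy whenever a fold is exactly balanced (26 vs 26), whereas folds 1 and 2 hold 53 samples. This is an inference from the numbers, not something the log states. A small check with made-up labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, roc_auc_score

# Hypothetical labels and hard predictions for an unbalanced fold (4 positives, 6 negatives).
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 1])

auc_on_labels = roc_auc_score(y_true, y_pred)      # area under a one-threshold ROC curve
bal_acc = balanced_accuracy_score(y_true, y_pred)  # (TPR + TNR) / 2 = (3/4 + 4/6) / 2
acc = accuracy_score(y_true, y_pred)               # 7/10, differs because classes are unbalanced

print(f"AUC on labels: {auc_on_labels:.4f}, balanced accuracy: {bal_acc:.4f}, accuracy: {acc:.4f}")
```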
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.18638968 0.18391204 0.18381119 0.18072701 0.18269753 0.18260503
0.18581295 0.18405628 0.18478727 0.18364573]
mean value: 0.1838444709777832
key: score_time
value: [0.0155859 0.01653624 0.01634264 0.01569414 0.01681447 0.01641345
0.01549268 0.01697898 0.01570892 0.01532364]
mean value: 0.016089105606079103
key: test_mcc
value: [0.88730475 0.96296296 0.84866842 0.9258201 0.96225045 0.96225045
0.81312325 0.92307692 0.96225045 1. ]
mean value: 0.9247707758320183
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94339623 0.98113208 0.92307692 0.96153846 0.98076923 0.98076923
0.90384615 0.96153846 0.98076923 1. ]
mean value: 0.9616835994194485
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94117647 0.98113208 0.92592593 0.96296296 0.98113208 0.98039216
0.89795918 0.96153846 0.98113208 1. ]
mean value: 0.9613351387966895
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96 1. 0.89285714 0.92857143 0.96296296 1.
0.95652174 0.96153846 0.96296296 1. ]
mean value: 0.9625414698023393
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.92307692 0.96296296 0.96153846 1. 1. 0.96153846
0.84615385 0.96153846 1. 1. ]
mean value: 0.9616809116809117
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.94301994 0.98148148 0.92307692 0.96153846 0.98076923 0.98076923
0.90384615 0.96153846 0.98076923 1. ]
mean value: 0.9616809116809117
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.88888889 0.96296296 0.86206897 0.92857143 0.96296296 0.96153846
0.81481481 0.92592593 0.96296296 1. ]
mean value: 0.927069737414565
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.95
Accuracy on Blind test: 0.98
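The log prints only ratios, but for AdaBoost fold 1 those ratios pin down the confusion matrix. Accuracy 0.94339623 = 50/53, recall 0.92307692 = 24/26 and precision 0.96 = 24/25 give TP = 24, FN = 2, FP = 1 and TN = 26. Plugging these counts into the standard formulas reproduces the logged fold-1 values for MCC (0.88730475), F1 (0.94117647) and Jaccard (0.88888889). It also reproduces ROC AUC (0.94301994), if that metric was computed on hard predictions as the balanced-accuracy form below assumes:

```python
import math

from sklearn.metrics import matthews_corrcoef

# Confusion-matrix counts back-calculated from AdaBoost fold 1 in the log.
tp, fn, fp, tn = 24, 2, 1, 26

mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
f1 = 2 * tp / (2 * tp + fp + fn)
jaccard = tp / (tp + fp + fn)
auc_on_labels = (tp / (tp + fn) + tn / (tn + fp)) / 2  # balanced accuracy

# Cross-check the hand-written MCC against scikit-learn on expanded label vectors.
y_true = [1] * (tp + fn) + [0] * (tn + fp)
y_pred = [1] * tp + [0] * fn + [0] * tn + [1] * fp

print(f"MCC {mcc:.8f}  F1 {f1:.8f}  Jaccard {jaccard:.8f}  AUC(labels) {auc_on_labels:.8f}")
```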
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
key: fit_time
value: [0.0607388 0.07653952 0.08514023 0.07402754 0.08832049 0.07226753
0.07678175 0.06374931 0.08635044 0.07893109]
mean value: 0.07628467082977294
key: score_time
value: [0.01762295 0.02642894 0.02865911 0.04618526 0.03816152 0.0224731
0.02270222 0.03807664 0.03926802 0.03778052]
mean value: 0.03173582553863526
key: test_mcc
value: [0.92450142 0.96296296 0.84866842 0.9258201 0.96225045 0.96225045
0.88527041 0.88527041 0.84615385 1. ]
mean value: 0.9203148480995895
key: train_mcc
value: [0.99150739 0.98721563 0.98301432 0.99152527 0.9957537 0.98724298
0.9957537 1. 0.9873145 0.99152527]
mean value: 0.991085275446824
key: test_accuracy
value: [0.96226415 0.98113208 0.92307692 0.96153846 0.98076923 0.98076923
0.94230769 0.94230769 0.92307692 1. ]
mean value: 0.9597242380261248
key: train_accuracy
value: [0.99573561 0.99360341 0.99148936 0.99574468 0.99787234 0.99361702
0.99787234 1. 0.99361702 0.99574468]
mean value: 0.9955296465998276
key: test_fscore
value: [0.96153846 0.98113208 0.92592593 0.96296296 0.98113208 0.98039216
0.94117647 0.94117647 0.92307692 1. ]
mean value: 0.9598513522486886
key: train_fscore
value: [0.9957265 0.99357602 0.99145299 0.9957265 0.9978678 0.99363057
0.99787686 1. 0.99357602 0.9957265 ]
mean value: 0.9955159747729551
key: test_precision
value: [0.96153846 1. 0.89285714 0.92857143 0.96296296 1.
0.96 0.96 0.92307692 1. ]
mean value: 0.9589006919006919
key: train_precision
value: [1. 0.99570815 0.99570815 1. 1. 0.99152542
0.99576271 1. 1. 1. ]
mean value: 0.9978704444606096
key: test_recall
value: [0.96153846 0.96296296 0.96153846 1. 1. 0.96153846
0.92307692 0.92307692 0.92307692 1. ]
mean value: 0.9616809116809117
key: train_recall
value: [0.99148936 0.99145299 0.98723404 0.99148936 0.99574468 0.99574468
1. 1. 0.98723404 0.99148936]
mean value: 0.9931878523367885
key: test_roc_auc
value: [0.96225071 0.98148148 0.92307692 0.96153846 0.98076923 0.98076923
0.94230769 0.94230769 0.92307692 1. ]
mean value: 0.9597578347578348
key: train_roc_auc
value: [0.99574468 0.99359884 0.99148936 0.99574468 0.99787234 0.99361702
0.99787234 1. 0.99361702 0.99574468]
mean value: 0.9955300963811602
key: test_jcc
value: [0.92592593 0.96296296 0.86206897 0.92857143 0.96296296 0.96153846
0.88888889 0.88888889 0.85714286 1. ]
mean value: 0.9238951342399618
key: train_jcc
value: [0.99148936 0.98723404 0.98305085 0.99148936 0.99574468 0.98734177
0.99576271 1. 0.98723404 0.99148936]
mean value: 0.9910836182537762
MCC on Blind test: 0.9
Accuracy on Blind test: 0.95
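The Bagging run repeatedly emits "Some inputs do not have OOB scores" followed by a divide-by-zero RuntimeWarning. Its Model func line sets no n_estimators, so scikit-learn's default of 10 estimators applies. A row is left out of a bootstrap draw of size n with probability (1 - 1/n)^n ≈ 0.368. It is therefore in-bag for all 10 estimators with probability about 0.632^10 ≈ 1%. With roughly 470 training rows per fold (fold 1 train_accuracy 0.99573561 = 467/469), that leaves about five rows with no OOB prediction, and their OOB decision-function rows become 0/0 = NaN. The sketch below works through that calculation. It then checks, on synthetic data, that a larger ensemble leaves no NaN rows. The choice of n_estimators=100 is illustrative, not a tuned value:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier


def frac_never_out_of_bag(n_train: int, n_estimators: int) -> float:
    """Expected fraction of training rows that appear in every bootstrap sample."""
    p_in_bag = 1 - (1 - 1 / n_train) ** n_train  # about 0.632 for large n_train
    return p_in_bag ** n_estimators


for n_est in (10, 100):
    frac = frac_never_out_of_bag(470, n_est)
    print(f"n_estimators={n_est}: expected rows without OOB score ~ {470 * frac:.2g}")

# Synthetic data (assumption) sized like one training fold.
X, y = make_classification(n_samples=470, n_features=20, random_state=42)
bag = BaggingClassifier(n_estimators=100, oob_score=True, random_state=42).fit(X, y)
print("NaN rows in oob_decision_function_:", int(np.isnan(bag.oob_decision_function_).any(axis=1).sum()))
```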
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.14152217 0.1800046 0.18905711 0.15365958 0.15383959 0.17124987
0.15433812 0.15641809 0.15753102 0.1591301 ]
mean value: 0.16167502403259276
key: score_time
value: [0.02448511 0.02515721 0.02925134 0.0243063 0.02417588 0.02826023
0.02418399 0.02447724 0.02408624 0.02411389]
mean value: 0.025249743461608888
key: test_mcc
value: [0.82552431 0.66524218 0.3086067 0.65433031 0.73568294 0.70064905
0.76923077 0.69230769 0.76923077 0.54006172]
mean value: 0.6660866435725556
key: train_mcc
value: [0.99150739 0.99150708 0.9873145 0.98312115 0.9873145 0.99152527
0.9873145 0.9873145 0.9873145 0.9873145 ]
mean value: 0.9881547880923972
key: test_accuracy
value: [0.90566038 0.83018868 0.65384615 0.82692308 0.86538462 0.84615385
0.88461538 0.84615385 0.88461538 0.76923077]
mean value: 0.831277213352685
key: train_accuracy
value: [0.99573561 0.99573561 0.99361702 0.99148936 0.99361702 0.99574468
0.99361702 0.99361702 0.99361702 0.99361702]
mean value: 0.9940407385564578
key: test_fscore
value: [0.89361702 0.82352941 0.66666667 0.83018868 0.87272727 0.83333333
0.88461538 0.84615385 0.88461538 0.76 ]
mean value: 0.8295447000398473
key: train_fscore
value: [0.9957265 0.99570815 0.99357602 0.99141631 0.99357602 0.9957265
0.99357602 0.99357602 0.99357602 0.99357602]
mean value: 0.9940033557756031
key: test_precision
value: [1. 0.875 0.64285714 0.81481481 0.82758621 0.90909091
0.88461538 0.84615385 0.88461538 0.79166667]
mean value: 0.84764003557107
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.80769231 0.77777778 0.69230769 0.84615385 0.92307692 0.76923077
0.88461538 0.84615385 0.88461538 0.73076923]
mean value: 0.8162393162393162
key: train_recall
value: [0.99148936 0.99145299 0.98723404 0.98297872 0.98723404 0.99148936
0.98723404 0.98723404 0.98723404 0.98723404]
mean value: 0.9880814693580651
key: test_roc_auc
value: [0.90384615 0.83119658 0.65384615 0.82692308 0.86538462 0.84615385
0.88461538 0.84615385 0.88461538 0.76923077]
mean value: 0.8311965811965811
key: train_roc_auc
value: [0.99574468 0.9957265 0.99361702 0.99148936 0.99361702 0.99574468
0.99361702 0.99361702 0.99361702 0.99361702]
mean value: 0.9940407346790325
key: test_jcc
value: [0.80769231 0.7 0.5 0.70967742 0.77419355 0.71428571
0.79310345 0.73333333 0.79310345 0.61290323]
mean value: 0.7138292445411467
key: train_jcc
value: [0.99148936 0.99145299 0.98723404 0.98297872 0.98723404 0.99148936
0.98723404 0.98723404 0.98723404 0.98723404]
mean value: 0.9880814693580651
MCC on Blind test: 0.58
Accuracy on Blind test: 0.79
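The mean value lines hide how much the folds disagree: for the Gaussian Process block, test_mcc ranges from 0.31 to 0.83. A short sketch that reports the sample standard deviation and range next to the mean, using the ten logged values:

```python
import numpy as np

# Per-fold test_mcc values copied from the Gaussian Process block of this log.
gp_test_mcc = np.array([
    0.82552431, 0.66524218, 0.3086067, 0.65433031, 0.73568294,
    0.70064905, 0.76923077, 0.69230769, 0.76923077, 0.54006172,
])

mean = gp_test_mcc.mean()
std = gp_test_mcc.std(ddof=1)  # sample standard deviation across the 10 folds
print(f"test_mcc: {mean:.3f} +/- {std:.3f} (min {gp_test_mcc.min():.3f}, max {gp_test_mcc.max():.3f})")
```

The mean agrees with the logged 0.6660866435725556 up to the 8-decimal rounding of the printed fold values.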
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.73881483 0.72350812 0.73274493 0.72523093 0.73023582 0.74169731
0.7289629 0.72415876 0.72671056 0.72561407]
mean value: 0.7297678232192993
key: score_time
value: [0.00964332 0.00934935 0.00935745 0.00994277 0.00966001 0.00946784
0.00924683 0.00931072 0.00933957 0.00921845]
mean value: 0.009453630447387696
key: test_mcc
value: [0.85164138 1. 0.81312325 0.9258201 0.96225045 0.96225045
0.88527041 0.92307692 0.9258201 1. ]
mean value: 0.9249253060070406
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.9245283 1. 0.90384615 0.96153846 0.98076923 0.98076923
0.94230769 0.96153846 0.96153846 1. ]
mean value: 0.9616835994194485
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.92592593 1. 0.90909091 0.96296296 0.98113208 0.98039216
0.94117647 0.96153846 0.96296296 1. ]
mean value: 0.9625181925403901
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.89285714 1. 0.86206897 0.92857143 0.96296296 1.
0.96 0.96153846 0.92857143 1. ]
mean value: 0.9496570390018666
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96153846 1. 0.96153846 1. 1. 0.96153846
0.92307692 0.96153846 1. 1. ]
mean value: 0.9769230769230769
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.92521368 1. 0.90384615 0.96153846 0.98076923 0.98076923
0.94230769 0.96153846 0.96153846 1. ]
mean value: 0.9617521367521368
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.86206897 1. 0.83333333 0.92857143 0.96296296 0.96153846
0.88888889 0.92592593 0.92857143 1. ]
mean value: 0.9291861395309671
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.86
Accuracy on Blind test: 0.93
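The per-fold scores above can be cross-checked by hand from confusion-matrix counts. A minimal sketch, assuming the first Gradient Boosting test fold held 26 positives and 27 negatives: the counts TP=25, FN=1, FP=3, TN=24 are inferred from the logged precision (0.89285714 = 25/28), recall (0.96153846 = 25/26) and accuracy (0.9245283 = 49/53), and are not printed by the script. The helper `fold_metrics` is hypothetical.

```python
import math


def fold_metrics(tp, fn, fp, tn):
    """Binary classification metrics from confusion-matrix counts (hypothetical helper)."""
    n = tp + fn + fp + tn
    tpr = tp / (tp + fn)  # recall / sensitivity
    tnr = tn / (tn + fp)  # specificity
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "mcc": (tp * tn - fp * fn) / denom,
        "accuracy": (tp + tn) / n,
        "fscore": 2 * tp / (2 * tp + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tpr,
        "jcc": tp / (tp + fp + fn),  # Jaccard index of the positive class
        "balanced_accuracy": (tpr + tnr) / 2,
    }


# Counts inferred (not logged) for Gradient Boosting, test fold 1
m = fold_metrics(tp=25, fn=1, fp=3, tn=24)
for name, value in m.items():
    print(f"{name}: {value:.8f}")
```

The balanced accuracy equals the logged `test_roc_auc` for this fold (0.92521368). For hard 0/1 predictions ROC AUC reduces to (TPR + TNR) / 2, so the match suggests, but does not prove, that ROC AUC was scored on predicted labels rather than on predicted probabilities.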
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.03048921 0.02674842 0.03247237 0.03025341 0.03184032 0.04581976
0.09403157 0.06438708 0.03938842 0.05673742]
mean value: 0.04521679878234863
key: score_time
value: [0.01284647 0.01716471 0.02299953 0.01456118 0.03505945 0.02486992
0.02550769 0.02261448 0.01530409 0.01726437]
mean value: 0.02081918716430664
key: test_mcc
value: [0.29676375 0.3960114 0.50951017 0.34684399 0.45095603 0.6172134
0.27386128 0.36896403 0.4233902 0.13323468]
mean value: 0.3816748916887629
key: train_mcc
value: [0.80521616 0.96592046 0.96609741 0.68800744 0.84577093 0.97880317
0.74239822 0.59537119 0.89871703 0.63481105]
mean value: 0.8121113061220316
key: test_accuracy
value: [0.64150943 0.69811321 0.75 0.65384615 0.71153846 0.80769231
0.61538462 0.67307692 0.71153846 0.55769231]
mean value: 0.6820391872278665
key: train_accuracy
value: [0.89339019 0.98294243 0.98297872 0.8212766 0.91702128 0.9893617
0.85531915 0.76170213 0.94680851 0.78723404]
mean value: 0.8938034750260854
key: test_fscore
value: [0.6779661 0.7037037 0.77192982 0.71875 0.75409836 0.8
0.6969697 0.72131148 0.71698113 0.64615385]
mean value: 0.7207864141224611
key: train_fscore
value: [0.90384615 0.98297872 0.98312236 0.84837545 0.92337917 0.98942918
0.87360595 0.80756014 0.94382022 0.8245614 ]
mean value: 0.9080678755351793
key: test_precision
value: [0.60606061 0.7037037 0.70967742 0.60526316 0.65714286 0.83333333
0.575 0.62857143 0.7037037 0.53846154]
mean value: 0.6560917748226747
key: train_precision
value: [0.8245614 0.97881356 0.9748954 0.73667712 0.85766423 0.98319328
0.77557756 0.67723343 1. 0.70149254]
mean value: 0.8510108511659394
key: test_recall
value: [0.76923077 0.7037037 0.84615385 0.88461538 0.88461538 0.76923077
0.88461538 0.84615385 0.73076923 0.80769231]
mean value: 0.8126780626780626
key: train_recall
value: [1. 0.98717949 0.99148936 1. 1. 0.99574468
1. 1. 0.89361702 1. ]
mean value: 0.9868030551009275
key: test_roc_auc
value: [0.64387464 0.6980057 0.75 0.65384615 0.71153846 0.80769231
0.61538462 0.67307692 0.71153846 0.55769231]
mean value: 0.6822649572649573
key: train_roc_auc
value: [0.89316239 0.98295145 0.98297872 0.8212766 0.91702128 0.9893617
0.85531915 0.76170213 0.94680851 0.78723404]
mean value: 0.8937815966539371
key: test_jcc
value: [0.51282051 0.54285714 0.62857143 0.56097561 0.60526316 0.66666667
0.53488372 0.56410256 0.55882353 0.47727273]
mean value: 0.5652237060283873
key: train_jcc
value: [0.8245614 0.9665272 0.96680498 0.73667712 0.85766423 0.9790795
0.77557756 0.67723343 0.89361702 0.70149254]
mean value: 0.8379234972627273
MCC on Blind test: 0.41
Accuracy on Blind test: 0.67
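In the scikit-learn version shown in the paths above, QDA emits "Variables are collinear" when, for some class, the centred training rows have rank below the number of features, so that class's covariance estimate is singular. One plausible contributor, an assumption not checked against the script, is the `OneHotEncoder()` step: the indicator columns produced for each categorical feature sum to 1 in every row, so after centring they are linearly dependent. The sketch shows that effect on a hypothetical toy column with NumPy; in scikit-learn, `OneHotEncoder(drop='first')` drops one indicator per feature, and `QuadraticDiscriminantAnalysis(reg_param=...)` regularises the per-class covariance estimates.

```python
import numpy as np

# Hypothetical categorical column with three levels (stands in for e.g. 'ss_class')
levels = np.array(["helix", "sheet", "loop", "helix", "loop", "sheet", "helix"])
categories = np.unique(levels)  # sorted level names
onehot = (levels[:, None] == categories[None, :]).astype(float)

# Every row of the indicator block sums to 1 ...
assert np.allclose(onehot.sum(axis=1), 1.0)

# ... so after centring (as QDA does within each class) the columns are dependent.
centred = onehot - onehot.mean(axis=0)
print("columns:", centred.shape[1], "rank:", np.linalg.matrix_rank(centred))

# Dropping one indicator column (the effect of drop='first') removes the dependency.
reduced = centred[:, 1:]
print("columns:", reduced.shape[1], "rank:", np.linalg.matrix_rank(reduced))
```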
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02924728 0.04173183 0.03825188 0.03950167 0.05261588 0.02752686
0.03700113 0.03158307 0.03224421 0.03502917]
mean value: 0.03647329807281494
key: score_time
value: [0.02261496 0.01868033 0.01876545 0.0188148 0.02677727 0.0213027
0.01892281 0.01888084 0.0189209 0.01876068]
mean value: 0.020244073867797852
key: test_mcc
value: [0.92704716 0.88746439 0.61538462 0.88527041 0.79056942 0.73568294
0.80829038 0.84615385 0.74466871 0.84866842]
mean value: 0.8089200292799028
key: train_mcc
value: [0.86799458 0.8681985 0.85559807 0.85559807 0.85113319 0.86433077
0.88136192 0.85559807 0.85958225 0.85113319]
mean value: 0.8610528586884763
key: test_accuracy
value: [0.96226415 0.94339623 0.80769231 0.94230769 0.88461538 0.86538462
0.90384615 0.92307692 0.86538462 0.92307692]
mean value: 0.9021044992743106
key: train_accuracy
value: [0.93390192 0.93390192 0.92765957 0.92765957 0.92553191 0.93191489
0.94042553 0.92765957 0.92978723 0.92553191]
mean value: 0.9303974050719049
key: test_fscore
value: [0.96 0.94339623 0.80769231 0.94339623 0.89655172 0.85714286
0.90196078 0.92307692 0.87719298 0.92592593]
mean value: 0.9036335957575999
key: train_fscore
value: [0.93473684 0.93473684 0.92857143 0.92857143 0.92600423 0.93305439
0.94142259 0.92857143 0.92993631 0.92600423]
mean value: 0.9311609719764614
key: test_precision
value: [1. 0.96153846 0.80769231 0.92592593 0.8125 0.91304348
0.92 0.92307692 0.80645161 0.89285714]
mean value: 0.8963085852254856
key: train_precision
value: [0.925 0.92116183 0.91701245 0.91701245 0.92016807 0.91769547
0.92592593 0.91701245 0.9279661 0.92016807]
mean value: 0.9209122805450133
key: test_recall
value: [0.92307692 0.92592593 0.80769231 0.96153846 1. 0.80769231
0.88461538 0.92307692 0.96153846 0.96153846]
mean value: 0.9156695156695157
key: train_recall
value: [0.94468085 0.94871795 0.94042553 0.94042553 0.93191489 0.94893617
0.95744681 0.94042553 0.93191489 0.93191489]
mean value: 0.9416803055100927
key: test_roc_auc
value: [0.96153846 0.94373219 0.80769231 0.94230769 0.88461538 0.86538462
0.90384615 0.92307692 0.86538462 0.92307692]
mean value: 0.9020655270655271
key: train_roc_auc
value: [0.93387889 0.93393344 0.92765957 0.92765957 0.92553191 0.93191489
0.94042553 0.92765957 0.92978723 0.92553191]
mean value: 0.9303982542280415
key: test_jcc
value: [0.92307692 0.89285714 0.67741935 0.89285714 0.8125 0.75
0.82142857 0.85714286 0.78125 0.86206897]
mean value: 0.8270600957718588
key: train_jcc
value: [0.87747036 0.87747036 0.86666667 0.86666667 0.86220472 0.8745098
0.88932806 0.86666667 0.86904762 0.86220472]
mean value: 0.8712235646491643
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
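The Ridge ClassifierCV run that follows in this log reports per-fold scores identical to the RidgeClassifier scores above. That is consistent with `RidgeClassifierCV(cv=10)` selecting alpha = 1.0, RidgeClassifier's default, from its default candidate grid (0.1, 1.0, 10.0) in every outer fold; the log does not print the selected alpha, so this is an inference. A minimal sketch for checking it, run on synthetic data from `make_classification` (an assumption: not the rpoB feature matrix):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier, RidgeClassifierCV

# Synthetic stand-in data (assumption: not the rpoB features)
X, y = make_classification(n_samples=300, n_features=20, random_state=42)

# Same settings as the logged model, with the default alpha grid written out
cv_model = RidgeClassifierCV(alphas=(0.1, 1.0, 10.0), cv=10).fit(X, y)
print("selected alpha:", cv_model.alpha_)

# A plain RidgeClassifier refitted with the selected alpha should predict identically
plain = RidgeClassifier(alpha=cv_model.alpha_).fit(X, y)
print("identical predictions:", (plain.predict(X) == cv_model.predict(X)).all())
```

Printing `alpha_` inside the real pipeline would show whether the two rows in the results table are effectively the same model.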
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.2604568 0.26721501 0.30715322 0.33151817 0.27433705 0.28127027
0.27576518 0.27336073 0.27455401 0.30313301]
mean value: 0.28487634658813477
key: score_time
value: [0.02250957 0.01868081 0.02003098 0.01883531 0.01878572 0.01876616
0.01875806 0.01878333 0.01887512 0.01968741]
mean value: 0.01937124729156494
key: test_mcc
value: [0.92704716 0.88746439 0.61538462 0.88527041 0.79056942 0.73568294
0.80829038 0.84615385 0.74466871 0.84866842]
mean value: 0.8089200292799028
key: train_mcc
value: [0.86799458 0.8681985 0.85559807 0.85559807 0.85113319 0.86433077
0.88136192 0.85559807 0.85958225 0.85113319]
mean value: 0.8610528586884763
key: test_accuracy
value: [0.96226415 0.94339623 0.80769231 0.94230769 0.88461538 0.86538462
0.90384615 0.92307692 0.86538462 0.92307692]
mean value: 0.9021044992743106
key: train_accuracy
value: [0.93390192 0.93390192 0.92765957 0.92765957 0.92553191 0.93191489
0.94042553 0.92765957 0.92978723 0.92553191]
mean value: 0.9303974050719049
key: test_fscore
value: [0.96 0.94339623 0.80769231 0.94339623 0.89655172 0.85714286
0.90196078 0.92307692 0.87719298 0.92592593]
mean value: 0.9036335957575999
key: train_fscore
value: [0.93473684 0.93473684 0.92857143 0.92857143 0.92600423 0.93305439
0.94142259 0.92857143 0.92993631 0.92600423]
mean value: 0.9311609719764614
key: test_precision
value: [1. 0.96153846 0.80769231 0.92592593 0.8125 0.91304348
0.92 0.92307692 0.80645161 0.89285714]
mean value: 0.8963085852254856
key: train_precision
value: [0.925 0.92116183 0.91701245 0.91701245 0.92016807 0.91769547
0.92592593 0.91701245 0.9279661 0.92016807]
mean value: 0.9209122805450133
key: test_recall
value: [0.92307692 0.92592593 0.80769231 0.96153846 1. 0.80769231
0.88461538 0.92307692 0.96153846 0.96153846]
mean value: 0.9156695156695157
key: train_recall
value: [0.94468085 0.94871795 0.94042553 0.94042553 0.93191489 0.94893617
0.95744681 0.94042553 0.93191489 0.93191489]
mean value: 0.9416803055100927
key: test_roc_auc
value: [0.96153846 0.94373219 0.80769231 0.94230769 0.88461538 0.86538462
0.90384615 0.92307692 0.86538462 0.92307692]
mean value: 0.9020655270655271
key: train_roc_auc
value: [0.93387889 0.93393344 0.92765957 0.92765957 0.92553191 0.93191489
0.94042553 0.92765957 0.92978723 0.92553191]
mean value: 0.9303982542280415
key: test_jcc
value: [0.92307692 0.89285714 0.67741935 0.89285714 0.8125 0.75
0.82142857 0.85714286 0.78125 0.86206897]
mean value: 0.8270600957718588
key: train_jcc
value: [0.87747036 0.87747036 0.86666667 0.86666667 0.86220472 0.8745098
0.88932806 0.86666667 0.86904762 0.86220472]
mean value: 0.8712235646491643
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_sl.py:188: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_sl.py:191: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
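The SettingWithCopyWarning raised at rpob_sl.py:188 and :191 comes from calling `sort_values(..., inplace=True)` on `rouC_CT` and `rouC_BT`, which pandas considers derived from another DataFrame. The script itself is not shown here, so the sketch assumes both are column subsets of a results table; the `results` frame is hypothetical and its values are the rounded means logged above. The usual fixes are to take an explicit `.copy()` before modifying in place, or to drop `inplace=True` and rebind the name.

```python
import pandas as pd

# Hypothetical results table; values are rounded means from the log
results = pd.DataFrame({
    "model": ["Gradient Boosting", "QDA", "Ridge Classifier", "Ridge ClassifierCV"],
    "test_mcc": [0.92, 0.38, 0.81, 0.81],  # mean CV test MCC
    "bts_mcc": [0.86, 0.41, 0.77, 0.77],   # MCC on blind test
})

# Option 1: explicit copy of the subset, after which sorting in place is safe.
rouC_CT = results[["model", "test_mcc"]].copy()
rouC_CT.sort_values(by=["test_mcc"], ascending=False, inplace=True)

# Option 2: no inplace; rebind the name to the sorted result.
rouC_BT = results[["model", "bts_mcc"]].sort_values(by=["bts_mcc"], ascending=False)

print(rouC_CT.to_string(index=False))
print(rouC_BT.to_string(index=False))
```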