LSHTM_analysis/scripts/ml/log_rpob_8020.txt

/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data_8020.py:549: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
from pandas import MultiIndex, Int64Index
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
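The warning above names two remedies: scale the data or raise max_iter. The pipeline printed further down in this log already applies MinMaxScaler, so the sketch below illustrates the second remedy only. It is a minimal example under stated assumptions, not the code in ml_data_8020.py: the synthetic data from make_classification and the value max_iter=5000 are placeholders chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the real feature matrix (assumption).
X, y = make_classification(n_samples=445, n_features=20, random_state=42)

# Same two-step shape as the logged pipeline: scale, then fit.
# max_iter=5000 is an illustrative value; the default is 100.
pipe = Pipeline(steps=[
    ("prep", MinMaxScaler()),
    ("model", LogisticRegression(max_iter=5000, random_state=42)),
])
pipe.fit(X, y)

# n_iter_ reports how many lbfgs iterations were actually used.
n_iter_used = int(pipe.named_steps["model"].n_iter_[0])
print("lbfgs iterations used:", n_iter_used)
```

If n_iter_used stays below max_iter, the solver stopped because it converged rather than because it hit the limit.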
1.22.4
1.4.1
aaindex_df contains non-numerical data
Total no. of non-numerical columns: 2
Selecting numerical data only
PASS: successfully selected numerical columns only for aaindex_df
Now checking for NA in the remaining aaindex_cols
Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127
Revised df ncols: 123
Checking NA in revised df...
PASS: cols with NA successfully dropped from aaindex_df
Proceeding with combining aa_df with other features_df
PASS: ncols match
Expected ncols: 123
Got: 123
Total no. of columns in clean aa_df: 123
Proceeding to merge, expected nrows in merged_df: 1133
PASS: my_features_df and aa_df successfully combined
nrows: 1133
ncols: 274
count of NULL values before imputation
or_mychisq 339
log10_or_mychisq 339
dtype: int64
count of NULL values AFTER imputation
mutationinformation 0
or_rawI 0
logorI 0
dtype: int64
PASS: OR values imputed, data ready for ML
Total no. of features for aaindex: 123
No. of numerical features: 169
No. of categorical features: 7
PASS: x_features has no target variable
No. of columns for x_features: 176
-------------------------------------------------------------
Successfully split data with stratification: 80/20
Train data size: (445, 176)
Test data size: (112, 176)
y_train numbers: Counter({0: 225, 1: 220})
y_train ratio: 1.0227272727272727
y_test_numbers: Counter({0: 57, 1: 55})
y_test ratio: 1.0363636363636364
-------------------------------------------------------------
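The split sizes and class ratios reported above can be reproduced with plain arithmetic. This is a sketch under two assumptions that the log does not state explicitly: the test size is rounded up from 20% of the 557 source rows, and the printed ratio is the count of class 0 divided by the count of class 1. The helper names split_sizes and class_ratio are hypothetical.

```python
import math
from collections import Counter

def split_sizes(n_samples, test_fraction):
    """Return (n_train, n_test), rounding the test share up (assumption)."""
    n_test = math.ceil(n_samples * test_fraction)
    return n_samples - n_test, n_test

def class_ratio(counts):
    """Ratio of class 0 to class 1 (assumed definition of the logged ratio)."""
    return counts[0] / counts[1]

n_train, n_test = split_sizes(557, 0.2)
y_train = Counter({0: 225, 1: 220})
y_test = Counter({0: 57, 1: 55})

print(n_train, n_test)
print(class_ratio(y_train))
print(class_ratio(y_test))
```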
Simple Random OverSampling
Counter({1: 225, 0: 225})
(450, 176)
Simple Random UnderSampling
Counter({0: 220, 1: 220})
(440, 176)
Simple Combined Over and UnderSampling
Counter({0: 225, 1: 225})
(450, 176)
SMOTE_NC OverSampling
Counter({1: 225, 0: 225})
(450, 176)
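The row counts above follow from how simple random resampling works: oversampling duplicates minority rows until both classes reach 225 (450 rows), while undersampling drops majority rows until both reach 220 (440 rows). The sketch below shows the idea with the standard library only. It is not the resampler the script uses, and the function names and seed are hypothetical.

```python
import random
from collections import Counter

def random_oversample(labels, seed=42):
    """Return row indices with minority classes resampled up to the majority size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    indices = list(range(len(labels)))
    for cls, n in counts.items():
        pool = [i for i, y in enumerate(labels) if y == cls]
        # Draw the missing rows with replacement (zero draws for the majority).
        indices.extend(rng.choices(pool, k=target - n))
    return indices

def random_undersample(labels, seed=42):
    """Return row indices with every class cut down to the minority size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    indices = []
    for cls in counts:
        pool = [i for i, y in enumerate(labels) if y == cls]
        # Keep a random subset without replacement.
        indices.extend(rng.sample(pool, target))
    return sorted(indices)

# Training labels with the class counts reported in the log.
labels = [0] * 225 + [1] * 220
over = random_oversample(labels)
under = random_undersample(labels)
print(Counter(labels[i] for i in over), len(over))
print(Counter(labels[i] for i in under), len(under))
```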
#####################################################################
Running ML analysis: 80/20 split
Gene name: rpoB
Drug name: rifampicin
Output directory: /home/tanu/git/Data/rifampicin/output/ml/tts_8020/
Sanity checks:
ML source data size: (557, 176)
Total input features: (445, 176)
Target feature numbers: Counter({0: 225, 1: 220})
Target features ratio: 1.0227272727272727
#####################################################################
================================================================
Structural features (n): 37
These are:
Common stability features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================
AAindex features (n): 123
================================================================
Evolutionary features (n): 3
These are:
['consurf_score', 'snap2_score', 'provean_score']
================================================================
Genomic features (n): 6
These are:
['maf', 'logorI']
['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================
Categorical features (n): 7
These are:
['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================
Pass: No. of features match
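The feature-count check that passes above can be verified by hand from the group sizes in this log. In the sketch below, the structural subgroup sizes (11, 23, 3) are counted from the column lists printed earlier; the dictionary layout is illustrative and not taken from the script.

```python
# Structural subgroup sizes, counted from the lists printed in the log.
structural = {"common stability": 11, "FoldX": 23, "other structural": 3}

# Feature groups as reported in the log.
groups = {
    "structural": sum(structural.values()),
    "aaindex": 123,
    "evolutionary": 3,
    "genomic": 2 + 4,  # ['maf', 'logorI'] plus the four lineage columns
    "categorical": 7,
}

total = sum(groups.values())
numerical = total - groups["categorical"]
print(groups["structural"], total, numerical)
```

The totals agree with the 176 x_features columns and the 169 numerical features reported earlier in the log.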
#####################################################################
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.08822989 0.09819555 0.1139729 0.10658669 0.11813283 0.05520678
0.07728314 0.09269142 0.11131549 0.06736732]
mean value: 0.09289820194244384
key: score_time
value: [0.01899791 0.02099395 0.02197051 0.02467132 0.05310249 0.02295399
0.02127385 0.02181339 0.0188055 0.01463914]
mean value: 0.023922204971313477
key: test_mcc
value: [0.82506438 0.86732843 0.68911026 0.8360602 0.86758893 0.86452993
0.86452993 0.77352678 0.77352678 0.77352678]
mean value: 0.8134792424092705
key: train_mcc
value: [0.860043 0.85528899 0.8500425 0.869987 0.85018502 0.86053339
0.85041172 0.85535874 0.8705095 0.87541359]
mean value: 0.8597773460103459
key: test_accuracy
value: [0.91111111 0.93333333 0.84444444 0.91111111 0.93333333 0.93181818
0.93181818 0.88636364 0.88636364 0.88636364]
mean value: 0.9056060606060606
key: train_accuracy
value:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[0.93 0.9275 0.925 0.935 0.925 0.93017456
0.92518703 0.9276808 0.93516209 0.93765586]
mean value: 0.9298360349127183
key: test_fscore
value: [0.9047619 0.93023256 0.8372093 0.91666667 0.93333333 0.93023256
0.93333333 0.88888889 0.88888889 0.88888889]
mean value: 0.9052436323366556
key: train_fscore
value: [0.92964824 0.9276808 0.92462312 0.93434343 0.925 0.93
0.92462312 0.92695214 0.935 0.93734336]
mean value: 0.9295214204164155
key: test_precision
value: [0.95 0.95238095 0.85714286 0.84615385 0.91304348 0.95238095
0.91304348 0.86956522 0.86956522 0.86956522]
mean value: 0.8992841216754259
key: train_precision
value: [0.925 0.91625616 0.92 0.93434343 0.91584158 0.92079208
0.92 0.92462312 0.92574257 0.93034826]
mean value: 0.9232947203887022
key: test_recall
value: [0.86363636 0.90909091 0.81818182 1. 0.95454545 0.90909091
0.95454545 0.90909091 0.90909091 0.90909091]
mean value: 0.9136363636363636
key: train_recall
value: [0.93434343 0.93939394 0.92929293 0.93434343 0.93434343 0.93939394
0.92929293 0.92929293 0.94444444 0.94444444]
mean value: 0.9358585858585858
key: test_roc_auc
value: [0.91007905 0.93280632 0.84387352 0.91304348 0.93379447 0.93181818
0.93181818 0.88636364 0.88636364 0.88636364]
mean value: 0.9056324110671937
key: train_roc_auc
value: [0.930043 0.92761776 0.9250425 0.9349935 0.92509251 0.9302881
0.9252376 0.92770065 0.93527641 0.93773946]
mean value: 0.9299031504135635
key: test_jcc
value: [0.82608696 0.86956522 0.72 0.84615385 0.875 0.86956522
0.875 0.8 0.8 0.8 ]
mean value: 0.8281371237458194
key: train_jcc
value: [0.8685446 0.86511628 0.85981308 0.87677725 0.86046512 0.86915888
0.85981308 0.86384977 0.87793427 0.88207547]
mean value: 0.8683547803458409
MCC on Blind test: 0.8
Accuracy on Blind test: 0.9
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [1.07467175 1.19143915 3.18981433 1.21566057 1.01042342 0.9827199
1.26141262 2.24912834 3.37447715 2.48636866]
mean value: 1.8036115884780883
key: score_time
value: [0.01486397 0.01910448 0.01350474 0.01525831 0.02177715 0.02520943
0.01491761 0.02106428 0.03649759 0.02547741]
mean value: 0.02076749801635742
key: test_mcc
value: [0.82506438 0.82506438 0.64613475 0.79854941 0.91485328 0.81818182
0.86452993 0.81818182 0.81818182 0.77352678]
mean value: 0.8102268367108156
key: train_mcc
value: [0.89018902 0.900045 0.83510219 0.89002252 0.88510532 0.89536533
0.90029107 0.89025725 0.89555655 0.90043786]
mean value: 0.8882372108052222
key: test_accuracy
value: [0.91111111 0.91111111 0.82222222 0.88888889 0.95555556 0.90909091
0.93181818 0.90909091 0.90909091 0.88636364]
mean value: 0.9034343434343434
key: train_accuracy
value: [0.945 0.95 0.9175 0.945 0.9425 0.94763092
0.95012469 0.94513716 0.94763092 0.95012469]
mean value: 0.9440648379052369
key: test_fscore
value: [0.9047619 0.9047619 0.80952381 0.89795918 0.95652174 0.90909091
0.93333333 0.90909091 0.90909091 0.88888889]
mean value: 0.9023023491346472
key: train_fscore
value: [0.945 0.94974874 0.91729323 0.94416244 0.94235589 0.94736842
0.94974874 0.94444444 0.94763092 0.95 ]
mean value: 0.943775283498277
key: test_precision
value: [0.95 0.95 0.85 0.81481481 0.91666667 0.90909091
0.91304348 0.90909091 0.90909091 0.86956522]
mean value: 0.8991362904406383
key: train_precision
value: [0.93564356 0.945 0.91044776 0.94897959 0.93532338 0.94029851
0.945 0.94444444 0.93596059 0.94059406]
mean value: 0.9381691902917854
key: test_recall
value: [0.86363636 0.86363636 0.77272727 1. 1. 0.90909091
0.95454545 0.90909091 0.90909091 0.90909091]
mean value: 0.9090909090909091
key: train_recall
value: [0.95454545 0.95454545 0.92424242 0.93939394 0.94949495 0.95454545
0.95454545 0.94444444 0.95959596 0.95959596]
mean value: 0.9494949494949495
key: test_roc_auc
value: [0.91007905 0.91007905 0.82114625 0.89130435 0.95652174 0.90909091
0.93181818 0.90909091 0.90909091 0.88636364]
mean value: 0.9034584980237155
key: train_roc_auc
value: [0.94509451 0.950045 0.91756676 0.94494449 0.94256926 0.94771608
0.95017913 0.94512863 0.94777828 0.95024133]
mean value: 0.9441263461321502
key: test_jcc
value: [0.82608696 0.82608696 0.68 0.81481481 0.91666667 0.83333333
0.875 0.83333333 0.83333333 0.8 ]
mean value: 0.823865539452496
key: train_jcc
value: [0.8957346 0.90430622 0.84722222 0.89423077 0.89099526 0.9
0.90430622 0.89473684 0.90047393 0.9047619 ]
mean value: 0.8936767969980741
MCC on Blind test: 0.8
Accuracy on Blind test: 0.9
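The per-fold scores above are mutually consistent, and the first test fold of Logistic RegressionCV can be reproduced by hand. The sketch below is a minimal check, not the pipeline's own code. The confusion-matrix counts (TP=19, FP=1, FN=3, TN=22, i.e. 45 samples) are an assumption inferred from the printed precision (0.95 = 19/20), recall (0.86363636 = 19/22) and accuracy (0.91111111 = 41/45); the log itself never prints them. The helper name fold_metrics is hypothetical. The roc_auc entry is computed as balanced accuracy, which equals ROC AUC when the score is derived from hard class predictions; that this is how the log's value was obtained is also an assumption, although the numbers agree.

```python
from math import sqrt

# Assumed confusion-matrix counts for the first test fold (45 samples).
# Inferred from the printed precision, recall and accuracy; not in the log.
tp, fp, fn, tn = 19, 1, 3, 22


def fold_metrics(tp, fp, fn, tn):
    """Return the binary-classification scores reported in the log."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "fscore": 2 * tp / (2 * tp + fp + fn),
        "jcc": tp / (tp + fp + fn),  # Jaccard index of the positive class
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
        # Balanced accuracy; equals ROC AUC for hard 0/1 predictions.
        "roc_auc": (recall + specificity) / 2,
    }


metrics = fold_metrics(tp, fp, fn, tn)
for name, value in metrics.items():
    print(f"{name}: {value:.8f}")
```

With these counts every score agrees with the first entry of the corresponding test_* array above (test_mcc, test_accuracy, test_fscore, test_precision, test_recall, test_roc_auc, test_jcc).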
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.02495861 0.01450539 0.01457644 0.01471186 0.01506543 0.01446843
0.01499057 0.01478457 0.01477289 0.01486278]
mean value: 0.015769696235656737
key: score_time
value: [0.01284051 0.01264167 0.01298308 0.012573 0.01277876 0.01334071
0.01299834 0.0127492 0.01270795 0.0127039 ]
mean value: 0.012831711769104004
key: test_mcc
value: [0.56261436 0.66660455 0.74410286 0.51089209 0.60079051 0.72727273
0.66143783 0.68252363 0.43151697 0.68252363]
mean value: 0.6270279162046954
key: train_mcc
value: [0.68538393 0.67445688 0.70858632 0.68778613 0.64887146 0.64847406
0.66903696 0.68058469 0.68521411 0.6862916 ]
mean value: 0.6774686130760773
key: test_accuracy
value: [0.77777778 0.82222222 0.86666667 0.75555556 0.8 0.86363636
0.81818182 0.84090909 0.70454545 0.84090909]
mean value: 0.809040404040404
key: train_accuracy
value: [0.84 0.835 0.8525 0.8425 0.8225 0.82044888
0.83291771 0.83790524 0.840399 0.84289277]
mean value: 0.8367063591022443
key: test_fscore
value: [0.75 0.78947368 0.85 0.74418605 0.8 0.86363636
0.78947368 0.8372093 0.64864865 0.84444444]
mean value: 0.7917072173987718
key: train_fscore
value: [0.82702703 0.82258065 0.84266667 0.83289125 0.80965147 0.8021978
0.82133333 0.82479784 0.82795699 0.84367246]
mean value: 0.8254775485090063
key: test_precision
value: [0.83333333 0.9375 0.94444444 0.76190476 0.7826087 0.86363636
0.9375 0.85714286 0.8 0.82608696]
mean value: 0.8544157412635673
key: train_precision
value: [0.88953488 0.87931034 0.89265537 0.87709497 0.86285714 0.87951807
0.8700565 0.88439306 0.88505747 0.82926829]
mean value: 0.8749746107699744
key: test_recall
value: [0.68181818 0.68181818 0.77272727 0.72727273 0.81818182 0.86363636
0.68181818 0.81818182 0.54545455 0.86363636]
mean value: 0.7454545454545455
key: train_recall
value: [0.77272727 0.77272727 0.7979798 0.79292929 0.76262626 0.73737374
0.77777778 0.77272727 0.77777778 0.85858586]
mean value: 0.7823232323232323
key: test_roc_auc
value: [0.7756917 0.81916996 0.86462451 0.75494071 0.80039526 0.86363636
0.81818182 0.84090909 0.70454545 0.84090909]
mean value: 0.808300395256917
key: train_roc_auc
value: [0.83933393 0.83438344 0.8519602 0.8420092 0.82190719 0.81942578
0.83223864 0.83710255 0.83962781 0.84308603]
mean value: 0.8361074777428482
key: test_jcc
value: [0.6 0.65217391 0.73913043 0.59259259 0.66666667 0.76
0.65217391 0.72 0.48 0.73076923]
mean value: 0.6593506750898055
key: train_jcc
value: [0.70506912 0.69863014 0.7281106 0.71363636 0.68018018 0.66972477
0.69683258 0.70183486 0.70642202 0.72961373]
mean value: 0.7030054368772396
MCC on Blind test: 0.68
Accuracy on Blind test: 0.84
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01549268 0.01472116 0.01465893 0.0147357 0.01448822 0.01551938
0.01875544 0.02495813 0.01451755 0.01468372]
mean value: 0.016253089904785155
key: score_time
value: [0.01308107 0.01281691 0.01316237 0.01281404 0.01322484 0.01289582
0.02950215 0.01286101 0.01288104 0.01303196]
mean value: 0.01462712287902832
key: test_mcc
value: [0.68911026 0.77821935 0.64426877 0.60079051 0.70780516 0.63636364
0.73029674 0.63636364 0.5547002 0.77352678]
mean value: 0.6751445052594404
key: train_mcc
value: [0.73006509 0.73513714 0.7700385 0.74497106 0.74497106 0.71074778
0.7306343 0.75588396 0.69693637 0.75588396]
mean value: 0.7375269233005722
key: test_accuracy
value: [0.84444444 0.88888889 0.82222222 0.8 0.84444444 0.81818182
0.86363636 0.81818182 0.77272727 0.88636364]
mean value: 0.8359090909090909
key: train_accuracy
value: [0.865 0.8675 0.885 0.8725 0.8725 0.8553616
0.86533666 0.87780549 0.8478803 0.87780549]
mean value: 0.8686689526184539
key: test_fscore
value: [0.8372093 0.88372093 0.81818182 0.8 0.85714286 0.81818182
0.85714286 0.81818182 0.75 0.88888889]
mean value: 0.8328650290278198
key: train_fscore
value: [0.8622449 0.86445013 0.88442211 0.87088608 0.87088608 0.85204082
0.86294416 0.87780549 0.84073107 0.87780549]
mean value: 0.866421631011566
key: test_precision
value: [0.85714286 0.9047619 0.81818182 0.7826087 0.77777778 0.81818182
0.9 0.81818182 0.83333333 0.86956522]
mean value: 0.8379735240604806
key: train_precision
value: [0.87113402 0.87564767 0.88 0.87309645 0.87309645 0.86082474
0.86734694 0.86699507 0.87027027 0.86699507]
mean value: 0.8705406681510427
key: test_recall
value: [0.81818182 0.86363636 0.81818182 0.81818182 0.95454545 0.81818182
0.81818182 0.81818182 0.68181818 0.90909091]
mean value: 0.8318181818181818
key: train_recall
value: [0.85353535 0.85353535 0.88888889 0.86868687 0.86868687 0.84343434
0.85858586 0.88888889 0.81313131 0.88888889]
mean value: 0.8626262626262626
key: test_roc_auc
value: [0.84387352 0.88833992 0.82213439 0.80039526 0.84683794 0.81818182
0.86363636 0.81818182 0.77272727 0.88636364]
mean value: 0.8360671936758893
key: train_roc_auc
value: [0.86488649 0.86736174 0.8850385 0.87246225 0.87246225 0.85521471
0.86525352 0.87794198 0.84745236 0.87794198]
mean value: 0.8686015769064591
key: test_jcc
value: [0.72 0.79166667 0.69230769 0.66666667 0.75 0.69230769
0.75 0.69230769 0.6 0.8 ]
mean value: 0.715525641025641
key: train_jcc
value: [0.75784753 0.76126126 0.79279279 0.77130045 0.77130045 0.74222222
0.75892857 0.78222222 0.72522523 0.78222222]
mean value: 0.7645322947867791
MCC on Blind test: 0.75
Accuracy on Blind test: 0.88
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.01377344 0.0139215 0.01400995 0.02145243 0.01363587 0.01715064
0.01344013 0.01378942 0.03159833 0.03337765]
mean value: 0.018614935874938964
key: score_time
value: [0.09378958 0.03972101 0.03561568 0.05780077 0.05395031 0.0524869
0.03577709 0.05378652 0.04946804 0.04471517]
mean value: 0.051711106300354005
key: test_mcc
value: [0.48086334 0.42403053 0.29512214 0.19960474 0.55666994 0.64715023
0.59648091 0.59648091 0.50051733 0.59648091]
mean value: 0.4893400981241138
key: train_mcc
value: [0.68527843 0.70019536 0.70019536 0.70627441 0.68496131 0.7009041
0.66084236 0.66734561 0.71074778 0.69173625]
mean value: 0.690848097079054
key: test_accuracy
value: [0.73333333 0.71111111 0.64444444 0.6 0.77777778 0.81818182
0.79545455 0.79545455 0.75 0.79545455]
mean value: 0.7421212121212121
key: train_accuracy
value: [0.8425 0.85 0.85 0.8525 0.8425 0.85037406
0.83042394 0.83291771 0.8553616 0.84538653]
mean value: 0.8451963840399003
key: test_fscore
value: [0.68421053 0.68292683 0.57894737 0.59090909 0.76190476 0.83333333
0.7804878 0.7804878 0.74418605 0.80851064]
mean value: 0.7245904204717919
key: train_fscore
value: [0.83804627 0.84615385 0.84615385 0.845953 0.84050633 0.84615385
0.82653061 0.82414698 0.85204082 0.83854167]
mean value: 0.8404227219545394
key: test_precision
value: [0.8125 0.73684211 0.6875 0.59090909 0.8 0.76923077
0.84210526 0.84210526 0.76190476 0.76 ]
mean value: 0.760309725362357
key: train_precision
value: [0.85340314 0.859375 0.859375 0.87567568 0.84263959 0.859375
0.83505155 0.8579235 0.86082474 0.8655914 ]
mean value: 0.8569234594722578
key: test_recall
value: [0.59090909 0.63636364 0.5 0.59090909 0.72727273 0.90909091
0.72727273 0.72727273 0.72727273 0.86363636]
mean value: 0.7
key: train_recall
value: [0.82323232 0.83333333 0.83333333 0.81818182 0.83838384 0.83333333
0.81818182 0.79292929 0.84343434 0.81313131]
mean value: 0.8247474747474748
key: test_roc_auc
value: [0.73023715 0.70948617 0.64130435 0.59980237 0.77667984 0.81818182
0.79545455 0.79545455 0.75 0.79545455]
mean value: 0.7412055335968379
key: train_roc_auc
value: [0.84230923 0.84983498 0.84983498 0.85216022 0.84245925 0.8501642
0.83027318 0.83242524 0.85521471 0.8449893 ]
mean value: 0.8449665286725717
key: test_jcc
value: [0.52 0.51851852 0.40740741 0.41935484 0.61538462 0.71428571
0.64 0.64 0.59259259 0.67857143]
mean value: 0.5746115115469954
key: train_jcc
value: [0.72123894 0.73333333 0.73333333 0.73303167 0.72489083 0.73333333
0.70434783 0.70089286 0.74222222 0.72197309]
mean value: 0.7248597441578004
MCC on Blind test: 0.41
Accuracy on Blind test: 0.71
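
The per-fold MCC values above can be reproduced from confusion-matrix counts. A minimal sketch, assuming the first K-Nearest Neighbors test fold held 45 samples with 22 positives; the counts TP=13, TN=20, FP=3, FN=9 are inferred from the logged accuracy (0.7333), precision (0.8125) and recall (0.5909), not printed by the run, and the helper name `mcc` is hypothetical:

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when a row or column of the matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Counts inferred (an assumption) from the first K-Nearest Neighbors test fold.
tp, tn, fp, fn = 13, 20, 3, 9

print("mcc      :", round(mcc(tp, tn, fp, fn), 8))
print("accuracy :", round((tp + tn) / (tp + tn + fp + fn), 8))
print("precision:", round(tp / (tp + fp), 8))
print("recall   :", round(tp / (tp + fn), 8))
```

If the inferred counts are right, the four printed numbers should agree with the first entries of test_mcc, test_accuracy, test_precision and test_recall in the block above.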
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.02649641 0.02651262 0.02658916 0.02641249 0.0266242 0.02697945
0.02693439 0.02807307 0.02710104 0.0273788 ]
mean value: 0.026910161972045897
key: score_time
value: [0.01609492 0.01617599 0.01605654 0.0158565 0.03611541 0.01599598
0.01581216 0.0163033 0.01630783 0.01622319]
mean value: 0.018094182014465332
key: test_mcc
value: [0.78405645 0.82506438 0.60000118 0.74605372 0.82574419 0.86452993
0.90909091 0.72727273 0.81818182 0.77352678]
mean value: 0.7873522091161106
key: train_mcc
value: [0.80528086 0.80528086 0.81500094 0.809981 0.79998 0.79560664
0.79055651 0.80547816 0.80053238 0.81050825]
mean value: 0.8038205592014417
key: test_accuracy
value: [0.88888889 0.91111111 0.8 0.86666667 0.91111111 0.93181818
0.95454545 0.86363636 0.90909091 0.88636364]
mean value: 0.8923232323232323
key: train_accuracy
value: [0.9025 0.9025 0.9075 0.905 0.9 0.89775561
0.89526185 0.90274314 0.90024938 0.90523691]
mean value: 0.9018746882793017
key: test_fscore
value: [0.87804878 0.9047619 0.79069767 0.875 0.91304348 0.93023256
0.95454545 0.86363636 0.90909091 0.88888889]
mean value: 0.8907946012230334
key: train_fscore
value: [0.90274314 0.90274314 0.90680101 0.9040404 0.8989899 0.89724311
0.89447236 0.90176322 0.89949749 0.90452261]
mean value: 0.9012816389138596
key: test_precision
value: [0.94736842 0.95 0.80952381 0.80769231 0.875 0.95238095
0.95454545 0.86363636 0.90909091 0.86956522]
mean value: 0.8938803435313732
key: train_precision
value: [0.89162562 0.89162562 0.90452261 0.9040404 0.8989899 0.89054726
0.89 0.89949749 0.895 0.9 ]
mean value: 0.8965848898741502
key: test_recall
value: [0.81818182 0.86363636 0.77272727 0.95454545 0.95454545 0.90909091
0.95454545 0.86363636 0.90909091 0.90909091]
mean value: 0.8909090909090909
key: train_recall
value: [0.91414141 0.91414141 0.90909091 0.9040404 0.8989899 0.9040404
0.8989899 0.9040404 0.9040404 0.90909091]
mean value: 0.9060606060606061
key: test_roc_auc
value: [0.88735178 0.91007905 0.79940711 0.86857708 0.91205534 0.93181818
0.95454545 0.86363636 0.90909091 0.88636364]
mean value: 0.8922924901185771
key: train_roc_auc
value: [0.90261526 0.90261526 0.90751575 0.9049905 0.89999 0.89783301
0.89530776 0.90275912 0.90029606 0.90528437]
mean value: 0.9019207093123105
key: test_jcc
value: [0.7826087 0.82608696 0.65384615 0.77777778 0.84 0.86956522
0.91304348 0.76 0.83333333 0.8 ]
mean value: 0.8056261612783352
key: train_jcc
value: [0.82272727 0.82272727 0.82949309 0.82488479 0.81651376 0.81363636
0.80909091 0.82110092 0.8173516 0.82568807]
mean value: 0.8203214048833244
MCC on Blind test: 0.79
Accuracy on Blind test: 0.89
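
The test_jcc values track test_fscore fold by fold. A minimal sketch of that relationship, assuming test_fscore is the binary F1 score and test_jcc is the Jaccard index of the positive class (the log does not state which scorers were used); the function name `jaccard_from_f1` is hypothetical:

```python
def jaccard_from_f1(f1):
    """Convert a binary F1 score to the Jaccard index of the same class.

    With F1 = 2TP / (2TP + FP + FN) and J = TP / (TP + FP + FN),
    eliminating the counts gives J = F1 / (2 - F1).
    """
    return f1 / (2.0 - f1)


# First two SVM test folds, copied from the test_fscore line above.
for f1 in (0.87804878, 0.9047619):
    print(f1, "->", round(jaccard_from_f1(f1), 8))
```

Under those assumptions, the converted values should agree with the first two test_jcc entries for the SVM, so the Jaccard column carries no information beyond the F-score column.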
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [2.87617755 3.59716487 3.2788682 3.04321766 2.02100229 0.76258993
0.71084738 1.37337828 0.62539959 1.43995023]
mean value: 1.9728595972061158
key: score_time
value: [0.02402663 0.0251689 0.02707911 0.05417156 0.02970004 0.01262379
0.02017164 0.02028584 0.02023172 0.02038431]
mean value: 0.025384354591369628
key: test_mcc
value: [0.86732843 0.77821935 0.60404349 0.76206649 0.86758893 0.51031036
0.86452993 0.77352678 0.77352678 0.60678804]
mean value: 0.7407928602447988
key: train_mcc
value: [0.99501219 1. 0.99501219 1. 1. 0.63345212
0.79681808 0.81118415 0.78683326 0.76497588]
mean value: 0.8783287861680027
key: test_accuracy
value: [0.93333333 0.88888889 0.8 0.86666667 0.93333333 0.72727273
0.93181818 0.88636364 0.88636364 0.79545455]
mean value: 0.8649494949494949
key: train_accuracy
value: [0.9975 1. 0.9975 1. 1. 0.79301746
0.89775561 0.90523691 0.89276808 0.87531172]
mean value: 0.9359089775561098
key: test_fscore
value: [0.93023256 0.88372093 0.7804878 0.88 0.93333333 0.77777778
0.93333333 0.88888889 0.88888889 0.81632653]
mean value: 0.8712990046084609
key: train_fscore
value: [0.99748111 1. 0.99748111 1. 1. 0.82377919
0.8992629 0.90594059 0.89434889 0.88479263]
mean value: 0.940308642422994
key: test_precision
value: [0.95238095 0.9047619 0.84210526 0.78571429 0.91304348 0.65625
0.91304348 0.86956522 0.86956522 0.74074074]
mean value: 0.8447170538060126
key: train_precision
value: [0.99497487 1. 0.99497487 1. 1. 0.71062271
0.87559809 0.88834951 0.8708134 0.81355932]
mean value: 0.9148892779217023
key: test_recall
value: [0.90909091 0.86363636 0.72727273 1. 0.95454545 0.95454545
0.95454545 0.90909091 0.90909091 0.90909091]
mean value: 0.9090909090909091
key: train_recall
value: [1. 1. 1. 1. 1. 0.97979798
0.92424242 0.92424242 0.91919192 0.96969697]
mean value: 0.9717171717171718
key: test_roc_auc
value: [0.93280632 0.88833992 0.79841897 0.86956522 0.93379447 0.72727273
0.93181818 0.88636364 0.88636364 0.79545455]
mean value: 0.8650197628458498
key: train_roc_auc
value: [0.99752475 1. 0.99752475 1. 1. 0.79531771
0.8980818 0.90547097 0.8930935 0.8764741 ]
mean value: 0.9363487580285123
key: test_jcc
value: [0.86956522 0.79166667 0.64 0.78571429 0.875 0.63636364
0.875 0.8 0.8 0.68965517]
mean value: 0.7762964978549687
key: train_jcc
value: [0.99497487 1. 0.99497487 1. 1. 0.70036101
0.81696429 0.8280543 0.80888889 0.79338843]
mean value: 0.8937606662571818
MCC on Blind test: 0.73
Accuracy on Blind test: 0.87
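
The `key:` / `value:` / `mean value:` blocks in this log are regular enough to parse and cross-check. A minimal sketch, assuming that three-line layout holds throughout; the names `parse_metrics`, `BLOCK` and `SAMPLE` are hypothetical, and the sample text is the K-Nearest Neighbors test_recall block copied from this log:

```python
import re
from statistics import fmean

# One metric block copied from this log (K-Nearest Neighbors, test_recall).
SAMPLE = """\
key: test_recall
value: [0.59090909 0.63636364 0.5 0.59090909 0.72727273 0.90909091
0.72727273 0.72727273 0.72727273 0.86363636]
mean value: 0.7
"""

# The value list may wrap across lines, so match everything up to the closing bracket.
BLOCK = re.compile(r"key: (\w+)\nvalue: \[([^\]]*)\]\nmean value: (\S+)")


def parse_metrics(text):
    """Return {metric: (fold_values, logged_mean)} for each block found in text.

    If the same key occurs more than once (several models in one log),
    the last occurrence wins; split the log per model first if that matters.
    """
    metrics = {}
    for name, values, mean in BLOCK.findall(text):
        metrics[name] = ([float(v) for v in values.split()], float(mean))
    return metrics


folds, logged_mean = parse_metrics(SAMPLE)["test_recall"]
print("folds:", len(folds))
print("mean matches log:", abs(fmean(folds) - logged_mean) < 1e-6)
```

Run over a whole model section, the same check would flag any block whose logged mean differs from the mean of its fold values.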
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.04109359 0.02866888 0.02903247 0.02609515 0.02982569 0.026232
0.02595305 0.02627039 0.02766967 0.02752471]
mean value: 0.028836560249328614
key: score_time
value: [0.01269889 0.01240039 0.01267838 0.01267815 0.01238513 0.01267934
0.01254892 0.01264668 0.01244855 0.01284599]
mean value: 0.012601041793823242
key: test_mcc
value: [0.86732843 0.86758893 0.86732843 0.82574419 1. 0.73029674
0.86452993 0.87177979 0.82158384 0.81818182]
mean value: 0.8534362113672467
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.93333333 0.93333333 0.93333333 0.91111111 1. 0.86363636
0.93181818 0.93181818 0.90909091 0.90909091]
mean value: 0.9256565656565656
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.93023256 0.93333333 0.93023256 0.91304348 1. 0.86956522
0.93333333 0.93617021 0.9047619 0.90909091]
mean value: 0.9259763505216682
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.95238095 0.91304348 0.95238095 0.875 1. 0.83333333
0.91304348 0.88 0.95 0.90909091]
mean value: 0.9178273103707886
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.90909091 0.95454545 0.90909091 0.95454545 1. 0.90909091
0.95454545 1. 0.86363636 0.90909091]
mean value: 0.9363636363636364
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.93280632 0.93379447 0.93280632 0.91205534 1. 0.86363636
0.93181818 0.93181818 0.90909091 0.90909091]
mean value: 0.9256916996047431
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.86956522 0.875 0.86956522 0.84 1. 0.76923077
0.875 0.88 0.82608696 0.83333333]
mean value: 0.863778149386845
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.91
Accuracy on Blind test: 0.96
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.19880676 0.16679668 0.24343967 0.16726899 0.29743075 0.16902733
0.17551208 0.1705997 0.18266988 0.17188907]
mean value: 0.19434409141540526
key: score_time
value: [0.02430034 0.02431679 0.0246942 0.02460122 0.02689743 0.02486563
0.02493906 0.02493906 0.02535796 0.02534413]
mean value: 0.025025582313537596
key: test_mcc
value: [0.86732843 0.82506438 0.60000118 0.69583743 0.78530224 0.86452993
0.7800135 0.7800135 0.77352678 0.77352678]
mean value: 0.7745144142569061
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.93333333 0.91111111 0.8 0.84444444 0.88888889 0.93181818
0.88636364 0.88636364 0.88636364 0.88636364]
mean value: 0.8855050505050505
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.93023256 0.9047619 0.79069767 0.85106383 0.89361702 0.93333333
0.87804878 0.89361702 0.88372093 0.88888889]
mean value: 0.8847981942603055
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.95238095 0.95 0.80952381 0.8 0.84 0.91304348
0.94736842 0.84 0.9047619 0.86956522]
mean value: 0.8826643783371472
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.90909091 0.86363636 0.77272727 0.90909091 0.95454545 0.95454545
0.81818182 0.95454545 0.86363636 0.90909091]
mean value: 0.8909090909090909
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.93280632 0.91007905 0.79940711 0.8458498 0.89031621 0.93181818
0.88636364 0.88636364 0.88636364 0.88636364]
mean value: 0.8855731225296443
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.86956522 0.82608696 0.65384615 0.74074074 0.80769231 0.875
0.7826087 0.80769231 0.79166667 0.8 ]
mean value: 0.7954899046203394
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.84
Accuracy on Blind test: 0.92
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01551032 0.01521516 0.01527309 0.01511455 0.01513052 0.01546001
0.01533055 0.01527858 0.01553512 0.01524663]
mean value: 0.015309453010559082
key: score_time
value: [0.01284385 0.01274657 0.01290631 0.01287198 0.01276088 0.01284337
0.01283884 0.01324463 0.01275945 0.01283479]
mean value: 0.012865066528320312
key: test_mcc
value: [0.55666994 0.38112585 0.68972332 0.19881069 0.46930785 0.77352678
0.32673202 0.45454545 0.54545455 0.50051733]
mean value: 0.48964137935205826
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.77777778 0.68888889 0.84444444 0.6 0.73333333 0.88636364
0.65909091 0.72727273 0.77272727 0.75 ]
mean value: 0.743989898989899
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.76190476 0.65 0.84444444 0.57142857 0.73913043 0.88372093
0.61538462 0.72727273 0.77272727 0.74418605]
mean value: 0.7310199804689188
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.8 0.72222222 0.82608696 0.6 0.70833333 0.9047619
0.70588235 0.72727273 0.77272727 0.76190476]
mean value: 0.7529191531685138
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.72727273 0.59090909 0.86363636 0.54545455 0.77272727 0.86363636
0.54545455 0.72727273 0.77272727 0.72727273]
mean value: 0.7136363636363636
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.77667984 0.68675889 0.84486166 0.59881423 0.73418972 0.88636364
0.65909091 0.72727273 0.77272727 0.75 ]
mean value: 0.7436758893280633
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.61538462 0.48148148 0.73076923 0.4 0.5862069 0.79166667
0.44444444 0.57142857 0.62962963 0.59259259]
mean value: 0.5843604128948956
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.6
Accuracy on Blind test: 0.79
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [2.48509145 1.72775126 1.72003531 1.70660305 1.7010572 1.70190072
1.67853093 1.70248485 1.73082352 1.76075959]
mean value: 1.7915037870407104
key: score_time
value: [0.10180235 0.10169816 0.09832668 0.10011482 0.10132575 0.09362698
0.09398794 0.09288573 0.10091448 0.09981847]
mean value: 0.09845013618469238
key: test_mcc
value: [0.95652174 0.91452919 0.91106719 0.86758893 1. 0.90909091
0.95553309 0.87177979 0.82158384 0.81818182]
mean value: 0.9025876492117845
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.97777778 0.95555556 0.95555556 0.93333333 1. 0.95454545
0.97727273 0.93181818 0.90909091 0.90909091]
mean value: 0.9504040404040404
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.97777778 0.95238095 0.95454545 0.93333333 1. 0.95454545
0.97674419 0.93617021 0.9047619 0.90909091]
mean value: 0.9499350185248255
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.95652174 1. 0.95454545 0.91304348 1. 0.95454545
1. 0.88 0.95 0.90909091]
mean value: 0.9517747035573123
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.90909091 0.95454545 0.95454545 1. 0.95454545
0.95454545 1. 0.86363636 0.90909091]
mean value: 0.95
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97826087 0.95454545 0.9555336 0.93379447 1. 0.95454545
0.97727273 0.93181818 0.90909091 0.90909091]
mean value: 0.9503952569169961
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.95652174 0.90909091 0.91304348 0.875 1. 0.91304348
0.95454545 0.88 0.82608696 0.83333333]
mean value: 0.906066534914361
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.89
Accuracy on Blind test: 0.95
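Note: the per-fold values above can be cross-checked by hand. The sketch below recomputes the first Random Forest fold from a confusion matrix. The counts (22 true positives, 1 false positive, 0 false negatives, 22 true negatives) are an assumption: the log does not print them, and they are inferred from the logged fractions (accuracy 44/45, precision 22/23, recall 22/22). The name `fold_metrics` is hypothetical, and treating `roc_auc` as the mean of sensitivity and specificity assumes the score was computed from hard class predictions, which is consistent with the logged value but not confirmed by the log.

```python
from math import sqrt

# Assumed confusion-matrix counts for the first CV fold (not printed in the log);
# inferred from accuracy 44/45, precision 22/23 and recall 22/22.
tp, fp, fn, tn = 22, 1, 0, 22


def fold_metrics(tp, fp, fn, tn):
    """Recompute the logged metric families from raw confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "fscore": 2 * tp / (2 * tp + fp + fn),
        # Jaccard index of the positive class.
        "jcc": tp / (tp + fp + fn),
        # AUC of a single hard-label operating point (assumption, see note above).
        "roc_auc": (recall + specificity) / 2,
        # Matthews correlation coefficient.
        "mcc": (tp * tn - fp * fn)
        / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


metrics = fold_metrics(tp, fp, fn, tn)
for name, value in metrics.items():
    print(f"test_{name}: {value:.8f}")
```

With these counts, `test_mcc` and `test_jcc` coincide at 22/23, which matches the identical first-fold values logged for both keys.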
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...05', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [2.09929252 1.04539609 1.14330029 0.98038101 0.95856428 0.94347358
1.00293994 0.99390554 0.91637397 1.01709938]
mean value: 1.110072660446167
key: score_time
value: [0.2254622 0.14489126 0.16381502 0.14184189 0.19981694 0.18624687
0.16025591 0.16381979 0.13914704 0.18514252]
mean value: 0.1710439443588257
key: test_mcc
value: [0.91106719 0.91452919 0.86732843 0.73663511 0.91485328 0.90909091
0.91287093 0.87177979 0.82158384 0.81818182]
mean value: 0.8677920483015767
key: train_mcc
value: [0.95500519 0.949995 0.94500356 0.95017516 0.949995 0.95015803
0.95011693 0.94513697 0.94513697 0.96009355]
mean value: 0.9500816369848698
key: test_accuracy
value: [0.95555556 0.95555556 0.93333333 0.86666667 0.95555556 0.95454545
0.95454545 0.93181818 0.90909091 0.90909091]
mean value: 0.9325757575757576
key: train_accuracy
value: [0.9775 0.975 0.9725 0.975 0.975 0.97506234
0.97506234 0.97256858 0.97256858 0.98004988]
mean value: 0.9750311720698255
key: test_fscore
value: [0.95454545 0.95238095 0.93023256 0.86956522 0.95652174 0.95454545
0.95238095 0.93617021 0.9047619 0.90909091]
mean value: 0.9320195355132859
key: train_fscore
value: [0.97721519 0.97474747 0.9721519 0.9744898 0.97474747 0.97461929
0.97474747 0.9721519 0.9721519 0.97979798]
mean value: 0.9746820375374823
key: test_precision
value: [0.95454545 1. 0.95238095 0.83333333 0.91666667 0.95454545
1. 0.88 0.95 0.90909091]
mean value: 0.935056277056277
key: train_precision
value: [0.97969543 0.97474747 0.97461929 0.98453608 0.97474747 0.97959184
0.97474747 0.97461929 0.97461929 0.97979798]
mean value: 0.977172162274171
key: test_recall
value: [0.95454545 0.90909091 0.90909091 0.90909091 1. 0.95454545
0.90909091 1. 0.86363636 0.90909091]
mean value: 0.9318181818181818
key: train_recall
value: [0.97474747 0.97474747 0.96969697 0.96464646 0.97474747 0.96969697
0.97474747 0.96969697 0.96969697 0.97979798]
mean value: 0.9722222222222222
key: test_roc_auc
value: [0.9555336 0.95454545 0.93280632 0.86758893 0.95652174 0.95454545
0.95454545 0.93181818 0.90909091 0.90909091]
mean value: 0.932608695652174
key: train_roc_auc
value: [0.97747275 0.9749975 0.97247225 0.97489749 0.9749975 0.97499627
0.97505847 0.97253321 0.97253321 0.98004677]
mean value: 0.9750005419261139
key: test_jcc
value: [0.91304348 0.90909091 0.86956522 0.76923077 0.91666667 0.91304348
0.90909091 0.88 0.82608696 0.83333333]
mean value: 0.873915171784737
key: train_jcc
value: [0.95544554 0.95073892 0.94581281 0.95024876 0.95073892 0.95049505
0.95073892 0.94581281 0.94581281 0.96039604]
mean value: 0.9506240562296064
MCC on Blind test: 0.93
Accuracy on Blind test: 0.96
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01235938 0.01207018 0.01216507 0.01198816 0.01195812 0.01208854
0.01212764 0.01205707 0.01214314 0.01219463]
mean value: 0.012115192413330079
key: score_time
value: [0.01045775 0.01042986 0.0105226 0.01041794 0.01037836 0.01043344
0.01043797 0.01044178 0.01040506 0.01040316]
mean value: 0.010432791709899903
key: test_mcc
value: [0.68911026 0.77821935 0.64426877 0.60079051 0.70780516 0.63636364
0.73029674 0.63636364 0.5547002 0.77352678]
mean value: 0.6751445052594404
key: train_mcc
value: [0.73006509 0.73513714 0.7700385 0.74497106 0.74497106 0.71074778
0.7306343 0.75588396 0.69693637 0.75588396]
mean value: 0.7375269233005722
key: test_accuracy
value: [0.84444444 0.88888889 0.82222222 0.8 0.84444444 0.81818182
0.86363636 0.81818182 0.77272727 0.88636364]
mean value: 0.8359090909090909
key: train_accuracy
value: [0.865 0.8675 0.885 0.8725 0.8725 0.8553616
0.86533666 0.87780549 0.8478803 0.87780549]
mean value: 0.8686689526184539
key: test_fscore
value: [0.8372093 0.88372093 0.81818182 0.8 0.85714286 0.81818182
0.85714286 0.81818182 0.75 0.88888889]
mean value: 0.8328650290278198
key: train_fscore
value: [0.8622449 0.86445013 0.88442211 0.87088608 0.87088608 0.85204082
0.86294416 0.87780549 0.84073107 0.87780549]
mean value: 0.866421631011566
key: test_precision
value: [0.85714286 0.9047619 0.81818182 0.7826087 0.77777778 0.81818182
0.9 0.81818182 0.83333333 0.86956522]
mean value: 0.8379735240604806
key: train_precision
value: [0.87113402 0.87564767 0.88 0.87309645 0.87309645 0.86082474
0.86734694 0.86699507 0.87027027 0.86699507]
mean value: 0.8705406681510427
key: test_recall
value: [0.81818182 0.86363636 0.81818182 0.81818182 0.95454545 0.81818182
0.81818182 0.81818182 0.68181818 0.90909091]
mean value: 0.8318181818181818
key: train_recall
value: [0.85353535 0.85353535 0.88888889 0.86868687 0.86868687 0.84343434
0.85858586 0.88888889 0.81313131 0.88888889]
mean value: 0.8626262626262626
key: test_roc_auc
value: [0.84387352 0.88833992 0.82213439 0.80039526 0.84683794 0.81818182
0.86363636 0.81818182 0.77272727 0.88636364]
mean value: 0.8360671936758893
key: train_roc_auc
value: [0.86488649 0.86736174 0.8850385 0.87246225 0.87246225 0.85521471
0.86525352 0.87794198 0.84745236 0.87794198]
mean value: 0.8686015769064591
key: test_jcc
value: [0.72 0.79166667 0.69230769 0.66666667 0.75 0.69230769
0.75 0.69230769 0.6 0.8 ]
mean value: 0.715525641025641
key: train_jcc
value: [0.75784753 0.76126126 0.79279279 0.77130045 0.77130045 0.74222222
0.75892857 0.78222222 0.72522523 0.78222222]
mean value: 0.7645322947867791
MCC on Blind test: 0.75
Accuracy on Blind test: 0.88
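Note: each score in this log is a three-line record (`key:`, `value:`, `mean value:`), with the array wrapping onto a second line. The sketch below shows one way such records could be parsed back into numbers and the logged mean re-derived. The names `RECORD` and `parse_scores` are hypothetical, and the sample text is the Naive Bayes `test_mcc` record from above.

```python
import re
from statistics import mean

# Excerpt copied from the Naive Bayes block of this log.
sample = """key: test_mcc
value: [0.68911026 0.77821935 0.64426877 0.60079051 0.70780516 0.63636364
 0.73029674 0.63636364 0.5547002 0.77352678]
mean value: 0.6751445052594404
"""

# One record = "key: <name>", a bracketed NumPy-style array that may wrap
# across lines, then "mean value: <float>".
RECORD = re.compile(r"key: (\w+)\s*\nvalue: \[([^\]]*)\]\s*\nmean value: (\S+)")


def parse_scores(text):
    """Return {key: (per_fold_values, logged_mean)} for each record in text."""
    return {
        key: ([float(v) for v in values.split()], float(logged_mean))
        for key, values, logged_mean in RECORD.findall(text)
    }


scores = parse_scores(sample)
folds, logged_mean = scores["test_mcc"]
# The recomputed mean differs from the logged one only by print rounding.
print(len(folds), mean(folds), logged_mean)
```

Because the array is printed to eight decimal places, a recomputed mean agrees with the logged mean only to about that precision, so any comparison should use a tolerance rather than exact equality.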
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.6125834 0.7919445 0.91690707 0.64126277 0.27586269 3.05022645
1.53905249 2.51379037 1.85096502 1.52625299]
mean value: 1.3718847751617431
key: score_time
value: [0.01470351 0.01476789 0.01252818 0.01451087 0.01576161 0.01224637
0.01332831 0.0270524 0.01303458 0.01383972]
mean value: 0.015177345275878907
key: test_mcc
value: [0.95652174 0.91106719 0.91106719 0.91485328 1. 0.90909091
0.86452993 0.90909091 0.95553309 0.81818182]
mean value: 0.9149936062423393
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.97777778 0.95555556 0.95555556 0.95555556 1. 0.95454545
0.93181818 0.95454545 0.97727273 0.90909091]
mean value: 0.9571717171717172
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.97777778 0.95454545 0.95454545 0.95652174 1. 0.95454545
0.93023256 0.95454545 0.97674419 0.90909091]
mean value: 0.9568548988366986
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.95652174 0.95454545 0.95454545 0.91666667 1. 0.95454545
0.95238095 0.95454545 1. 0.90909091]
mean value: 0.9552842085450781
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.95454545 0.95454545 1. 1. 0.95454545
0.90909091 0.95454545 0.95454545 0.90909091]
mean value: 0.9590909090909091
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97826087 0.9555336 0.9555336 0.95652174 1. 0.95454545
0.93181818 0.95454545 0.97727273 0.90909091]
mean value: 0.9573122529644269
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.95652174 0.91304348 0.91304348 0.91666667 1. 0.91304348
0.86956522 0.91304348 0.95454545 0.83333333]
mean value: 0.9182806324110672
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.95
Accuracy on Blind test: 0.97
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.05299616 0.07507491 0.09356999 0.1008358 0.09290576 0.09139204
0.07153034 0.07866263 0.10971498 0.08641815]
mean value: 0.08531007766723633
key: score_time
value: [0.02284002 0.02450013 0.01256418 0.02179718 0.02442145 0.02206469
0.02130103 0.02369428 0.0252614 0.02076888]
mean value: 0.021921324729919433
key: test_mcc
value: [0.68911026 0.73559956 0.73320158 0.82574419 0.78530224 0.81818182
0.63636364 0.63636364 0.68252363 0.6882472 ]
mean value: 0.7230637763883909
key: train_mcc
value: [0.90500656 0.93043262 0.91500719 0.91500719 0.90500656 0.91026694
0.92519156 0.90522754 0.93021868 0.94034232]
mean value: 0.9181707163710942
key: test_accuracy
value: [0.84444444 0.86666667 0.86666667 0.91111111 0.88888889 0.90909091
0.81818182 0.81818182 0.84090909 0.84090909]
mean value: 0.8605050505050506
key: train_accuracy
value: [0.9525 0.965 0.9575 0.9575 0.9525 0.95511222
0.96259352 0.95261845 0.96508728 0.97007481]
mean value: 0.9590486284289277
key: test_fscore
value: [0.8372093 0.85714286 0.86363636 0.91304348 0.89361702 0.90909091
0.81818182 0.81818182 0.84444444 0.85106383]
mean value: 0.8605611842328492
key: train_fscore
value: [0.95214106 0.96517413 0.95717884 0.95717884 0.95214106 0.95477387
0.96221662 0.95189873 0.96482412 0.97 ]
mean value: 0.9587527276654001
key: test_precision
value: [0.85714286 0.9 0.86363636 0.875 0.84 0.90909091
0.81818182 0.81818182 0.82608696 0.8 ]
mean value: 0.8507320722755506
key: train_precision
value: [0.94974874 0.95098039 0.95477387 0.95477387 0.94974874 0.95
0.95979899 0.95431472 0.96 0.96039604]
mean value: 0.9544535373678533
key: test_recall
value: [0.81818182 0.81818182 0.86363636 0.95454545 0.95454545 0.90909091
0.81818182 0.81818182 0.86363636 0.90909091]
mean value: 0.8727272727272728
key: train_recall
value: [0.95454545 0.97979798 0.95959596 0.95959596 0.95454545 0.95959596
0.96464646 0.94949495 0.96969697 0.97979798]
mean value: 0.9631313131313132
key: test_roc_auc
value: [0.84387352 0.86561265 0.86660079 0.91205534 0.89031621 0.90909091
0.81818182 0.81818182 0.84090909 0.84090909]
mean value: 0.8605731225296442
key: train_roc_auc
value: [0.95252025 0.96514651 0.95752075 0.95752075 0.95252025 0.95516744
0.9626188 0.95257999 0.96514405 0.97019456]
mean value: 0.9590933354419185
key: test_jcc
value: [0.72 0.75 0.76 0.84 0.80769231 0.83333333
0.69230769 0.69230769 0.73076923 0.74074074]
mean value: 0.7567150997150998
key: train_jcc
value: [0.90865385 0.93269231 0.9178744 0.9178744 0.90865385 0.91346154
0.92718447 0.90821256 0.93203883 0.94174757]
mean value: 0.9208393764904951
MCC on Blind test: 0.7
Accuracy on Blind test: 0.85
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01488352 0.01467943 0.01454282 0.01407647 0.01490998 0.01458836
0.02097845 0.01497102 0.01447248 0.01468301]
mean value: 0.01527855396270752
key: score_time
value: [0.01296782 0.01293969 0.0132544 0.0121994 0.01727891 0.01300836
0.01371241 0.01255512 0.01260257 0.01297855]
mean value: 0.01334972381591797
key: test_mcc
value: [0.70501339 0.73559956 0.60079051 0.64613475 0.78530224 0.77352678
0.77352678 0.7800135 0.50471461 0.77352678]
mean value: 0.7078148917725633
key: train_mcc
value: [0.73501647 0.71071591 0.78497756 0.75023791 0.72513051 0.72622252
0.69655581 0.75082817 0.66140847 0.76056935]
mean value: 0.7301662688450195
key: test_accuracy
value: [0.84444444 0.86666667 0.8 0.82222222 0.88888889 0.88636364
0.88636364 0.88636364 0.75 0.88636364]
mean value: 0.8517676767676767
key: train_accuracy
value: [0.8675 0.855 0.8925 0.875 0.8625 0.86284289
0.8478803 0.87531172 0.83042394 0.88029925]
mean value: 0.8649258104738154
key: test_fscore
value: [0.82051282 0.85714286 0.8 0.80952381 0.89361702 0.88372093
0.88372093 0.89361702 0.73170732 0.88372093]
mean value: 0.8457283637503524
key: train_fscore
value: [0.86513995 0.84974093 0.89113924 0.87179487 0.85933504 0.85788114
0.84155844 0.87179487 0.8238342 0.87817259]
mean value: 0.8610391268444171
key: test_precision
value: [0.94117647 0.9 0.7826087 0.85 0.84 0.9047619
0.9047619 0.84 0.78947368 0.9047619 ]
mean value: 0.865754456473665
key: train_precision
value: [0.87179487 0.87234043 0.89340102 0.88541667 0.87046632 0.87830688
0.86631016 0.88541667 0.84574468 0.88265306]
mean value: 0.8751850747942309
key: test_recall
value: [0.72727273 0.81818182 0.81818182 0.77272727 0.95454545 0.86363636
0.86363636 0.95454545 0.68181818 0.86363636]
mean value: 0.8318181818181818
key: train_recall
value: [0.85858586 0.82828283 0.88888889 0.85858586 0.84848485 0.83838384
0.81818182 0.85858586 0.8030303 0.87373737]
mean value: 0.8474747474747475
key: test_roc_auc
value: [0.84189723 0.86561265 0.80039526 0.82114625 0.89031621 0.88636364
0.88636364 0.88636364 0.75 0.88636364]
mean value: 0.8514822134387352
key: train_roc_auc
value: [0.86741174 0.85473547 0.89246425 0.87483748 0.86236124 0.86254167
0.84751455 0.87510574 0.83008658 0.88021844]
mean value: 0.8647277166140259
key: test_jcc
value: [0.69565217 0.75 0.66666667 0.68 0.80769231 0.79166667
0.79166667 0.80769231 0.57692308 0.79166667]
mean value: 0.7359626532887402
key: train_jcc
value: [0.76233184 0.73873874 0.80365297 0.77272727 0.75336323 0.75113122
0.7264574 0.77272727 0.70044053 0.78280543]
mean value: 0.7564375898815598
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.02090478 0.01843095 0.04428411 0.05754352 0.04021621 0.04565191
0.04121709 0.05066919 0.06244564 0.05169058]
mean value: 0.043305397033691406
key: score_time
value: [0.01230383 0.02153182 0.01792741 0.03710651 0.03666353 0.01897454
0.02727604 0.02798796 0.02072001 0.02464175]
mean value: 0.02451333999633789
key: test_mcc
value: [0.82506438 0.72299881 0.43884363 0.77865613 0.87476705 0.73960026
0.68313005 0.73029674 0.75592895 0.73029674]
mean value: 0.7279582734082828
key: train_mcc
value: [0.85528899 0.79408263 0.42430608 0.86966298 0.82348041 0.80626333
0.81054468 0.86083265 0.76826689 0.87858211]
mean value: 0.7891310741692911
key: test_accuracy
value: [0.91111111 0.84444444 0.66666667 0.88888889 0.93333333 0.86363636
0.81818182 0.86363636 0.86363636 0.86363636]
mean value: 0.8517171717171718
key: train_accuracy
value: [0.9275 0.89 0.655 0.9325 0.91 0.89526185
0.89775561 0.9276808 0.87281796 0.93765586]
mean value: 0.8846172069825436
key: test_fscore
value: [0.9047619 0.81081081 0.48275862 0.88888889 0.93617021 0.85
0.77777778 0.85714286 0.84210526 0.86956522]
mean value: 0.8219981553387051
key: train_fscore
value: [0.9276808 0.87709497 0.46511628 0.928 0.91304348 0.88202247
0.88515406 0.92225201 0.85302594 0.93946731]
mean value: 0.8592857320609378
key: test_precision
value: [0.95 1. 1. 0.86956522 0.88 0.94444444
1. 0.9 1. 0.83333333]
mean value: 0.9377342995169082
key: train_precision
value: [0.91625616 0.98125 1. 0.98305085 0.875 0.99367089
0.99371069 0.98285714 0.99328859 0.90232558]
mean value: 0.9621409897849462
key: test_recall
value: [0.86363636 0.68181818 0.31818182 0.90909091 1. 0.77272727
0.63636364 0.81818182 0.72727273 0.90909091]
mean value: 0.7636363636363637
key: train_recall
value: [0.93939394 0.79292929 0.3030303 0.87878788 0.95454545 0.79292929
0.7979798 0.86868687 0.74747475 0.97979798]
mean value: 0.8055555555555556
key: test_roc_auc
value: [0.91007905 0.84090909 0.65909091 0.88932806 0.93478261 0.86363636
0.81818182 0.86363636 0.86363636 0.86363636]
mean value: 0.8506916996047431
key: train_roc_auc
value: [0.92761776 0.8890389 0.65151515 0.9319682 0.91044104 0.89400159
0.89652684 0.92695427 0.87127432 0.93817485]
mean value: 0.8837512938485966
key: test_jcc
value: [0.82608696 0.68181818 0.31818182 0.8 0.88 0.73913043
0.63636364 0.75 0.72727273 0.76923077]
mean value: 0.7128084524171481
key: train_jcc
value: [0.86511628 0.78109453 0.3030303 0.86567164 0.84 0.78894472
0.79396985 0.85572139 0.74371859 0.88584475]
mean value: 0.7723112058976718
MCC on Blind test: 0.58
Accuracy on Blind test: 0.76
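
The per-fold keys above (accuracy, precision, recall, fscore, roc_auc, jcc, mcc) can all be derived from one confusion matrix. The sketch below is illustrative only and is not taken from the pipeline's code. The counts for the third fold of the Passive Aggresive run (TP=7, FP=0, FN=15, TN=23) are an assumption inferred from the logged values: precision 1.0, recall 0.31818182 = 7/22, accuracy 0.66666667 = 30/45. The helper `binary_metrics` is hypothetical. The logged roc_auc values are consistent with ROC AUC scored on hard class labels, which is the same as balanced accuracy; that is also an inference from the numbers, not something the log states.

```python
from math import sqrt


def binary_metrics(tp, fp, fn, tn):
    """Derive the logged per-fold metrics from binary confusion counts.

    Hypothetical helper for illustration; it assumes every denominator
    is non-zero, which holds for the counts used below.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "fscore": 2 * tp / (2 * tp + fp + fn),
        # Balanced accuracy; equal to ROC AUC when scored on hard labels.
        "roc_auc": (recall + specificity) / 2,
        # Jaccard index of the positive class.
        "jcc": tp / (tp + fp + fn),
        # Matthews correlation coefficient.
        "mcc": (tp * tn - fp * fn)
        / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


# Assumed counts for fold 3 (inferred from the log, not printed by it).
fold3 = binary_metrics(tp=7, fp=0, fn=15, tn=23)
for name, value in fold3.items():
    print(f"{name}: {value:.8f}")
```

Under these assumed counts the results agree with the third entry of each test_* array above to the printed precision. That fold shows how perfect precision can coincide with a low MCC when recall collapses.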
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.06124377 0.04860258 0.05644703 0.0544219 0.04448938 0.02667141
0.06391263 0.04354548 0.06314015 0.04449439]
mean value: 0.05069687366485596
key: score_time
value: [0.04430127 0.02314234 0.02817941 0.02064776 0.02313566 0.01204276
0.0358007 0.02944756 0.0219655 0.02373385]
mean value: 0.026239681243896484
key: test_mcc
value: [0.78405645 0.73663511 0.73320158 0.73663511 0.8360602 0.7800135
0.68313005 0.56694671 0.82158384 0.21483446]
mean value: 0.6893097003051051
key: train_mcc
value: [0.85260278 0.85532995 0.920046 0.90106836 0.86219639 0.8725435
0.82264299 0.80465299 0.83772405 0.44233239]
mean value: 0.8171139401076141
key: test_accuracy
value: [0.88888889 0.86666667 0.86666667 0.86666667 0.91111111 0.88636364
0.81818182 0.77272727 0.90909091 0.56818182]
mean value: 0.8354545454545454
key: train_accuracy
value: [0.9225 0.925 0.96 0.95 0.93 0.93516209
0.90773067 0.89526185 0.91521197 0.66084788]
mean value: 0.9001714463840399
key: test_fscore
value: [0.87804878 0.86956522 0.86363636 0.86956522 0.91666667 0.89361702
0.77777778 0.73684211 0.91304348 0.68852459]
mean value: 0.8407287218315779
key: train_fscore
value: [0.91598916 0.92822967 0.95979899 0.94818653 0.93170732 0.93658537
0.899729 0.88268156 0.91943128 0.7443609 ]
mean value: 0.9066699774774758
key: test_precision
value: [0.94736842 0.83333333 0.86363636 0.83333333 0.84615385 0.84
1. 0.875 0.875 0.53846154]
mean value: 0.8452286835971047
key: train_precision
value: [0.98830409 0.88181818 0.955 0.97340426 0.9009434 0.90566038
0.97076023 0.9875 0.86607143 0.59281437]
mean value: 0.902227633803653
key: test_recall
value: [0.81818182 0.90909091 0.86363636 0.90909091 1. 0.95454545
0.63636364 0.63636364 0.95454545 0.95454545]
mean value: 0.8636363636363636
key: train_recall
value: [0.85353535 0.97979798 0.96464646 0.92424242 0.96464646 0.96969697
0.83838384 0.7979798 0.97979798 1. ]
mean value: 0.9272727272727272
key: test_roc_auc
value: [0.88735178 0.86758893 0.86660079 0.86758893 0.91304348 0.88636364
0.81818182 0.77272727 0.90909091 0.56818182]
mean value: 0.8356719367588933
key: train_roc_auc
value: [0.92181718 0.92554255 0.960046 0.94974497 0.93034303 0.9355874
0.90687665 0.89406379 0.91600736 0.66502463]
mean value: 0.9005053584176151
key: test_jcc
value: [0.7826087 0.76923077 0.76 0.76923077 0.84615385 0.80769231
0.63636364 0.58333333 0.84 0.525 ]
mean value: 0.7319613357656836
key: train_jcc
value: [0.845 0.86607143 0.92270531 0.90147783 0.87214612 0.88073394
0.81773399 0.79 0.85087719 0.59281437]
mean value: 0.833956019315672
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.29672027 0.4601624 0.43517661 0.41409087 0.33378577 0.4149332
0.38982415 0.22151065 0.21765399 0.2491889 ]
mean value: 0.3433046817779541
key: score_time
value: [0.0209012 0.02083015 0.0207603 0.0409019 0.04082274 0.0407865
0.02054024 0.0209372 0.02071619 0.04051232]
mean value: 0.028770875930786134
key: test_mcc
value: [1. 0.91452919 0.91106719 0.91485328 1. 0.91287093
0.86452993 0.95553309 0.95553309 0.86452993]
mean value: 0.9293446631562545
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.95555556 0.95555556 0.95555556 1. 0.95454545
0.93181818 0.97727273 0.97727273 0.93181818]
mean value: 0.963939393939394
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.95238095 0.95454545 0.95652174 1. 0.95652174
0.93023256 0.97777778 0.97674419 0.93023256]
mean value: 0.9634956965290635
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 0.95454545 0.91666667 1. 0.91666667
0.95238095 0.95652174 1. 0.95238095]
mean value: 0.9649162431771128
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.90909091 0.95454545 1. 1. 1.
0.90909091 1. 0.95454545 0.90909091]
mean value: 0.9636363636363636
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.95454545 0.9555336 0.95652174 1. 0.95454545
0.93181818 0.97727273 0.97727273 0.93181818]
mean value: 0.9639328063241107
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.90909091 0.91304348 0.91666667 1. 0.91666667
0.86956522 0.95652174 0.95454545 0.86956522]
mean value: 0.930566534914361
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.95
Accuracy on Blind test: 0.97
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.11559272 0.07994819 0.09201694 0.09574533 0.08644199 0.09145522
0.11973166 0.12139821 0.11484957 0.12906265]
mean value: 0.10462424755096436
key: score_time
value: [0.0227809 0.02421069 0.02284312 0.02478456 0.02595878 0.02617145
0.0282135 0.02829266 0.02692151 0.02787161]
mean value: 0.025804877281188965
key: test_mcc
value: [0.95652174 0.91106719 0.91106719 0.91485328 1. 0.90909091
0.86452993 0.90909091 0.86452993 0.86452993]
mean value: 0.9105281028070168
key: train_mcc
value: [0.98004502 0.989999 0.9900495 0.99501169 0.99501169 0.97506905
0.99502376 0.98009308 0.98514815 0.98009308]
mean value: 0.9865544028626636
key: test_accuracy
value: [0.97777778 0.95555556 0.95555556 0.95555556 1. 0.95454545
0.93181818 0.95454545 0.93181818 0.93181818]
mean value: 0.9548989898989899
key: train_accuracy
value: [0.99 0.995 0.995 0.9975 0.9975 0.98753117
0.99750623 0.99002494 0.9925187 0.99002494]
mean value: 0.9932605985037406
key: test_fscore
value: [0.97777778 0.95454545 0.95454545 0.95652174 1. 0.95454545
0.93023256 0.95454545 0.93023256 0.93333333]
mean value: 0.9546279784702434
key: train_fscore
value: [0.98984772 0.99494949 0.99497487 0.99746835 0.99746835 0.98734177
0.99746835 0.98984772 0.9924812 0.98984772]
mean value: 0.9931695554980033
key: test_precision
value: [0.95652174 0.95454545 0.95454545 0.91666667 1. 0.95454545
0.95238095 0.95454545 0.95238095 0.91304348]
mean value: 0.9509175607001694
key: train_precision
value: [0.99489796 0.99494949 0.99 1. 1. 0.98984772
1. 0.99489796 0.98507463 0.99489796]
mean value: 0.9944565715102227
key: test_recall
value: [1. 0.95454545 0.95454545 1. 1. 0.95454545
0.90909091 0.95454545 0.90909091 0.95454545]
mean value: 0.9590909090909091
key: train_recall
value: [0.98484848 0.99494949 1. 0.99494949 0.99494949 0.98484848
0.99494949 0.98484848 1. 0.98484848]
mean value: 0.9919191919191919
key: test_roc_auc
value: [0.97826087 0.9555336 0.9555336 0.95652174 1. 0.95454545
0.93181818 0.95454545 0.93181818 0.93181818]
mean value: 0.9550395256916997
key: train_roc_auc
value: [0.98994899 0.9949995 0.9950495 0.99747475 0.99747475 0.98749813
0.99747475 0.98996119 0.99261084 0.98996119]
mean value: 0.9932453590186605
key: test_jcc
value: [0.95652174 0.91304348 0.91304348 0.91666667 1. 0.91304348
0.86956522 0.91304348 0.86956522 0.875 ]
mean value: 0.9139492753623188
key: train_jcc
value: [0.9798995 0.98994975 0.99 0.99494949 0.99494949 0.975
0.99494949 0.9798995 0.98507463 0.9798995 ]
mean value: 0.9864571352920186
MCC on Blind test: 0.96
Accuracy on Blind test: 0.98
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.16882658 0.22207904 0.2207644 0.23970628 0.2196691 0.18807268
0.20044613 0.23684382 0.19207048 0.23526812]
mean value: 0.21237466335296631
key: score_time
value: [0.02288747 0.02833319 0.03328061 0.03298473 0.03413868 0.04017615
0.03194618 0.03593063 0.03210187 0.03190231]
mean value: 0.03236818313598633
key: test_mcc
value: [0.70501339 0.64613475 0.55666994 0.51185771 0.51089209 0.68252363
0.77352678 0.60678804 0.59152048 0.73029674]
mean value: 0.6315223557482503
key: train_mcc
value: [0.98510714 1. 0.99004752 0.99004752 0.99004752 0.99007143
0.99007143 0.99502376 0.99007143 0.99007143]
mean value: 0.9910559193261419
key: test_accuracy
value: [0.84444444 0.82222222 0.77777778 0.75555556 0.75555556 0.84090909
0.88636364 0.79545455 0.79545455 0.86363636]
mean value: 0.8137373737373738
key: train_accuracy
value: [0.9925 1. 0.995 0.995 0.995 0.99501247
0.99501247 0.99750623 0.99501247 0.99501247]
mean value: 0.9955056109725686
key: test_fscore
value: [0.82051282 0.80952381 0.76190476 0.75555556 0.74418605 0.84444444
0.88372093 0.76923077 0.79069767 0.86956522]
mean value: 0.8049342029726256
key: train_fscore
value: [0.99236641 1. 0.99492386 0.99492386 0.99492386 0.99492386
0.99492386 0.99746835 0.99492386 0.99492386]
mean value: 0.9954301771720262
key: test_precision
value: [0.94117647 0.85 0.8 0.73913043 0.76190476 0.82608696
0.9047619 0.88235294 0.80952381 0.83333333]
mean value: 0.8348270612592863
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.72727273 0.77272727 0.72727273 0.77272727 0.72727273 0.86363636
0.86363636 0.68181818 0.77272727 0.90909091]
mean value: 0.7818181818181819
key: train_recall
value: [0.98484848 1. 0.98989899 0.98989899 0.98989899 0.98989899
0.98989899 0.99494949 0.98989899 0.98989899]
mean value: 0.990909090909091
key: test_roc_auc
value: [0.84189723 0.82114625 0.77667984 0.75592885 0.75494071 0.84090909
0.88636364 0.79545455 0.79545455 0.86363636]
mean value: 0.8132411067193676
key: train_roc_auc
value: [0.99242424 1. 0.99494949 0.99494949 0.99494949 0.99494949
0.99494949 0.99747475 0.99494949 0.99494949]
mean value: 0.9954545454545455
key: test_jcc
value: [0.69565217 0.68 0.61538462 0.60714286 0.59259259 0.73076923
0.79166667 0.625 0.65384615 0.76923077]
mean value: 0.676128505954593
key: train_jcc
value: [0.98484848 1. 0.98989899 0.98989899 0.98989899 0.98989899
0.98989899 0.99494949 0.98989899 0.98989899]
mean value: 0.990909090909091
MCC on Blind test: 0.61
Accuracy on Blind test: 0.8
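
Each "mean value" line is the arithmetic mean of the ten fold values printed above it. The short sketch below is illustrative and not part of the pipeline. It recomputes the Gaussian Process means for test_mcc and train_mcc from the rounded values in this log, then takes their difference as one simple way to compare training and cross-validation performance. The variable names are hypothetical, and because the inputs are rounded to eight decimals the results match the logged means only approximately.

```python
from statistics import fmean

# Fold values copied from the Gaussian Process block of this log.
test_mcc = [0.70501339, 0.64613475, 0.55666994, 0.51185771, 0.51089209,
            0.68252363, 0.77352678, 0.60678804, 0.59152048, 0.73029674]
train_mcc = [0.98510714, 1.0, 0.99004752, 0.99004752, 0.99004752,
             0.99007143, 0.99007143, 0.99502376, 0.99007143, 0.99007143]

mean_test = fmean(test_mcc)    # logged as 0.6315223557482503
mean_train = fmean(train_mcc)  # logged as 0.9910559193261419
gap = mean_train - mean_test   # train minus cross-validation MCC

print(f"mean test_mcc:  {mean_test:.6f}")
print(f"mean train_mcc: {mean_train:.6f}")
print(f"train-test gap: {gap:.6f}")
```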
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.919806 1.02738333 0.90889049 0.87410593 1.04062057 1.07893395
1.05196571 0.85241294 0.64149427 0.65414619]
mean value: 0.9049759387969971
key: score_time
value: [0.01315594 0.01307869 0.01312375 0.02761984 0.0129056 0.01343155
0.01265907 0.01010847 0.01001835 0.00931072]
mean value: 0.013541197776794434
key: test_mcc
value: [0.95652174 0.91106719 0.91106719 0.91485328 1. 0.91287093
0.90909091 0.95553309 0.90909091 0.81818182]
mean value: 0.9198277056731419
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.97777778 0.95555556 0.95555556 0.95555556 1. 0.95454545
0.95454545 0.97727273 0.95454545 0.90909091]
mean value: 0.9594444444444444
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.97777778 0.95454545 0.95454545 0.95652174 1. 0.95652174
0.95454545 0.97777778 0.95454545 0.90909091]
mean value: 0.9595871761089152
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.95652174 0.95454545 0.95454545 0.91666667 1. 0.91666667
0.95454545 0.95652174 0.95454545 0.90909091]
mean value: 0.9473649538866931
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.95454545 0.95454545 1. 1. 1.
0.95454545 1. 0.95454545 0.90909091]
mean value: 0.9727272727272728
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97826087 0.9555336 0.9555336 0.95652174 1. 0.95454545
0.95454545 0.97727273 0.95454545 0.90909091]
mean value: 0.9595849802371541
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.95652174 0.91304348 0.91304348 0.91666667 1. 0.91666667
0.91304348 0.95652174 0.91304348 0.83333333]
mean value: 0.9231884057971014
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.95
Accuracy on Blind test: 0.97
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.1153357 0.06680679 0.06007385 0.09345102 0.07372856 0.1125803
0.13374567 0.08797312 0.09443545 0.10179043]
mean value: 0.09399209022521973
key: score_time
value: [0.0196929 0.02079773 0.020576 0.02017331 0.03191543 0.02789617
0.02618909 0.0400703 0.02141643 0.02100539]
mean value: 0.024973273277282715
key: test_mcc
value: [0.28827551 0.59109821 0.44008623 0.20198059 0.28827551 0.50471461
0.48795004 0.50051733 0.56694671 0.28347335]
mean value: 0.4153318091778042
key: train_mcc
value: [0.84199403 0.87973027 0.96074967 0.91500719 0.88626479 0.96539284
0.93515962 0.98514265 0.92771103 0.75793176]
mean value: 0.9055083848914132
key: test_accuracy
value: [0.64444444 0.77777778 0.71111111 0.6 0.64444444 0.75
0.72727273 0.75 0.77272727 0.63636364]
mean value: 0.7014141414141414
key: train_accuracy
value: [0.915 0.9375 0.98 0.9575 0.94 0.98254364
0.96758105 0.9925187 0.96259352 0.86533666]
mean value: 0.9500573566084788
key: test_fscore
value: [0.61904762 0.72222222 0.64864865 0.60869565 0.61904762 0.76595745
0.66666667 0.74418605 0.73684211 0.57894737]
mean value: 0.6710261394811038
key: train_fscore
value: [0.90607735 0.93333333 0.97938144 0.95717884 0.93548387 0.98254364
0.96708861 0.99236641 0.96062992 0.84210526]
mean value: 0.9456188682100336
key: test_precision
value: [0.65 0.92857143 0.8 0.58333333 0.65 0.72
0.85714286 0.76190476 0.875 0.6875 ]
mean value: 0.7513452380952381
key: train_precision
value: [1. 0.98870056 1. 0.95477387 1. 0.97044335
0.96954315 1. 1. 1. ]
mean value: 0.9883460931280301
key: test_recall
value: [0.59090909 0.59090909 0.54545455 0.63636364 0.59090909 0.81818182
0.54545455 0.72727273 0.63636364 0.5 ]
mean value: 0.6181818181818182
key: train_recall
value: [0.82828283 0.88383838 0.95959596 0.95959596 0.87878788 0.99494949
0.96464646 0.98484848 0.92424242 0.72727273]
mean value: 0.9106060606060606
key: test_roc_auc
value: [0.64328063 0.77371542 0.70750988 0.60079051 0.64328063 0.75
0.72727273 0.75 0.77272727 0.63636364]
mean value: 0.7004940711462451
key: train_roc_auc
value: [0.91414141 0.9369687 0.97979798 0.95752075 0.93939394 0.98269642
0.96754491 0.99242424 0.96212121 0.86363636]
mean value: 0.9496245930011721
key: test_jcc
value: [0.44827586 0.56521739 0.48 0.4375 0.44827586 0.62068966
0.5 0.59259259 0.58333333 0.40740741]
mean value: 0.5083292103948026
key: train_jcc
value: [0.82828283 0.875 0.95959596 0.9178744 0.87878788 0.96568627
0.93627451 0.98484848 0.92424242 0.72727273]
mean value: 0.8997865483479294
MCC on Blind test: 0.54
Accuracy on Blind test: 0.77
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.04879665 0.06443977 0.06408358 0.06347299 0.0626421 0.05391741
0.04901862 0.06241035 0.06304216 0.06274414]
mean value: 0.05945677757263183
key: score_time
value: [0.03340101 0.03197694 0.03278589 0.03180408 0.03367066 0.03720546
0.0333674 0.02973032 0.03171229 0.03540778]
mean value: 0.0331061840057373
key: test_mcc
value: [0.78405645 0.78405645 0.73320158 0.8360602 0.82574419 0.81818182
0.86452993 0.72727273 0.77352678 0.77352678]
mean value: 0.7920156928908952
key: train_mcc
value: [0.86529061 0.87073544 0.85500344 0.85510344 0.84528736 0.86562671
0.87541359 0.86032741 0.8705095 0.8903152 ]
mean value: 0.8653612710221481
key: test_accuracy
value: [0.88888889 0.88888889 0.86666667 0.91111111 0.91111111 0.90909091
0.93181818 0.86363636 0.88636364 0.88636364]
mean value: 0.8943939393939394
key: train_accuracy
value: [0.9325 0.935 0.9275 0.9275 0.9225 0.93266833
0.93765586 0.93017456 0.93516209 0.94513716]
mean value: 0.9325798004987531
key: test_fscore
value: [0.87804878 0.87804878 0.86363636 0.91666667 0.91304348 0.90909091
0.93023256 0.86363636 0.88888889 0.88888889]
mean value: 0.8930181678184095
key: train_fscore
value: [0.93266833 0.93564356 0.92695214 0.9273183 0.92269327 0.93266833
0.93734336 0.92929293 0.935 0.94472362]
mean value: 0.9324303832120122
key: test_precision
value: [0.94736842 0.94736842 0.86363636 0.84615385 0.875 0.90909091
0.95238095 0.86363636 0.86956522 0.86956522]
mean value: 0.8943765711786307
key: train_precision
value: [0.92118227 0.91747573 0.92462312 0.92039801 0.91133005 0.92118227
0.93034826 0.92929293 0.92574257 0.94 ]
mean value: 0.9241575197221089
key: test_recall
value: [0.81818182 0.81818182 0.86363636 1. 0.95454545 0.90909091
0.90909091 0.86363636 0.90909091 0.90909091]
mean value: 0.8954545454545455
key: train_recall
value: [0.94444444 0.95454545 0.92929293 0.93434343 0.93434343 0.94444444
0.94444444 0.92929293 0.94444444 0.94949495]
mean value: 0.9409090909090909
key: test_roc_auc
value: [0.88735178 0.88735178 0.86660079 0.91304348 0.91205534 0.90909091
0.93181818 0.86363636 0.88636364 0.88636364]
mean value: 0.8943675889328064
key: train_roc_auc
value: [0.93261826 0.93519352 0.92751775 0.92756776 0.92261726 0.93281336
0.93773946 0.93016371 0.93527641 0.94519082]
mean value: 0.9326698310225111
key: test_jcc
value: [0.7826087 0.7826087 0.76 0.84615385 0.84 0.83333333
0.86956522 0.76 0.8 0.8 ]
mean value: 0.8074269788182832
key: train_jcc
value: [0.87383178 0.87906977 0.86384977 0.86448598 0.85648148 0.87383178
0.88207547 0.86792453 0.87793427 0.8952381 ]
mean value: 0.8734722914430403
MCC on Blind test: 0.79
Accuracy on Blind test: 0.89
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.55836701 0.52241182 0.42924356 0.42723894 0.42407441 0.45521879
0.46713424 0.43198848 0.44248724 0.43643761]
mean value: 0.4594602108001709
key: score_time
value: [0.03078747 0.03211427 0.03577662 0.03534436 0.03439927 0.0340023
0.0342772 0.03501654 0.03661156 0.03467274]
mean value: 0.03430023193359375
key: test_mcc
value: [0.78405645 0.78405645 0.60000118 0.8360602 0.82574419 0.81818182
0.81818182 0.72727273 0.77352678 0.77352678]
mean value: 0.774060840755457
key: train_mcc
value: [0.86529061 0.87073544 0.81510094 0.809981 0.84528736 0.86562671
0.90023387 0.86032741 0.8705095 0.8903152 ]
mean value: 0.8593408045055162
key: test_accuracy
value: [0.88888889 0.88888889 0.8 0.91111111 0.91111111 0.90909091
0.90909091 0.86363636 0.88636364 0.88636364]
mean value: 0.8854545454545455
key: train_accuracy
value: [0.9325 0.935 0.9075 0.905 0.9225 0.93266833
0.95012469 0.93017456 0.93516209 0.94513716]
mean value: 0.9295766832917706
key: test_fscore
value: [0.87804878 0.87804878 0.79069767 0.91666667 0.91304348 0.90909091
0.90909091 0.86363636 0.88888889 0.88888889]
mean value: 0.883610133991771
key: train_fscore
value: [0.93266833 0.93564356 0.90726817 0.9040404 0.92269327 0.93266833
0.94949495 0.92929293 0.935 0.94472362]
mean value: 0.9293493560888269
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_8020.py:107: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_8020.py:110: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
key: test_precision
value: [0.94736842 0.94736842 0.80952381 0.84615385 0.875 0.90909091
0.90909091 0.86363636 0.86956522 0.86956522]
mean value: 0.884636311438371
key: train_precision
value: [0.92118227 0.91747573 0.90049751 0.9040404 0.91133005 0.92118227
0.94949495 0.92929293 0.92574257 0.94 ]
mean value: 0.9220238678959648
key: test_recall
value: [0.81818182 0.81818182 0.77272727 1. 0.95454545 0.90909091
0.90909091 0.86363636 0.90909091 0.90909091]
mean value: 0.8863636363636364
key: train_recall
value: [0.94444444 0.95454545 0.91414141 0.9040404 0.93434343 0.94444444
0.94949495 0.92929293 0.94444444 0.94949495]
mean value: 0.9368686868686869
key: test_roc_auc
value: [0.88735178 0.88735178 0.79940711 0.91304348 0.91205534 0.90909091
0.90909091 0.86363636 0.88636364 0.88636364]
mean value: 0.8853754940711462
key: train_roc_auc
value: [0.93261826 0.93519352 0.90756576 0.9049905 0.92261726 0.93281336
0.95011693 0.93016371 0.93527641 0.94519082]
mean value: 0.9296546526573839
key: test_jcc
value: [0.7826087 0.7826087 0.65384615 0.84615385 0.84 0.83333333
0.83333333 0.76 0.8 0.8 ]
mean value: 0.7931884057971015
key: train_jcc
value: [0.87383178 0.87906977 0.83027523 0.82488479 0.85648148 0.87383178
0.90384615 0.86792453 0.87793427 0.8952381 ]
mean value: 0.8683317871996343
MCC on Blind test: 0.79
Accuracy on Blind test: 0.89
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.1082418 0.11077762 0.11822391 0.08075857 0.08925223 0.06861591
0.10095501 0.08981323 0.08645415 0.09447336]
mean value: 0.09475657939910889
key: score_time
value: [0.02976584 0.03569388 0.0412004 0.01653552 0.02434349 0.02014518
0.02598023 0.02477598 0.03175187 0.02436733]
mean value: 0.027455973625183105
key: test_mcc
value: [0.77865613 0.64426877 0.86732843 0.77821935 0.86758893 0.82574419
0.69583743 0.68911026 0.95652174 0.82213439]
mean value: 0.7925409622270596
key: train_mcc
value: [0.85688852 0.8716498 0.86172755 0.86177295 0.87664317 0.85687806
0.86188899 0.871768 0.85679795 0.86188899]
mean value: 0.8637903997161932
key: test_accuracy
value: [0.88888889 0.82222222 0.93333333 0.88888889 0.93333333 0.91111111
0.84444444 0.84444444 0.97777778 0.91111111]
mean value: 0.8955555555555555
key: train_accuracy
value: [0.92839506 0.93580247 0.9308642 0.9308642 0.9382716 0.92839506
0.9308642 0.93580247 0.92839506 0.9308642 ]
mean value: 0.9318518518518518
key: test_fscore
value: [0.88888889 0.82608696 0.93617021 0.89361702 0.93333333 0.91304348
0.85106383 0.8372093 0.97777778 0.90909091]
mean value: 0.8966281710028886
key: train_fscore
value: [0.92874693 0.93596059 0.93069307 0.93103448 0.93857494 0.92909535
0.93170732 0.93658537 0.92874693 0.93170732]
mean value: 0.932285229379058
key: test_precision
value: [0.90909091 0.82608696 0.91666667 0.875 0.95454545 0.875
0.8 0.85714286 0.95652174 0.90909091]
mean value: 0.887914549218897
key: train_precision
value: [0.92195122 0.93137255 0.93069307 0.92647059 0.93170732 0.9223301
0.92270531 0.92753623 0.92647059 0.92270531]
mean value: 0.9263942288373254
key: test_recall
value: [0.86956522 0.82608696 0.95652174 0.91304348 0.91304348 0.95454545
0.90909091 0.81818182 1. 0.90909091]
mean value: 0.9069169960474308
key: train_recall
value: [0.93564356 0.94059406 0.93069307 0.93564356 0.94554455 0.93596059
0.9408867 0.94581281 0.93103448 0.9408867 ]
mean value: 0.9382700092669365
key: test_roc_auc
value: [0.88932806 0.82213439 0.93280632 0.88833992 0.93379447 0.91205534
0.8458498 0.84387352 0.97826087 0.91106719]
mean value: 0.8957509881422925
key: train_roc_auc
value: [0.92841292 0.93581427 0.93086378 0.93087597 0.93828952 0.92837634
0.93083939 0.93577769 0.92838853 0.93083939]
mean value: 0.9318477783738965
key: test_jcc
value: [0.8 0.7037037 0.88 0.80769231 0.875 0.84
0.74074074 0.72 0.95652174 0.83333333]
mean value: 0.815699182460052
key: train_jcc
value: [0.86697248 0.87962963 0.87037037 0.87096774 0.88425926 0.86757991
0.87214612 0.88073394 0.86697248 0.87214612]
mean value: 0.8731778046396034
MCC on Blind test: 0.8
Accuracy on Blind test: 0.9
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
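The pipeline repr above can be rebuilt roughly as follows. This is a sketch on a toy frame: the column names are a small subset of those printed in the repr, while the values, the category labels, and the target are invented for illustration and say nothing about the real data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

rng = np.random.default_rng(42)
n = 120

# Toy data (assumption): two numeric and two categorical columns.
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 30, n),
    "ddg_foldx": rng.normal(0, 2, n),
    "ss_class": rng.choice(["helix", "strand", "loop"], n),  # hypothetical labels
    "active_site": rng.choice([0, 1], n),
})
# Hypothetical noisy binary target, only so that the model has something to fit.
y = (X["ligand_distance"] + rng.normal(0, 5, n) < 15).astype(int)

num_cols = ["ligand_distance", "ddg_foldx"]
cat_cols = ["ss_class", "active_site"]

# Same layout as the logged repr: scale numeric columns, one-hot encode
# categorical columns, pass any other column through unchanged.
prep = ColumnTransformer(
    remainder="passthrough",
    transformers=[
        ("num", MinMaxScaler(), num_cols),
        ("cat", OneHotEncoder(), cat_cols),
    ],
)

pipe = Pipeline(steps=[
    ("prep", prep),
    ("model", LogisticRegressionCV(random_state=42)),
])
pipe.fit(X, y)

n_out = pipe.named_steps["prep"].transform(X).shape[1]
print("columns after preprocessing:", n_out)
```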
key: fit_time
value: [1.91950297 2.00819826 2.98527718 1.89961886 0.94165349 1.18352866
0.94931173 1.02501869 0.95997047 0.98270178]
mean value: 1.4854782104492188
key: score_time
value: [0.01864243 0.04957104 0.03117299 0.01511431 0.01509213 0.0130074
0.01559639 0.01299739 0.01515102 0.01567769]
mean value: 0.020202279090881348
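The key/value/mean blocks in this log have the shape of the dictionary returned by scikit-learn's cross_validate with return_train_score=True. The sketch below reproduces that shape. The scorer mapping is an assumption inferred from the key names (the log does not show how the script defines its scorers), cv=10 is inferred from the ten values per key, and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import cross_validate

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Assumed scorer mapping, chosen to match the key names seen in the log.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": "jaccard",
}

scores = cross_validate(
    LogisticRegression(max_iter=1000, random_state=42),
    X,
    y,
    cv=10,                    # ten folds -> ten values per key
    scoring=scoring,
    return_train_score=True,  # adds the train_* keys next to test_*
)

# Print in the same key / value / mean value layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", value.mean())
```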
key: test_mcc
value: [0.82574419 0.68911026 0.86732843 0.77821935 0.82574419 0.82574419
0.73663511 0.68911026 0.95652174 0.77821935]
mean value: 0.797237707853288
key: train_mcc
value: [0.8965753 0.90127552 0.89135736 0.89139819 0.90627515 0.82225691
0.89630533 0.84716163 0.88164702 0.89152603]
mean value: 0.8825778442400698
key: test_accuracy
value: [0.91111111 0.84444444 0.93333333 0.88888889 0.91111111 0.91111111
0.86666667 0.84444444 0.97777778 0.88888889]
mean value: 0.8977777777777778
key: train_accuracy
value: [0.94814815 0.95061728 0.94567901 0.94567901 0.95308642 0.91111111
0.94814815 0.92345679 0.94074074 0.94567901]
mean value: 0.9412345679012346
key: test_fscore
value: [0.90909091 0.85106383 0.93617021 0.89361702 0.90909091 0.91304348
0.86956522 0.8372093 0.97777778 0.88372093]
mean value: 0.8980349587999696
key: train_fscore
value: [0.94865526 0.95024876 0.94554455 0.94527363 0.95331695 0.91176471
0.94840295 0.92457421 0.94146341 0.94634146]
mean value: 0.9415585894135641
key: test_precision
value: [0.95238095 0.83333333 0.91666667 0.875 0.95238095 0.875
0.83333333 0.85714286 0.95652174 0.9047619 ]
mean value: 0.8956521739130434
key: train_precision
value: [0.93719807 0.955 0.94554455 0.95 0.94634146 0.90731707
0.94607843 0.91346154 0.93236715 0.93719807]
mean value: 0.9370506345899053
key: test_recall
value: [0.86956522 0.86956522 0.95652174 0.91304348 0.86956522 0.95454545
0.90909091 0.81818182 1. 0.86363636]
mean value: 0.9023715415019763
key: train_recall
value: [0.96039604 0.94554455 0.94554455 0.94059406 0.96039604 0.91625616
0.95073892 0.93596059 0.95073892 0.95566502]
mean value: 0.9461834853436082
key: test_roc_auc
value: [0.91205534 0.84387352 0.93280632 0.88833992 0.91205534 0.91205534
0.86758893 0.84387352 0.97826087 0.88833992]
mean value: 0.8979249011857707
key: train_roc_auc
value: [0.94817832 0.95060479 0.94567868 0.94566649 0.95310442 0.91109838
0.94814174 0.92342584 0.94071599 0.94565429]
mean value: 0.9412268936253231
key: test_jcc
value: [0.83333333 0.74074074 0.88 0.80769231 0.83333333 0.84
0.76923077 0.72 0.95652174 0.79166667]
mean value: 0.8172518890127586
key: train_jcc
value: [0.90232558 0.90521327 0.89671362 0.89622642 0.91079812 0.83783784
0.90186916 0.85972851 0.88940092 0.89814815]
mean value: 0.8898261577031877
MCC on Blind test: 0.8
Accuracy on Blind test: 0.9
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.02403355 0.01210332 0.01167107 0.01175499 0.01173353 0.01152921
0.01138949 0.01154685 0.01153827 0.01136518]
mean value: 0.012866544723510741
key: score_time
value: [0.01191664 0.01043248 0.01031709 0.01031232 0.00995064 0.01007748
0.00997305 0.00997305 0.01015615 0.00996041]
mean value: 0.010306930541992188
key: test_mcc
value: [0.86758893 0.55841694 0.60079051 0.61706091 0.74605372 0.73320158
0.42993591 0.60000118 0.59109821 0.69404997]
mean value: 0.6438197861131165
key: train_mcc
value: [0.67340117 0.66345741 0.67340117 0.68334493 0.69787618 0.66984267
0.647501 0.71790239 0.68673529 0.6682388 ]
mean value: 0.6781701020371597
key: test_accuracy
value: [0.93333333 0.77777778 0.8 0.8 0.86666667 0.86666667
0.71111111 0.8 0.77777778 0.84444444]
mean value: 0.8177777777777778
key: train_accuracy
value: [0.8345679 0.82962963 0.8345679 0.83950617 0.84691358 0.83209877
0.81481481 0.85679012 0.84197531 0.83209877]
mean value: 0.8362962962962963
key: test_fscore
value: [0.93333333 0.77272727 0.8 0.7804878 0.85714286 0.86363636
0.66666667 0.79069767 0.72222222 0.82926829]
mean value: 0.8016182487708297
key: train_fscore
value: [0.82414698 0.81889764 0.82414698 0.82939633 0.83769634 0.82105263
0.79108635 0.84895833 0.83505155 0.82291667]
mean value: 0.8253349790533351
key: test_precision
value: [0.95454545 0.80952381 0.81818182 0.88888889 0.94736842 0.86363636
0.76470588 0.80952381 0.92857143 0.89473684]
mean value: 0.8679682718382409
key: train_precision
value: [0.87709497 0.87150838 0.87709497 0.88268156 0.88888889 0.88135593
0.91025641 0.90055249 0.87567568 0.87292818]
mean value: 0.8838037458275947
key: test_recall
value: [0.91304348 0.73913043 0.7826087 0.69565217 0.7826087 0.86363636
0.59090909 0.77272727 0.59090909 0.77272727]
mean value: 0.750395256916996
key: train_recall
value: [0.77722772 0.77227723 0.77722772 0.78217822 0.79207921 0.76847291
0.69950739 0.80295567 0.79802956 0.77832512]
mean value: 0.774828073940399
key: test_roc_auc
value: [0.93379447 0.77865613 0.80039526 0.80237154 0.86857708 0.86660079
0.70849802 0.79940711 0.77371542 0.84288538]
mean value: 0.8174901185770751
key: train_roc_auc
value: [0.83442667 0.82948837 0.83442667 0.83936497 0.84677852 0.83225626
0.81510023 0.85692338 0.84208409 0.83223187]
mean value: 0.8363081012534751
key: test_jcc
value: [0.875 0.62962963 0.66666667 0.64 0.75 0.76
0.5 0.65384615 0.56521739 0.70833333]
mean value: 0.6748693174780132
key: train_jcc
value: [0.70089286 0.69333333 0.70089286 0.70852018 0.72072072 0.69642857
0.65437788 0.73755656 0.71681416 0.69911504]
mean value: 0.7028652163950665
MCC on Blind test: 0.68
Accuracy on Blind test: 0.84
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01156735 0.01157784 0.01160192 0.01164412 0.01163125 0.01170754
0.01092172 0.01057267 0.0117619 0.0108943 ]
mean value: 0.011388063430786133
key: score_time
value: [0.00963807 0.01005244 0.00998616 0.01013279 0.01015091 0.01013255
0.01019764 0.0097158 0.01023817 0.01015043]
mean value: 0.010039496421813964
key: test_mcc
value: [0.74605372 0.4229249 0.68972332 0.69404997 0.78530224 0.78530224
0.55841694 0.64426877 0.73559956 0.60637261]
mean value: 0.666801427789491
key: train_mcc
value: [0.7385111 0.69500224 0.73847923 0.75849711 0.72839898 0.76296152
0.77288136 0.7777832 0.72358281 0.73337398]
mean value: 0.7429471510582903
key: test_accuracy
value: [0.86666667 0.71111111 0.84444444 0.84444444 0.88888889 0.88888889
0.77777778 0.82222222 0.86666667 0.8 ]
mean value: 0.8311111111111111
key: train_accuracy
value: [0.8691358 0.84691358 0.8691358 0.87901235 0.86419753 0.88148148
0.88641975 0.88888889 0.8617284 0.86666667]
mean value: 0.871358024691358
key: test_fscore
value: [0.85714286 0.71111111 0.84444444 0.85714286 0.88372093 0.89361702
0.7826087 0.81818182 0.85714286 0.80851064]
mean value: 0.8313623230625146
key: train_fscore
value: [0.87041565 0.84183673 0.86716792 0.88077859 0.86352357 0.8817734
0.88613861 0.88943489 0.86341463 0.86633663]
mean value: 0.8710820634544677
key: test_precision
value: [0.94736842 0.72727273 0.86363636 0.80769231 0.95 0.84
0.75 0.81818182 0.9 0.76 ]
mean value: 0.8364151637835848
key: train_precision
value: [0.85990338 0.86842105 0.87817259 0.86602871 0.86567164 0.8817734
0.89054726 0.8872549 0.85507246 0.87064677]
mean value: 0.8723492167626019
key: test_recall
value: [0.7826087 0.69565217 0.82608696 0.91304348 0.82608696 0.95454545
0.81818182 0.81818182 0.81818182 0.86363636]
mean value: 0.8316205533596838
key: train_recall
value: [0.88118812 0.81683168 0.85643564 0.8960396 0.86138614 0.8817734
0.8817734 0.89162562 0.87192118 0.86206897]
mean value: 0.8701043749695166
key: test_roc_auc
value: [0.86857708 0.71146245 0.84486166 0.84288538 0.89031621 0.89031621
0.77865613 0.82213439 0.86561265 0.8013834 ]
mean value: 0.8316205533596839
key: train_roc_auc
value: [0.86916549 0.84683949 0.86910452 0.87905428 0.86419061 0.88148076
0.88643125 0.88888211 0.86170317 0.86667805]
mean value: 0.8713529727356972
key: test_jcc
value: [0.75 0.55172414 0.73076923 0.75 0.79166667 0.80769231
0.64285714 0.69230769 0.75 0.67857143]
mean value: 0.7145588606795503
key: train_jcc
value: [0.77056277 0.72687225 0.76548673 0.78695652 0.75982533 0.78854626
0.79555556 0.80088496 0.75965665 0.76419214]
mean value: 0.7718539151085453
MCC on Blind test: 0.73
Accuracy on Blind test: 0.87
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.01085877 0.0109148 0.0109849 0.01096702 0.01075697 0.01008892
0.01093984 0.01145196 0.01152682 0.01099157]
mean value: 0.01094815731048584
key: score_time
value: [0.01825953 0.01752472 0.01740408 0.01779437 0.01755857 0.01792526
0.01943898 0.01912856 0.01847744 0.01721025]
mean value: 0.018072175979614257
key: test_mcc
value: [0.5169078 0.48698902 0.51185771 0.44008623 0.64752602 0.54071329
0.37774032 0.37774032 0.58158 0.60000118]
mean value: 0.5081141873870801
key: train_mcc
value: [0.69876844 0.71387102 0.68398976 0.72358281 0.68953436 0.71448494
0.70937299 0.6938666 0.69877579 0.72349713]
mean value: 0.7049743847733497
key: test_accuracy
value: [0.75555556 0.73333333 0.75555556 0.71111111 0.82222222 0.75555556
0.68888889 0.68888889 0.75555556 0.8 ]
mean value: 0.7466666666666667
key: train_accuracy
value: [0.84938272 0.85679012 0.84197531 0.8617284 0.84444444 0.85679012
0.85432099 0.84691358 0.84938272 0.8617284 ]
mean value: 0.8523456790123457
key: test_fscore
value: [0.74418605 0.7 0.75555556 0.75471698 0.81818182 0.78431373
0.66666667 0.66666667 0.66666667 0.79069767]
mean value: 0.7347651801289877
key: train_fscore
value: [0.84863524 0.85427136 0.84236453 0.86 0.84050633 0.85353535
0.85138539 0.84653465 0.84938272 0.86138614]
mean value: 0.8508001705741715
key: test_precision
value: [0.8 0.82352941 0.77272727 0.66666667 0.85714286 0.68965517
0.7 0.7 1. 0.80952381]
mean value: 0.7819245190239105
key: train_precision
value: [0.85074627 0.86734694 0.83823529 0.86868687 0.86010363 0.87564767
0.87113402 0.85074627 0.85148515 0.86567164]
mean value: 0.8599803745154699
key: test_recall
value: [0.69565217 0.60869565 0.73913043 0.86956522 0.7826087 0.90909091
0.63636364 0.63636364 0.5 0.77272727]
mean value: 0.7150197628458498
key: train_recall
value: [0.84653465 0.84158416 0.84653465 0.85148515 0.82178218 0.83251232
0.83251232 0.84236453 0.84729064 0.85714286]
mean value: 0.841974345217773
key: test_roc_auc
value: [0.756917 0.73616601 0.75592885 0.70750988 0.82312253 0.75889328
0.68774704 0.68774704 0.75 0.79940711]
mean value: 0.7463438735177865
key: train_roc_auc
value: [0.8493757 0.85675267 0.84198654 0.86170317 0.84438863 0.85685022
0.85437497 0.84692484 0.84938789 0.86173975]
mean value: 0.8523484368141248
key: test_jcc
value: [0.59259259 0.53846154 0.60714286 0.60606061 0.69230769 0.64516129
0.5 0.5 0.5 0.65384615]
mean value: 0.5835572730734021
key: train_jcc
value: [0.73706897 0.74561404 0.72765957 0.75438596 0.72489083 0.74449339
0.74122807 0.73390558 0.73819742 0.75652174]
mean value: 0.7403965575347853
MCC on Blind test: 0.41
Accuracy on Blind test: 0.71
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.0228219 0.02206445 0.02394438 0.02210402 0.02183819 0.02219343
0.02140975 0.01981401 0.02201772 0.0217278 ]
mean value: 0.021993565559387206
key: score_time
value: [0.01345062 0.01252699 0.01257086 0.01266456 0.01265097 0.01267982
0.01264071 0.01242614 0.01275897 0.01246643]
mean value: 0.012683606147766114
key: test_mcc
value: [0.73663511 0.68972332 0.86732843 0.73559956 0.86758893 0.82574419
0.77865613 0.68911026 0.95652174 0.73320158]
mean value: 0.7880109260079755
key: train_mcc
value: [0.79762457 0.82239025 0.80251189 0.80246793 0.80766419 0.81237958
0.80741373 0.82237294 0.80261491 0.80741373]
mean value: 0.8084853722573764
key: test_accuracy
value: [0.86666667 0.84444444 0.93333333 0.86666667 0.93333333 0.91111111
0.88888889 0.84444444 0.97777778 0.86666667]
mean value: 0.8933333333333333
key: train_accuracy
value: [0.89876543 0.91111111 0.90123457 0.90123457 0.9037037 0.90617284
0.9037037 0.91111111 0.90123457 0.9037037 ]
mean value: 0.9041975308641975
key: test_fscore
value: [0.86363636 0.84444444 0.93617021 0.875 0.93333333 0.91304348
0.88888889 0.8372093 0.97777778 0.86363636]
mean value: 0.893314016506958
key: train_fscore
value: [0.8992629 0.91176471 0.90147783 0.9009901 0.90464548 0.90686275
0.9041769 0.91219512 0.90243902 0.9041769 ]
mean value: 0.9047991713233395
key: test_precision
value: [0.9047619 0.86363636 0.91666667 0.84 0.95454545 0.875
0.86956522 0.85714286 0.95652174 0.86363636]
mean value: 0.8901476566911349
key: train_precision
value: [0.89268293 0.90291262 0.89705882 0.9009901 0.89371981 0.90243902
0.90196078 0.90338164 0.89371981 0.90196078]
mean value: 0.8990826319784146
key: test_recall
value: [0.82608696 0.82608696 0.95652174 0.91304348 0.91304348 0.95454545
0.90909091 0.81818182 1. 0.86363636]
mean value: 0.8980237154150198
key: train_recall
value: [0.90594059 0.92079208 0.90594059 0.9009901 0.91584158 0.91133005
0.90640394 0.92118227 0.91133005 0.90640394]
mean value: 0.9106155196800468
key: test_roc_auc
value: [0.86758893 0.84486166 0.93280632 0.86561265 0.93379447 0.91205534
0.88932806 0.84387352 0.97826087 0.86660079]
mean value: 0.8934782608695653
key: train_roc_auc
value: [0.8987831 0.91113496 0.90124616 0.90123397 0.9037336 0.90616007
0.90369702 0.91108618 0.90120958 0.90369702]
mean value: 0.9041981661220309
key: test_jcc
value: [0.76 0.73076923 0.88 0.77777778 0.875 0.84
0.8 0.72 0.95652174 0.76 ]
mean value: 0.8100068747677444
key: train_jcc
value: [0.81696429 0.83783784 0.8206278 0.81981982 0.82589286 0.82959641
0.82511211 0.83856502 0.82222222 0.82511211]
mean value: 0.8261750475651821
MCC on Blind test: 0.79
Accuracy on Blind test: 0.89
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [0.65939665 1.75964832 0.57226229 0.91595173 0.58495402 0.47136617
0.58204484 0.84763002 0.33321142 0.77313566]
mean value: 0.7499601125717164
key: score_time
value: [0.01269627 0.01352549 0.01349235 0.01917243 0.01356983 0.01362419
0.01330686 0.01411915 0.01365542 0.01274872]
mean value: 0.013991069793701173
key: test_mcc
value: [0.73663511 0.670374 0.86732843 0.82506438 0.86758893 0.82574419
0.77821935 0.73559956 1. 0.74410286]
mean value: 0.805065681579282
key: train_mcc
value: [0.83730123 0.85146676 0.82742221 0.8520244 0.83086317 0.81729057
0.83012449 0.82799641 0.7927359 0.81956701]
mean value: 0.8286792157204221
key: test_accuracy
value: [0.86666667 0.82222222 0.93333333 0.91111111 0.93333333 0.91111111
0.88888889 0.86666667 1. 0.86666667]
mean value: 0.9
key: train_accuracy
value: [0.91851852 0.92345679 0.91358025 0.92592593 0.91358025 0.90864198
0.91358025 0.91358025 0.8962963 0.90864198]
mean value: 0.9135802469135802
key: test_fscore
value: [0.86363636 0.8 0.93617021 0.91666667 0.93333333 0.91304348
0.88372093 0.85714286 1. 0.85 ]
mean value: 0.8953713842038605
key: train_fscore
value: [0.9193154 0.91906005 0.91442543 0.92647059 0.91725768 0.90909091
0.91002571 0.91183879 0.89756098 0.90537084]
mean value: 0.9130416381528887
key: test_precision
value: [0.9047619 0.94117647 0.91666667 0.88 0.95454545 0.875
0.9047619 0.9 1. 0.94444444]
mean value: 0.922135684576861
key: train_precision
value: [0.90821256 0.97237569 0.90338164 0.91747573 0.87782805 0.90686275
0.9516129 0.93298969 0.88888889 0.94148936]
mean value: 0.920111726559678
key: test_recall
value: [0.82608696 0.69565217 0.95652174 0.95652174 0.91304348 0.95454545
0.86363636 0.81818182 1. 0.77272727]
mean value: 0.8756916996047431
key: train_recall
value: [0.93069307 0.87128713 0.92574257 0.93564356 0.96039604 0.91133005
0.87192118 0.89162562 0.90640394 0.87192118]
mean value: 0.9076964346680974
key: test_roc_auc
value: [0.86758893 0.82509881 0.93280632 0.91007905 0.93379447 0.91205534
0.88833992 0.86561265 1. 0.86462451]
mean value: 0.9
key: train_roc_auc
value: [0.91854851 0.92332829 0.9136102 0.92594986 0.91369556 0.90863532
0.91368336 0.91363459 0.89627128 0.90873287]
mean value: 0.9136089840511145
key: test_jcc
value: [0.76 0.66666667 0.88 0.84615385 0.875 0.84
0.79166667 0.75 1. 0.73913043]
mean value: 0.8148617614269789
key: train_jcc
value: [0.85067873 0.85024155 0.84234234 0.8630137 0.84716157 0.83333333
0.83490566 0.83796296 0.81415929 0.8271028 ]
mean value: 0.8400901944397646
MCC on Blind test: 0.73
Accuracy on Blind test: 0.87
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.02700424 0.02184582 0.02058792 0.02255273 0.01992631 0.01929307
0.0199616 0.0216949 0.02253938 0.02359295]
mean value: 0.021899890899658204
key: score_time
value: [0.01251793 0.00972891 0.00979257 0.00993633 0.00901151 0.00906038
0.00911379 0.00917554 0.00938892 0.00946093]
mean value: 0.009718680381774902
key: test_mcc
value: [0.77865613 0.91106719 0.82506438 0.95643752 0.82213439 0.91485328
0.95652174 0.77821935 0.95643752 0.91452919]
mean value: 0.8813920675315654
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.88888889 0.95555556 0.91111111 0.97777778 0.91111111 0.95555556
0.97777778 0.88888889 0.97777778 0.95555556]
mean value: 0.94
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.88888889 0.95652174 0.91666667 0.9787234 0.91304348 0.95652174
0.97777778 0.88372093 0.97674419 0.95238095]
mean value: 0.9400989762770414
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.90909091 0.95652174 0.88 0.95833333 0.91304348 0.91666667
0.95652174 0.9047619 1. 1. ]
mean value: 0.9394939770374553
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.86956522 0.95652174 0.95652174 1. 0.91304348 1.
1. 0.86363636 0.95454545 0.90909091]
mean value: 0.9422924901185771
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.88932806 0.9555336 0.91007905 0.97727273 0.91106719 0.95652174
0.97826087 0.88833992 0.97727273 0.95454545]
mean value: 0.9398221343873517
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.8 0.91666667 0.84615385 0.95833333 0.84 0.91666667
0.95652174 0.79166667 0.95454545 0.90909091]
mean value: 0.8889645282253977
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.91
Accuracy on Blind test: 0.96
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.12013102 0.11766171 0.11774349 0.11807728 0.11954403 0.11755204
0.13109112 0.13012242 0.13104153 0.13023996]
mean value: 0.12332046031951904
key: score_time
value: [0.01799774 0.0182426 0.01839256 0.01811171 0.01826453 0.02004719
0.02000332 0.01999044 0.01991343 0.01994133]
mean value: 0.019090485572814942
key: test_mcc
value: [0.82574419 0.64426877 0.91106719 0.78405645 0.78530224 0.8360602
0.73663511 0.64426877 0.91106719 0.82213439]
mean value: 0.7900604515356031
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.91111111 0.82222222 0.95555556 0.88888889 0.88888889 0.91111111
0.86666667 0.82222222 0.95555556 0.91111111]
mean value: 0.8933333333333333
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.90909091 0.82608696 0.95652174 0.89795918 0.88372093 0.91666667
0.86956522 0.81818182 0.95454545 0.90909091]
mean value: 0.8941429784525263
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.95238095 0.82608696 0.95652174 0.84615385 0.95 0.84615385
0.83333333 0.81818182 0.95454545 0.90909091]
mean value: 0.8892448855492334
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.86956522 0.82608696 0.95652174 0.95652174 0.82608696 1.
0.90909091 0.81818182 0.95454545 0.90909091]
mean value: 0.9025691699604743
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.91205534 0.82213439 0.9555336 0.88735178 0.89031621 0.91304348
0.86758893 0.82213439 0.9555336 0.91106719]
mean value: 0.8936758893280633
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.83333333 0.7037037 0.91666667 0.81481481 0.79166667 0.84615385
0.76923077 0.69230769 0.91304348 0.83333333]
mean value: 0.8114254304471695
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.8
Accuracy on Blind test: 0.9
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01040983 0.01041746 0.01033044 0.01143312 0.01089478 0.01075268
0.01148319 0.01071334 0.01060081 0.01162791]
mean value: 0.010866355895996094
key: score_time
value: [0.00902987 0.00913858 0.00908208 0.00915885 0.00940609 0.00922942
0.00991488 0.0096333 0.00902796 0.00981998]
mean value: 0.009344100952148438
key: test_mcc
value: [0.46930785 0.51185771 0.82506438 0.60000118 0.43557241 0.37774032
0.24655092 0.60000118 0.33824342 0.56604076]
mean value: 0.4970380107838396
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.73333333 0.75555556 0.91111111 0.8 0.71111111 0.68888889
0.62222222 0.8 0.66666667 0.77777778]
mean value: 0.7466666666666667
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.72727273 0.75555556 0.91666667 0.80851064 0.68292683 0.66666667
0.56410256 0.79069767 0.61538462 0.79166667]
mean value: 0.7319450604300232
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.76190476 0.77272727 0.88 0.79166667 0.77777778 0.7
0.64705882 0.80952381 0.70588235 0.73076923]
mean value: 0.7577310695840107
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.69565217 0.73913043 0.95652174 0.82608696 0.60869565 0.63636364
0.5 0.77272727 0.54545455 0.86363636]
mean value: 0.7144268774703557
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.73418972 0.75592885 0.91007905 0.79940711 0.71343874 0.68774704
0.61956522 0.79940711 0.66403162 0.77964427]
mean value: 0.7463438735177865
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.57142857 0.60714286 0.84615385 0.67857143 0.51851852 0.5
0.39285714 0.65384615 0.44444444 0.65517241]
mean value: 0.5868135376756066
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.58
Accuracy on Blind test: 0.79
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.74629807 1.81791091 1.8328793 1.83209157 1.79719567 1.78587461
1.83592582 1.77011418 1.84957242 1.9802525 ]
mean value: 1.8248115062713623
key: score_time
value: [0.09639883 0.09250951 0.13806105 0.10098362 0.12009382 0.10071588
0.10475492 0.10313916 0.1076386 0.10528183]
mean value: 0.10695772171020508
key: test_mcc
value: [0.86758893 0.91106719 0.86732843 0.95643752 0.82574419 0.95652174
0.82213439 0.77821935 1. 0.95643752]
mean value: 0.894147926437764
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.93333333 0.95555556 0.93333333 0.97777778 0.91111111 0.97777778
0.91111111 0.88888889 1. 0.97777778]
mean value: 0.9466666666666667
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.93333333 0.95652174 0.93617021 0.9787234 0.90909091 0.97777778
0.90909091 0.88372093 1. 0.97674419]
mean value: 0.946117340172371
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.95454545 0.95652174 0.91666667 0.95833333 0.95238095 0.95652174
0.90909091 0.9047619 1. 1. ]
mean value: 0.950882269904009
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.91304348 0.95652174 0.95652174 1. 0.86956522 1.
0.90909091 0.86363636 1. 0.95454545]
mean value: 0.9422924901185771
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.93379447 0.9555336 0.93280632 0.97727273 0.91205534 0.97826087
0.91106719 0.88833992 1. 0.97727273]
mean value: 0.9466403162055336
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.875 0.91666667 0.88 0.95833333 0.83333333 0.95652174
0.83333333 0.79166667 1. 0.95454545]
mean value: 0.8999400527009223
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.89
Accuracy on Blind test: 0.95
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...05', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [1.19596362 1.18671298 1.15252948 1.52557898 1.15283728 0.99552059
2.06139922 0.94499493 1.02960014 0.97231007]
mean value: 1.2217447280883789
key: score_time
value: [0.16833639 0.15999269 0.18151951 0.1587429 0.12559843 0.14782834
0.18785882 0.23321199 0.17717195 0.24588728]
mean value: 0.17861483097076417
key: test_mcc
value: [0.86758893 0.82213439 0.86732843 0.95643752 0.82574419 0.95652174
0.82574419 0.77821935 1. 0.91452919]
mean value: 0.8814247933518943
key: train_mcc
value: [0.96049359 0.95061698 0.94568955 0.94078482 0.95556639 0.94569087
0.95066455 0.96544324 0.94078771 0.94078771]
mean value: 0.9496525422288566
key: test_accuracy
value: [0.93333333 0.91111111 0.93333333 0.97777778 0.91111111 0.97777778
0.91111111 0.88888889 1. 0.95555556]
mean value: 0.94
key: train_accuracy
value: [0.98024691 0.97530864 0.97283951 0.97037037 0.97777778 0.97283951
0.97530864 0.98271605 0.97037037 0.97037037]
mean value: 0.9748148148148148
key: test_fscore
value: [0.93333333 0.91304348 0.93617021 0.9787234 0.90909091 0.97777778
0.91304348 0.88372093 1. 0.95238095]
mean value: 0.9397284476358546
key: train_fscore
value: [0.98019802 0.97524752 0.97270471 0.97014925 0.97766749 0.97283951
0.97524752 0.98280098 0.97029703 0.97029703]
mean value: 0.9747449079854762
key: test_precision
value: [0.95454545 0.91304348 0.91666667 0.95833333 0.95238095 0.95652174
0.875 0.9047619 1. 1. ]
mean value: 0.9431253529079616
key: train_precision
value: [0.98019802 0.97524752 0.97512438 0.975 0.9800995 0.97524752
0.9800995 0.98039216 0.97512438 0.97512438]
mean value: 0.9771657365473159
key: test_recall
value: [0.91304348 0.91304348 0.95652174 1. 0.86956522 1.
0.95454545 0.86363636 1. 0.90909091]
mean value: 0.9379446640316206
key: train_recall
value: [0.98019802 0.97524752 0.97029703 0.96534653 0.97524752 0.97044335
0.97044335 0.98522167 0.96551724 0.96551724]
mean value: 0.9723479490806224
key: test_roc_auc
value: [0.93379447 0.91106719 0.93280632 0.97727273 0.91205534 0.97826087
0.91205534 0.88833992 1. 0.95454545]
mean value: 0.9400197628458498
key: train_roc_auc
value: [0.98024679 0.97530849 0.97283324 0.970358 0.97777155 0.97284544
0.97532068 0.98270985 0.97038238 0.97038238]
mean value: 0.9748158806028386
key: test_jcc
value: [0.875 0.84 0.88 0.95833333 0.83333333 0.95652174
0.84 0.79166667 1. 0.90909091]
mean value: 0.8883945981554677
key: train_jcc
value: [0.96116505 0.95169082 0.9468599 0.94202899 0.95631068 0.94711538
0.95169082 0.96618357 0.94230769 0.94230769]
mean value: 0.9507660603666303
MCC on Blind test: 0.91
Accuracy on Blind test: 0.96
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02524042 0.02612805 0.02330494 0.02426648 0.0341692 0.03412199
0.02433777 0.02824688 0.03730392 0.03393507]
mean value: 0.029105472564697265
key: score_time
value: [0.0212667 0.03293991 0.02683496 0.03234029 0.0220933 0.02196932
0.02211404 0.02257228 0.02409673 0.02418518]
mean value: 0.02504127025604248
key: test_mcc
value: [0.74605372 0.4229249 0.68972332 0.69404997 0.78530224 0.78530224
0.55841694 0.64426877 0.73559956 0.60637261]
mean value: 0.666801427789491
key: train_mcc
value: [0.7385111 0.69500224 0.73847923 0.75849711 0.72839898 0.76296152
0.77288136 0.7777832 0.72358281 0.73337398]
mean value: 0.7429471510582903
key: test_accuracy
value: [0.86666667 0.71111111 0.84444444 0.84444444 0.88888889 0.88888889
0.77777778 0.82222222 0.86666667 0.8 ]
mean value: 0.8311111111111111
key: train_accuracy
value: [0.8691358 0.84691358 0.8691358 0.87901235 0.86419753 0.88148148
0.88641975 0.88888889 0.8617284 0.86666667]
mean value: 0.871358024691358
key: test_fscore
value: [0.85714286 0.71111111 0.84444444 0.85714286 0.88372093 0.89361702
0.7826087 0.81818182 0.85714286 0.80851064]
mean value: 0.8313623230625146
key: train_fscore
value: [0.87041565 0.84183673 0.86716792 0.88077859 0.86352357 0.8817734
0.88613861 0.88943489 0.86341463 0.86633663]
mean value: 0.8710820634544677
key: test_precision
value: [0.94736842 0.72727273 0.86363636 0.80769231 0.95 0.84
0.75 0.81818182 0.9 0.76 ]
mean value: 0.8364151637835848
key: train_precision
value: [0.85990338 0.86842105 0.87817259 0.86602871 0.86567164 0.8817734
0.89054726 0.8872549 0.85507246 0.87064677]
mean value: 0.8723492167626019
key: test_recall
value: [0.7826087 0.69565217 0.82608696 0.91304348 0.82608696 0.95454545
0.81818182 0.81818182 0.81818182 0.86363636]
mean value: 0.8316205533596838
key: train_recall
value: [0.88118812 0.81683168 0.85643564 0.8960396 0.86138614 0.8817734
0.8817734 0.89162562 0.87192118 0.86206897]
mean value: 0.8701043749695166
key: test_roc_auc
value: [0.86857708 0.71146245 0.84486166 0.84288538 0.89031621 0.89031621
0.77865613 0.82213439 0.86561265 0.8013834 ]
mean value: 0.8316205533596839
key: train_roc_auc
value: [0.86916549 0.84683949 0.86910452 0.87905428 0.86419061 0.88148076
0.88643125 0.88888211 0.86170317 0.86667805]
mean value: 0.8713529727356972
key: test_jcc
value: [0.75 0.55172414 0.73076923 0.75 0.79166667 0.80769231
0.64285714 0.69230769 0.75 0.67857143]
mean value: 0.7145588606795503
key: train_jcc
value: [0.77056277 0.72687225 0.76548673 0.78695652 0.75982533 0.78854626
0.79555556 0.80088496 0.75965665 0.76419214]
mean value: 0.7718539151085453
MCC on Blind test: 0.73
Accuracy on Blind test: 0.87
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [5.04259777 5.1556654 4.94334316 4.85878825 4.41032028 4.46716976
1.56981826 3.87288809 5.31235576 4.7354219 ]
mean value: 4.436836862564087
key: score_time
value: [0.02622247 0.01853395 0.02413154 0.02783108 0.02140975 0.02586436
0.0147388 0.01681423 0.02183342 0.02106977]
mean value: 0.021844935417175294
key: test_mcc
value: [0.82213439 0.91106719 0.95643752 1. 0.86758893 0.91485328
0.95652174 0.77821935 1. 0.95643752]
mean value: 0.9163259916823262
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.91111111 0.95555556 0.97777778 1. 0.93333333 0.95555556
0.97777778 0.88888889 1. 0.97777778]
mean value: 0.9577777777777777
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.91304348 0.95652174 0.9787234 1. 0.93333333 0.95652174
0.97777778 0.88372093 1. 0.97674419]
mean value: 0.9576386588167239
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.91304348 0.95652174 0.95833333 1. 0.95454545 0.91666667
0.95652174 0.9047619 1. 1. ]
mean value: 0.9560394315829098
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.91304348 0.95652174 1. 1. 0.91304348 1.
1. 0.86363636 1. 0.95454545]
mean value: 0.9600790513833992
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.91106719 0.9555336 0.97727273 1. 0.93379447 0.95652174
0.97826087 0.88833992 1. 0.97727273]
mean value: 0.957806324110672
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.84 0.91666667 0.95833333 1. 0.875 0.91666667
0.95652174 0.79166667 1. 0.95454545]
mean value: 0.9209400527009223
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.95
Accuracy on Blind test: 0.97
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.0814774 0.10963106 0.09837937 0.11255074 0.09978104 0.06058836
0.06621599 0.05549955 0.05097103 0.07600141]
mean value: 0.08110959529876709
key: score_time
value: [0.03861284 0.04575562 0.03341079 0.03420973 0.02950549 0.01280618
0.02309275 0.01282167 0.01282978 0.02283335]
mean value: 0.026587820053100585
key: test_mcc
value: [0.86758893 0.69404997 0.82213439 0.69404997 0.82213439 0.64752602
0.73663511 0.69404997 0.82213439 0.73559956]
mean value: 0.7535902709223832
key: train_mcc
value: [0.92103402 0.91129269 0.91111057 0.92103017 0.93581427 0.92117074
0.91605902 0.93126766 0.92602981 0.89630533]
mean value: 0.9191114276703818
key: test_accuracy
value: [0.93333333 0.84444444 0.91111111 0.84444444 0.91111111 0.82222222
0.86666667 0.84444444 0.91111111 0.86666667]
mean value: 0.8755555555555555
key: train_accuracy
value: [0.96049383 0.95555556 0.95555556 0.96049383 0.96790123 0.96049383
0.95802469 0.9654321 0.96296296 0.94814815]
mean value: 0.9595061728395062
key: test_fscore
value: [0.93333333 0.85714286 0.91304348 0.85714286 0.91304348 0.82608696
0.86956522 0.82926829 0.90909091 0.85714286]
mean value: 0.8764860236970523
key: train_fscore
value: [0.96059113 0.95588235 0.95544554 0.960199 0.96790123 0.960199
0.95823096 0.96601942 0.96277916 0.94840295]
mean value: 0.9595650755455886
key: test_precision
value: [0.95454545 0.80769231 0.91304348 0.80769231 0.91304348 0.79166667
0.83333333 0.89473684 0.90909091 0.9 ]
mean value: 0.8724844777647981
key: train_precision
value: [0.95588235 0.94660194 0.95544554 0.965 0.96551724 0.96984925
0.95588235 0.95215311 0.97 0.94607843]
mean value: 0.9582410221215243
key: test_recall
value: [0.91304348 0.91304348 0.91304348 0.91304348 0.91304348 0.86363636
0.90909091 0.77272727 0.90909091 0.81818182]
mean value: 0.8837944664031621
key: train_recall
value: [0.96534653 0.96534653 0.95544554 0.95544554 0.97029703 0.95073892
0.96059113 0.98029557 0.95566502 0.95073892]
mean value: 0.9609910744769058
key: test_roc_auc
value: [0.93379447 0.84288538 0.91106719 0.84288538 0.91106719 0.82312253
0.86758893 0.84288538 0.91106719 0.86561265]
mean value: 0.875197628458498
key: train_roc_auc
value: [0.96050578 0.95557967 0.95555528 0.96048139 0.96790714 0.96051797
0.95801834 0.96539531 0.96298103 0.94814174]
mean value: 0.9595083646295663
key: test_jcc
value: [0.875 0.75 0.84 0.75 0.84 0.7037037
0.76923077 0.70833333 0.83333333 0.75 ]
mean value: 0.781960113960114
key: train_jcc
value: [0.92417062 0.91549296 0.91469194 0.92344498 0.93779904 0.92344498
0.91981132 0.9342723 0.92822967 0.90186916]
mean value: 0.9223226957377971
MCC on Blind test: 0.72
Accuracy on Blind test: 0.86
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.02436852 0.01083708 0.01080036 0.01135302 0.01035428 0.01042557
0.01047254 0.01276612 0.01002574 0.01004267]
mean value: 0.0121445894241333
key: score_time
value: [0.01228738 0.009691 0.00953341 0.00930977 0.00900745 0.00928497
0.00902605 0.01075983 0.00912833 0.00952101]
mean value: 0.00975492000579834
key: test_mcc
value: [0.78530224 0.46640316 0.82213439 0.82213439 0.78530224 0.77865613
0.55841694 0.64752602 0.79670588 0.60000118]
mean value: 0.7062582554009066
key: train_mcc
value: [0.72859901 0.7001606 0.67485592 0.74815266 0.71871879 0.75811526
0.71448494 0.76814813 0.73836061 0.71961678]
mean value: 0.726921270950553
key: test_accuracy
value: [0.88888889 0.73333333 0.91111111 0.91111111 0.88888889 0.88888889
0.77777778 0.82222222 0.88888889 0.8 ]
mean value: 0.851111111111111
key: train_accuracy
value: [0.86419753 0.84938272 0.83703704 0.87407407 0.85925926 0.87901235
0.85679012 0.88395062 0.8691358 0.85925926]
mean value: 0.8632098765432099
key: test_fscore
value: [0.88372093 0.73913043 0.91304348 0.91304348 0.88372093 0.88888889
0.7826087 0.82608696 0.87179487 0.79069767]
mean value: 0.8492736339045742
key: train_fscore
value: [0.86215539 0.84398977 0.83248731 0.87344913 0.85714286 0.87841191
0.85353535 0.88279302 0.86848635 0.8556962 ]
mean value: 0.8608147293143978
key: test_precision
value: [0.95 0.73913043 0.91304348 0.91304348 0.95 0.86956522
0.75 0.79166667 1. 0.80952381]
mean value: 0.8685973084886128
key: train_precision
value: [0.87309645 0.87301587 0.85416667 0.87562189 0.8680203 0.885
0.87564767 0.89393939 0.875 0.88020833]
mean value: 0.8753716577165349
key: test_recall
value: [0.82608696 0.73913043 0.91304348 0.91304348 0.82608696 0.90909091
0.81818182 0.86363636 0.77272727 0.77272727]
mean value: 0.8353754940711462
key: train_recall
value: [0.85148515 0.81683168 0.81188119 0.87128713 0.84653465 0.87192118
0.83251232 0.87192118 0.86206897 0.83251232]
mean value: 0.8468955762571331
key: test_roc_auc
value: [0.89031621 0.73320158 0.91106719 0.91106719 0.89031621 0.88932806
0.77865613 0.82312253 0.88636364 0.79940711]
mean value: 0.8512845849802372
key: train_roc_auc
value: [0.86416622 0.84930254 0.83697508 0.87406721 0.85922792 0.8790299
0.85685022 0.88398039 0.86915329 0.85932546]
mean value: 0.8632078232453787
key: test_jcc
value: [0.79166667 0.5862069 0.84 0.84 0.79166667 0.8
0.64285714 0.7037037 0.77272727 0.65384615]
mean value: 0.7422674503019331
key: train_jcc
value: [0.75770925 0.7300885 0.71304348 0.7753304 0.75 0.78318584
0.74449339 0.79017857 0.76754386 0.74778761]
mean value: 0.7559360895888796
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01528907 0.02077127 0.01836395 0.01844668 0.01875925 0.02194405
0.01897907 0.02040768 0.02031064 0.02082705]
mean value: 0.019409871101379393
key: score_time
value: [0.00912023 0.01180029 0.01194692 0.01217914 0.01215005 0.01223612
0.01204348 0.01269436 0.01220083 0.01210046]
mean value: 0.011847186088562011
key: test_mcc
value: [0.78530224 0.64752602 0.86732843 0.73320158 0.59725988 0.78405645
0.70780516 0.64752602 0.82213439 0.70501339]
mean value: 0.7297153566397614
key: train_mcc
value: [0.86377146 0.84895551 0.82265468 0.87431362 0.80684222 0.81706101
0.86843671 0.82696893 0.88164702 0.87837337]
mean value: 0.8489024517724476
key: test_accuracy
value: [0.88888889 0.82222222 0.93333333 0.86666667 0.77777778 0.88888889
0.84444444 0.82222222 0.91111111 0.84444444]
mean value: 0.86
key: train_accuracy
value: [0.9308642 0.92098765 0.90864198 0.93580247 0.8962963 0.90123457
0.93333333 0.90864198 0.94074074 0.9382716 ]
mean value: 0.9214814814814815
key: test_fscore
value: [0.88372093 0.81818182 0.93617021 0.86956522 0.73684211 0.87804878
0.85714286 0.82608696 0.90909091 0.82051282]
mean value: 0.8535362607590927
key: train_fscore
value: [0.92820513 0.91534392 0.91334895 0.93298969 0.8852459 0.89130435
0.93556086 0.91533181 0.94146341 0.93638677]
mean value: 0.9195180779922804
key: test_precision
value: [0.95 0.85714286 0.91666667 0.86956522 0.93333333 0.94736842
0.77777778 0.79166667 0.90909091 0.94117647]
mean value: 0.8893788319710382
key: train_precision
value: [0.96276596 0.98295455 0.86666667 0.97311828 0.98780488 0.99393939
0.90740741 0.85470085 0.93236715 0.96842105]
mean value: 0.9430146185624383
key: test_recall
value: [0.82608696 0.7826087 0.95652174 0.86956522 0.60869565 0.81818182
0.95454545 0.86363636 0.90909091 0.72727273]
mean value: 0.8316205533596838
key: train_recall
value: [0.8960396 0.85643564 0.96534653 0.8960396 0.8019802 0.80788177
0.96551724 0.98522167 0.95073892 0.90640394]
mean value: 0.9031605130956446
key: test_roc_auc
value: [0.89031621 0.82312253 0.93280632 0.86660079 0.78162055 0.88735178
0.84683794 0.82312253 0.91106719 0.84189723]
mean value: 0.8604743083003953
key: train_roc_auc
value: [0.93077842 0.92082866 0.90878164 0.93570453 0.89606399 0.90146564
0.93325367 0.90845242 0.94071599 0.93835049]
mean value: 0.9214395454323757
key: test_jcc
value: [0.79166667 0.69230769 0.88 0.76923077 0.58333333 0.7826087
0.75 0.7037037 0.83333333 0.69565217]
mean value: 0.7481836368140716
key: train_jcc
value: [0.86602871 0.84390244 0.84051724 0.87439614 0.79411765 0.80392157
0.87892377 0.84388186 0.88940092 0.88038278]
mean value: 0.8515473059624479
MCC on Blind test: 0.75
Accuracy on Blind test: 0.87
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.023592 0.01849461 0.01916718 0.02027512 0.01857185 0.01852775
0.01686931 0.01843333 0.02005339 0.01996136]
mean value: 0.019394588470458985
key: score_time
value: [0.00936127 0.01204491 0.01203346 0.01205182 0.01208186 0.01245189
0.01205373 0.01204967 0.01202798 0.01207948]
mean value: 0.011823606491088868
key: test_mcc
value: [0.76206649 0.38361073 0.69583743 0.73320158 0.73320158 0.87476705
0.70780516 0.60637261 0.72645449 0.82574419]
mean value: 0.7049061319646861
key: train_mcc
value: [0.71591321 0.28354195 0.87676217 0.89347743 0.88642848 0.89639025
0.81462126 0.75056333 0.63579921 0.85520525]
mean value: 0.760870254618254
key: test_accuracy
value: [0.86666667 0.62222222 0.84444444 0.86666667 0.86666667 0.93333333
0.84444444 0.8 0.84444444 0.91111111]
mean value: 0.84
key: train_accuracy
value: [0.84197531 0.57530864 0.93580247 0.94567901 0.94320988 0.94814815
0.9037037 0.86419753 0.79012346 0.92592593]
mean value: 0.8674074074074074
key: test_fscore
value: [0.85 0.4137931 0.8372093 0.86956522 0.86956522 0.93617021
0.85714286 0.80851064 0.8627451 0.91304348]
mean value: 0.8217745125063238
key: train_fscore
value: [0.81395349 0.25862069 0.93193717 0.94358974 0.94292804 0.94865526
0.90993072 0.87912088 0.82617587 0.92924528]
mean value: 0.8384157138013564
key: test_precision
value: [1. 1. 0.9 0.86956522 0.86956522 0.88
0.77777778 0.76 0.75862069 0.875 ]
mean value: 0.8690528902215559
key: train_precision
value: [0.98591549 1. 0.98888889 0.9787234 0.94527363 0.94174757
0.85652174 0.79365079 0.70629371 0.89140271]
mean value: 0.9088417944765346
key: test_recall
value: [0.73913043 0.26086957 0.7826087 0.86956522 0.86956522 1.
0.95454545 0.86363636 1. 0.95454545]
mean value: 0.8294466403162055
key: train_recall
value: [0.69306931 0.14851485 0.88118812 0.91089109 0.94059406 0.95566502
0.97044335 0.98522167 0.99507389 0.97044335]
mean value: 0.8451104716382969
key: test_roc_auc
value: [0.86956522 0.63043478 0.8458498 0.86660079 0.86660079 0.93478261
0.84683794 0.8013834 0.84782609 0.91205534]
mean value: 0.842193675889328
key: train_roc_auc
value: [0.84160855 0.57425743 0.93566795 0.94559333 0.94320343 0.94812954
0.90353851 0.86389797 0.78961615 0.92581573]
mean value: 0.8671328586060576
key: test_jcc
value: [0.73913043 0.26086957 0.72 0.76923077 0.76923077 0.88
0.75 0.67857143 0.75862069 0.84 ]
mean value: 0.7165653656688139
key: train_jcc
value: [0.68627451 0.14851485 0.87254902 0.89320388 0.89201878 0.90232558
0.83474576 0.78431373 0.70383275 0.86784141]
mean value: 0.7585620275637062
MCC on Blind test: 0.75
Accuracy on Blind test: 0.88
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.18216228 0.1695621 0.17470765 0.17133498 0.17510653 0.1782279
0.17793107 0.173594 0.16895962 0.1765089 ]
mean value: 0.17480950355529784
key: score_time
value: [0.01529431 0.01603556 0.01582551 0.01555681 0.01668692 0.01690388
0.01652122 0.0161798 0.01550436 0.01662135]
mean value: 0.016112971305847167
key: test_mcc
value: [0.86758893 0.82213439 0.95643752 1. 0.86758893 0.87476705
0.95652174 0.86732843 1. 1. ]
mean value: 0.9212366993733405
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.93333333 0.91111111 0.97777778 1. 0.93333333 0.93333333
0.97777778 0.93333333 1. 1. ]
mean value: 0.96
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.93333333 0.91304348 0.9787234 1. 0.93333333 0.93617021
0.97777778 0.93023256 1. 1. ]
mean value: 0.9602614097866126
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.95454545 0.91304348 0.95833333 1. 0.95454545 0.88
0.95652174 0.95238095 1. 1. ]
mean value: 0.95693704121965
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.91304348 0.91304348 1. 1. 0.91304348 1.
1. 0.90909091 1. 1. ]
mean value: 0.9648221343873518
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.93379447 0.91106719 0.97727273 1. 0.93379447 0.93478261
0.97826087 0.93280632 1. 1. ]
mean value: 0.9601778656126483
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.875 0.84 0.95833333 1. 0.875 0.88
0.95652174 0.86956522 1. 1. ]
mean value: 0.9254420289855072
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.95
Accuracy on Blind test: 0.97
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.05877137 0.06073856 0.05816627 0.06933212 0.05485106 0.0548265
0.07156682 0.08174944 0.06895924 0.07406449]
mean value: 0.06530258655548096
key: score_time
value: [0.01848102 0.02689838 0.02581835 0.02910113 0.01847243 0.02071047
0.03578973 0.02482224 0.03958559 0.02383661]
mean value: 0.02635159492492676
key: test_mcc
value: [0.82213439 0.91106719 0.91106719 1. 0.86758893 0.91485328
0.91485328 0.82213439 1. 0.87406293]
mean value: 0.9037761587267465
key: train_mcc
value: [0.98024679 0.98519693 0.98519693 0.99017145 0.97560447 0.98029413
0.98024679 0.99507389 0.98029509 0.98519729]
mean value: 0.9837523754632594
key: test_accuracy
value: [0.91111111 0.95555556 0.95555556 1. 0.93333333 0.95555556
0.95555556 0.91111111 1. 0.93333333]
mean value: 0.9511111111111111
key: train_accuracy
value: [0.99012346 0.99259259 0.99259259 0.99506173 0.98765432 0.99012346
0.99012346 0.99753086 0.99012346 0.99259259]
mean value: 0.9918518518518519
key: test_fscore
value: [0.91304348 0.95652174 0.95652174 1. 0.93333333 0.95652174
0.95652174 0.90909091 1. 0.92682927]
mean value: 0.9508383945499534
key: train_fscore
value: [0.99009901 0.99255583 0.99255583 0.99502488 0.98746867 0.99019608
0.99014778 0.99753086 0.99009901 0.99259259]
mean value: 0.9918270548106813
key: test_precision
value: [0.91304348 0.95652174 0.95652174 1. 0.95454545 0.91666667
0.91666667 0.90909091 1. 1. ]
mean value: 0.9523056653491436
key: train_precision
value: [0.99009901 0.99502488 0.99502488 1. 1. 0.98536585
0.99014778 1. 0.99502488 0.9950495 ]
mean value: 0.9945736778626925
key: test_recall
value: [0.91304348 0.95652174 0.95652174 1. 0.91304348 1.
1. 0.90909091 1. 0.86363636]
mean value: 0.9511857707509881
key: train_recall
value: [0.99009901 0.99009901 0.99009901 0.99009901 0.97524752 0.99507389
0.99014778 0.99507389 0.98522167 0.99014778]
mean value: 0.9891308588986978
key: test_roc_auc
value: [0.91106719 0.9555336 0.9555336 1. 0.93379447 0.95652174
0.95652174 0.91106719 1. 0.93181818]
mean value: 0.9511857707509882
key: train_roc_auc
value: [0.9901234 0.99258645 0.99258645 0.9950495 0.98762376 0.9901112
0.9901234 0.99753695 0.99013559 0.99259864]
mean value: 0.9918475345071454
key: test_jcc
value: [0.84 0.91666667 0.91666667 1. 0.875 0.91666667
0.91666667 0.83333333 1. 0.86363636]
mean value: 0.9078636363636363
key: train_jcc
value: [0.98039216 0.98522167 0.98522167 0.99009901 0.97524752 0.98058252
0.9804878 0.99507389 0.98039216 0.98529412]
mean value: 0.9838012536555218
MCC on Blind test: 0.95
Accuracy on Blind test: 0.97
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.11244583 0.14630103 0.19385266 0.16284895 0.19661665 0.19063306
0.19858932 0.12693048 0.16871762 0.17700362]
mean value: 0.16739392280578613
key: score_time
value: [0.04026365 0.02383375 0.02751327 0.02340198 0.03419042 0.0237093
0.02365375 0.02253604 0.02923608 0.02369428]
mean value: 0.02720324993133545
key: test_mcc
value: [0.56604076 0.55841694 0.63358389 0.6133209 0.73663511 0.69156407
0.37747036 0.55533597 0.687125 0.64613475]
mean value: 0.6065627735892382
key: train_mcc
value: [0.99017145 0.99017145 0.98529269 0.98529269 0.98529269 0.98529376
0.99017193 0.99017193 0.99507389 0.99017193]
mean value: 0.9887104432367875
key: test_accuracy
value: [0.77777778 0.77777778 0.8 0.8 0.86666667 0.82222222
0.68888889 0.77777778 0.82222222 0.82222222]
mean value: 0.7955555555555556
key: train_accuracy
value: [0.99506173 0.99506173 0.99259259 0.99259259 0.99259259 0.99259259
0.99506173 0.99506173 0.99753086 0.99506173]
mean value: 0.994320987654321
key: test_fscore
value: [0.76190476 0.77272727 0.76923077 0.82352941 0.86363636 0.84615385
0.68181818 0.77272727 0.77777778 0.80952381]
mean value: 0.7879029467264761
key: train_fscore
value: [0.99502488 0.99502488 0.9925187 0.9925187 0.9925187 0.99255583
0.9950495 0.9950495 0.99753086 0.9950495 ]
mean value: 0.9942841071283992
key: test_precision
value: [0.84210526 0.80952381 0.9375 0.75 0.9047619 0.73333333
0.68181818 0.77272727 1. 0.85 ]
mean value: 0.8281769765322397
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.69565217 0.73913043 0.65217391 0.91304348 0.82608696 1.
0.68181818 0.77272727 0.63636364 0.77272727]
mean value: 0.7689723320158103
key: train_recall
value: [0.99009901 0.99009901 0.98514851 0.98514851 0.98514851 0.98522167
0.99014778 0.99014778 0.99507389 0.99014778]
mean value: 0.9886382480612593
key: test_roc_auc
value: [0.77964427 0.77865613 0.80335968 0.79743083 0.86758893 0.82608696
0.68873518 0.77766798 0.81818182 0.82114625]
mean value: 0.7958498023715415
key: train_roc_auc
value: [0.9950495 0.9950495 0.99257426 0.99257426 0.99257426 0.99261084
0.99507389 0.99507389 0.99753695 0.99507389]
mean value: 0.9943191240306297
key: test_jcc
value: [0.61538462 0.62962963 0.625 0.7 0.76 0.73333333
0.51724138 0.62962963 0.63636364 0.68 ]
mean value: 0.652658222365119
key: train_jcc
value: [0.99009901 0.99009901 0.98514851 0.98514851 0.98514851 0.98522167
0.99014778 0.99014778 0.99507389 0.99014778]
mean value: 0.9886382480612593
MCC on Blind test: 0.59
Accuracy on Blind test: 0.79
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.66729116 0.64972425 0.65732384 0.6665225 0.65949416 0.65429211
0.65473723 0.65853596 0.664325 0.65511894]
mean value: 0.6587365150451661
key: score_time
value: [0.01024175 0.00957489 0.00951242 0.00985265 0.00966835 0.00935936
0.00952435 0.00940204 0.00994349 0.00998735]
mean value: 0.009706664085388183
key: test_mcc
value: [0.82213439 0.86732843 0.95643752 1. 0.82574419 0.91485328
0.91485328 0.77821935 1. 0.95643752]
mean value: 0.9036007956373757
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.91111111 0.93333333 0.97777778 1. 0.91111111 0.95555556
0.95555556 0.88888889 1. 0.97777778]
mean value: 0.9511111111111111
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.91304348 0.93617021 0.9787234 1. 0.90909091 0.95652174
0.95652174 0.88372093 1. 0.97674419]
mean value: 0.9510536598912994
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.91304348 0.91666667 0.95833333 1. 0.95238095 0.91666667
0.91666667 0.9047619 1. 1. ]
mean value: 0.947851966873706
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.91304348 0.95652174 1. 1. 0.86956522 1.
1. 0.86363636 1. 0.95454545]
mean value: 0.9557312252964427
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.91106719 0.93280632 0.97727273 1. 0.91205534 0.95652174
0.95652174 0.88833992 1. 0.97727273]
mean value: 0.9511857707509881
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.84 0.88 0.95833333 1. 0.83333333 0.91666667
0.91666667 0.79166667 1. 0.95454545]
mean value: 0.9091212121212121
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.95
Accuracy on Blind test: 0.97
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.06929517 0.06151557 0.07104921 0.03313804 0.04500461 0.03204679
0.03788662 0.03866816 0.03848624 0.04184365]
mean value: 0.04689340591430664
key: score_time
value: [0.01687455 0.02114534 0.02216387 0.01866746 0.01421213 0.0171454
0.01545548 0.01302433 0.01309681 0.01861858]
mean value: 0.017040395736694337
key: test_mcc
value: [0.65604724 0.35497208 0.2540839 0.46640316 0.21191154 0.11393242
0.33797818 0.15717365 0.4000988 0.06320859]
mean value: 0.3015809563232226
key: train_mcc
value: [0.95177249 0.72466772 0.97079432 0.90113034 0.8354634 0.62435788
0.97541464 0.76507358 0.89222145 0.77839025]
mean value: 0.8419286071129487
key: test_accuracy
value: [0.82222222 0.66666667 0.62222222 0.73333333 0.6 0.55555556
0.66666667 0.57777778 0.66666667 0.53333333]
mean value: 0.6444444444444444
key: train_accuracy
value: [0.97530864 0.84444444 0.98518519 0.94814815 0.91111111 0.78024691
0.98765432 0.8691358 0.94320988 0.88395062]
mean value: 0.9128395061728395
key: test_fscore
value: [0.80952381 0.61538462 0.58536585 0.73913043 0.55 0.41176471
0.68085106 0.48648649 0.51612903 0.4 ]
mean value: 0.5794636001806261
key: train_fscore
value: [0.97461929 0.81524927 0.98492462 0.94516971 0.90217391 0.7192429
0.98777506 0.84985836 0.93994778 0.87399464]
mean value: 0.8992955544177024
key: test_precision
value: [0.89473684 0.75 0.66666667 0.73913043 0.64705882 0.58333333
0.64 0.6 0.88888889 0.53846154]
mean value: 0.694827652776771
key: train_precision
value: [1. 1. 1. 1. 1. 1.
0.98058252 1. 1. 0.95882353]
mean value: 0.9939406053683609
key: test_recall
value: [0.73913043 0.52173913 0.52173913 0.73913043 0.47826087 0.31818182
0.72727273 0.40909091 0.36363636 0.31818182]
mean value: 0.5136363636363637
key: train_recall
value: [0.95049505 0.68811881 0.97029703 0.8960396 0.82178218 0.56157635
0.99507389 0.73891626 0.88669951 0.80295567]
mean value: 0.8311954348144174
key: test_roc_auc
value: [0.82411067 0.66996047 0.62450593 0.73320158 0.6027668 0.55039526
0.66798419 0.57411067 0.66007905 0.52865613]
mean value: 0.6435770750988142
key: train_roc_auc
value: [0.97524752 0.84405941 0.98514851 0.9480198 0.91089109 0.78078818
0.98763596 0.86945813 0.94334975 0.8841511 ]
mean value: 0.9128749451299809
key: test_jcc
value: [0.68 0.44444444 0.4137931 0.5862069 0.37931034 0.25925926
0.51612903 0.32142857 0.34782609 0.25 ]
mean value: 0.41983977391744476
key: train_jcc
value: [0.95049505 0.68811881 0.97029703 0.8960396 0.82178218 0.56157635
0.97584541 0.73891626 0.88669951 0.77619048]
mean value: 0.8265960678312423
MCC on Blind test: 0.49
Accuracy on Blind test: 0.74
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02726126 0.04403973 0.06417942 0.03625083 0.05212307 0.04398036
0.05202127 0.05687022 0.04713082 0.05117822]
mean value: 0.04750351905822754
key: score_time
value: [0.02289844 0.02290368 0.03534818 0.02457857 0.02440238 0.01246762
0.02006912 0.02470255 0.02361917 0.02284813]
mean value: 0.023383784294128417
key: test_mcc
value: [0.77865613 0.77821935 0.86732843 0.73320158 0.82213439 0.77865613
0.73663511 0.68911026 0.95652174 0.73559956]
mean value: 0.7876062675297144
key: train_mcc
value: [0.86211613 0.8717805 0.87664317 0.86667805 0.89175679 0.87164354
0.871768 0.871768 0.86176621 0.85704185]
mean value: 0.8702962246613828
key: test_accuracy
value: [0.88888889 0.88888889 0.93333333 0.86666667 0.91111111 0.88888889
0.86666667 0.84444444 0.97777778 0.86666667]
mean value: 0.8933333333333333
key: train_accuracy
value: [0.9308642 0.93580247 0.9382716 0.93333333 0.94567901 0.93580247
0.93580247 0.93580247 0.9308642 0.92839506]
mean value: 0.9350617283950617
key: test_fscore
value: [0.88888889 0.89361702 0.93617021 0.86956522 0.91304348 0.88888889
0.86956522 0.8372093 0.97777778 0.85714286]
mean value: 0.8931868862110025
key: train_fscore
value: [0.93170732 0.93627451 0.93857494 0.93333333 0.94634146 0.93627451
0.93658537 0.93658537 0.93137255 0.92944039]
mean value: 0.9356489742025249
key: test_precision
value: [0.90909091 0.875 0.91666667 0.86956522 0.91304348 0.86956522
0.83333333 0.85714286 0.95652174 0.9 ]
mean value: 0.8899929418407679
key: train_precision
value: [0.91826923 0.92718447 0.93170732 0.93103448 0.93269231 0.93170732
0.92753623 0.92753623 0.92682927 0.91826923]
mean value: 0.9272766084215948
key: test_recall
value: [0.86956522 0.91304348 0.95652174 0.86956522 0.91304348 0.90909091
0.90909091 0.81818182 1. 0.81818182]
mean value: 0.8976284584980238
key: train_recall
value: [0.94554455 0.94554455 0.94554455 0.93564356 0.96039604 0.9408867
0.94581281 0.94581281 0.93596059 0.9408867 ]
mean value: 0.9442032873238062
key: test_roc_auc
value: [0.88932806 0.88833992 0.93280632 0.86660079 0.91106719 0.88932806
0.86758893 0.84387352 0.97826087 0.86561265]
mean value: 0.8932806324110671
key: train_roc_auc
value: [0.93090036 0.93582646 0.93828952 0.93333902 0.94571526 0.93578988
0.93577769 0.93577769 0.93085158 0.92836414]
mean value: 0.9350631614885626
key: test_jcc
value: [0.8 0.80769231 0.88 0.76923077 0.84 0.8
0.76923077 0.72 0.95652174 0.75 ]
mean value: 0.8092675585284281
key: train_jcc
value: [0.87214612 0.88018433 0.88425926 0.875 0.89814815 0.88018433
0.88073394 0.88073394 0.87155963 0.86818182]
mean value: 0.8791131530840937
MCC on Blind test: 0.79
Accuracy on Blind test: 0.89
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.45930314 0.3597424 0.36897445 0.41965199 0.49196625 0.47425413
0.46620059 0.42436361 0.34532022 0.39895368]
mean value: 0.4208730459213257
key: score_time
value: [0.02300835 0.02374816 0.02298093 0.02300811 0.0216465 0.02457333
0.01253843 0.0218122 0.02498055 0.02483153]
mean value: 0.0223128080368042
key: test_mcc
value: [0.77865613 0.77821935 0.86732843 0.73320158 0.82213439 0.77865613
0.73663511 0.64613475 0.95652174 0.73559956]
mean value: 0.7833087165986725
key: train_mcc
value: [0.86211613 0.8717805 0.87664317 0.86667805 0.93581427 0.87164354
0.80250226 0.92620337 0.81237958 0.85704185]
mean value: 0.8682802726942564
key: test_accuracy
value: [0.88888889 0.88888889 0.93333333 0.86666667 0.91111111 0.88888889
0.86666667 0.82222222 0.97777778 0.86666667]
mean value: 0.8911111111111111
key: train_accuracy
value: [0.9308642 0.93580247 0.9382716 0.93333333 0.96790123 0.93580247
0.90123457 0.96296296 0.90617284 0.92839506]
mean value: 0.9340740740740741
key: test_fscore
value: [0.88888889 0.89361702 0.93617021 0.86956522 0.91304348 0.88888889
0.86956522 0.80952381 0.97777778 0.85714286]
mean value: 0.8904183369308254
key: train_fscore
value:
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_8020.py:128: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_8020.py:131: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[0.93170732 0.93627451 0.93857494 0.93333333 0.96790123 0.93627451
0.90196078 0.96350365 0.90686275 0.92944039]
mean value: 0.9345833411498392
key: test_precision
value: [0.90909091 0.875 0.91666667 0.86956522 0.91304348 0.86956522
0.83333333 0.85 0.95652174 0.9 ]
mean value: 0.8892786561264822
key: train_precision
value: [0.91826923 0.92718447 0.93170732 0.93103448 0.96551724 0.93170732
0.89756098 0.95192308 0.90243902 0.91826923]
mean value: 0.9275612362765229
key: test_recall
value: [0.86956522 0.91304348 0.95652174 0.86956522 0.91304348 0.90909091
0.90909091 0.77272727 1. 0.81818182]
mean value: 0.8930830039525691
key: train_recall
value: [0.94554455 0.94554455 0.94554455 0.93564356 0.97029703 0.9408867
0.90640394 0.97536946 0.91133005 0.9408867 ]
mean value: 0.9417451104716383
key: test_roc_auc
value: [0.88932806 0.88833992 0.93280632 0.86660079 0.91106719 0.88932806
0.86758893 0.82114625 0.97826087 0.86561265]
mean value: 0.8910079051383399
key: train_roc_auc
value: [0.93090036 0.93582646 0.93828952 0.93333902 0.96790714 0.93578988
0.90122177 0.96293225 0.90616007 0.92836414]
mean value: 0.9340730624786616
key: test_jcc
value: [0.8 0.80769231 0.88 0.76923077 0.84 0.8
0.76923077 0.68 0.95652174 0.75 ]
mean value: 0.8052675585284281
key: train_jcc
value: [0.87214612 0.88018433 0.88425926 0.875 0.93779904 0.88018433
0.82142857 0.92957746 0.82959641 0.86818182]
mean value: 0.8778357351592567
MCC on Blind test: 0.79
Accuracy on Blind test: 0.89
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.04131389 0.09903002 0.04775429 0.06405687 0.17856431 0.06696653
0.1108768 0.04235101 0.0490725 0.04135633]
mean value: 0.0741342544555664
key: score_time
value: [0.01273394 0.02248192 0.02041554 0.01248813 0.01281857 0.01917338
0.01525044 0.01229382 0.01236773 0.01228189]
mean value: 0.015230536460876465
key: test_mcc
value: [0.77865613 0.64426877 0.86732843 0.77821935 0.86758893 0.82574419
0.69583743 0.68911026 0.95652174 0.82213439]
mean value: 0.7925409622270596
key: train_mcc
value: [0.86190245 0.8716498 0.86172755 0.86177295 0.87664317 0.85188889
0.86188899 0.87680228 0.85679795 0.86176621]
mean value: 0.8642840257675023
key: test_accuracy
value: [0.88888889 0.82222222 0.93333333 0.88888889 0.93333333 0.91111111
0.84444444 0.84444444 0.97777778 0.91111111]
mean value: 0.8955555555555555
key: train_accuracy
value: [0.9308642 0.93580247 0.9308642 0.9308642 0.9382716 0.92592593
0.9308642 0.9382716 0.92839506 0.9308642 ]
mean value: 0.9320987654320988
key: test_fscore
value: [0.88888889 0.82608696 0.93617021 0.89361702 0.93333333 0.91304348
0.85106383 0.8372093 0.97777778 0.90909091]
mean value: 0.8966281710028886
key: train_fscore
value: [0.93137255 0.93596059 0.93069307 0.93103448 0.93857494 0.92647059
0.93170732 0.93917275 0.92874693 0.93137255]
mean value: 0.9325105763259832
key: test_precision
value: [0.90909091 0.82608696 0.91666667 0.875 0.95454545 0.875
0.8 0.85714286 0.95652174 0.90909091]
mean value: 0.887914549218897
key: train_precision
value: [0.9223301 0.93137255 0.93069307 0.92647059 0.93170732 0.92195122
0.92270531 0.92788462 0.92647059 0.92682927]
mean value: 0.9268414626156831
key: test_recall
value: [0.86956522 0.82608696 0.95652174 0.91304348 0.91304348 0.95454545
0.90909091 0.81818182 1. 0.90909091]
mean value: 0.9069169960474308
key: train_recall
value: [0.94059406 0.94059406 0.93069307 0.93564356 0.94554455 0.93103448
0.9408867 0.95073892 0.93103448 0.93596059]
mean value: 0.9382724479344486
key: test_roc_auc
value: [0.88932806 0.82213439 0.93280632 0.88833992 0.93379447 0.91205534
0.8458498 0.84387352 0.97826087 0.91106719]
mean value: 0.8957509881422925
key: train_roc_auc
value: [0.93088816 0.93581427 0.93086378 0.93087597 0.93828952 0.92591328
0.93083939 0.93824075 0.92838853 0.93085158]
mean value: 0.9320965224601278
key: test_jcc
value: [0.8 0.7037037 0.88 0.80769231 0.875 0.84
0.74074074 0.72 0.95652174 0.83333333]
mean value: 0.815699182460052
key: train_jcc
value: [0.87155963 0.87962963 0.87037037 0.87096774 0.88425926 0.8630137
0.87214612 0.8853211 0.86697248 0.87155963]
mean value: 0.8735799662583039
MCC on Blind test: 0.8
Accuracy on Blind test: 0.9
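Note: each "mean value" line in the records above appears to be the arithmetic mean of the ten per-fold scores printed on the preceding "value" lines. The following is a minimal sketch, not part of the original scripts, that re-derives the mean from an excerpt of the Logistic Regression record. The helper name parse_records and the regular expression are illustrative assumptions about the log layout.

```python
import re
from statistics import fmean

# Excerpt copied from the Logistic Regression record in this log.
LOG_EXCERPT = """\
key: test_mcc
value: [0.77865613 0.64426877 0.86732843 0.77821935 0.86758893 0.82574419
0.69583743 0.68911026 0.95652174 0.82213439]
mean value: 0.7925409622270596
key: train_mcc
value: [0.86190245 0.8716498 0.86172755 0.86177295 0.87664317 0.85188889
0.86188899 0.87680228 0.85679795 0.86176621]
mean value: 0.8642840257675023
"""


def parse_records(text):
    """Hypothetical helper: map each key to (per-fold values, logged mean)."""
    # Assumes every record is a "key:" line, a bracketed "value:" array that
    # may wrap across lines, and a "mean value:" line, in that order.
    pattern = re.compile(
        r"key: (\w+)\s+value: \[([^\]]*)\]\s+mean value: (\S+)"
    )
    records = {}
    for key, values, mean in pattern.findall(text):
        records[key] = ([float(v) for v in values.split()], float(mean))
    return records


records = parse_records(LOG_EXCERPT)
for key, (folds, logged_mean) in records.items():
    # Print the fold count, the recomputed mean, and the mean from the log.
    print(key, len(folds), fmean(folds), logged_mean)
```

The fold scores are printed to eight decimal places, so the recomputed mean should only be compared with the logged mean to within a small tolerance, not for exact equality.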
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [2.01325941 1.78600645 1.50203204 1.71028256 1.41295767 1.62343645
1.50840998 1.89245319 1.44183517 1.61304617]
mean value: 1.6503719091415405
key: score_time
value: [0.01474905 0.01609755 0.02074885 0.02179193 0.01591682 0.01258969
0.01645231 0.02142692 0.01871753 0.01511955]
mean value: 0.017361021041870116
key: test_mcc
value: [0.82574419 0.68911026 0.86732843 0.77821935 0.82574419 0.82574419
0.73663511 0.68911026 1. 0.77821935]
mean value: 0.8015855339402445
key: train_mcc
value: [0.8965753 0.89630533 0.89630786 0.89139819 0.90627515 0.82225691
0.89630533 0.91128405 0.88164702 0.89152603]
mean value: 0.8889881173544825
key: test_accuracy
value: [0.91111111 0.84444444 0.93333333 0.88888889 0.91111111 0.91111111
0.86666667 0.84444444 1. 0.88888889]
mean value: 0.9
key: train_accuracy
value: [0.94814815 0.94814815 0.94814815 0.94567901 0.95308642 0.91111111
0.94814815 0.95555556 0.94074074 0.94567901]
mean value: 0.9444444444444444
key: test_fscore
value: [0.90909091 0.85106383 0.93617021 0.89361702 0.90909091 0.91304348
0.86956522 0.8372093 1. 0.88372093]
mean value: 0.9002571810221919
key: train_fscore
value: [0.94865526 0.94789082 0.94814815 0.94527363 0.95331695 0.91176471
0.94840295 0.95609756 0.94146341 0.94634146]
mean value: 0.9447354902197866
key: test_precision
value: [0.95238095 0.83333333 0.91666667 0.875 0.95238095 0.875
0.83333333 0.85714286 1. 0.9047619 ]
mean value: 0.9
key: train_precision
value: [0.93719807 0.95024876 0.94581281 0.95 0.94634146 0.90731707
0.94607843 0.9468599 0.93236715 0.93719807]
mean value: 0.939942172046439
key: test_recall
value: [0.86956522 0.86956522 0.95652174 0.91304348 0.86956522 0.95454545
0.90909091 0.81818182 1. 0.86363636]
mean value: 0.9023715415019763
key: train_recall
value: [0.96039604 0.94554455 0.95049505 0.94059406 0.96039604 0.91625616
0.95073892 0.96551724 0.95073892 0.95566502]
mean value: 0.9496341998731893
key: test_roc_auc
value: [0.91205534 0.84387352 0.93280632 0.88833992 0.91205534 0.91205534
0.86758893 0.84387352 1. 0.88833992]
mean value: 0.900098814229249
key: train_roc_auc
value: [0.94817832 0.94814174 0.94815393 0.94566649 0.95310442 0.91109838
0.94814174 0.9555309 0.94071599 0.94565429]
mean value: 0.944438618738721
key: test_jcc
value: [0.83333333 0.74074074 0.88 0.80769231 0.83333333 0.84
0.76923077 0.72 1. 0.79166667]
mean value: 0.8215997150997151
key: train_jcc
value: [0.90232558 0.9009434 0.90140845 0.89622642 0.91079812 0.83783784
0.90186916 0.91588785 0.88940092 0.89814815]
mean value: 0.8954845882476823
MCC on Blind test: 0.8
Accuracy on Blind test: 0.9
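Note on the metric block above: the key / value / mean value layout matches what scikit-learn's `cross_validate` returns when given several scorers and `return_train_score=True`. The sketch below reproduces that output shape under stated assumptions. The synthetic data, the scorer definitions (including the `roc_auc` string scorer), the 80/20 split, and the step names are all illustrative and may differ from the original script.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption): the real features are not in this log.
X, y = make_classification(n_samples=500, n_features=20, random_state=42)

# Hold out 20% as a "blind" test set (assumed from the log's file name).
X_train, X_blind, y_train, y_blind = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scorer names chosen so the result keys read test_mcc, train_mcc, and so on.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": "jaccard",
}

pipe = Pipeline(steps=[("prep", MinMaxScaler()), ("model", GaussianNB())])
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

# One array of 10 fold values per key, for both test and train folds.
scores = cross_validate(
    pipe, X_train, y_train, cv=cv, scoring=scoring, return_train_score=True
)

for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", value.mean())

# Refit on the full training portion and score the held-out 20%.
pipe.fit(X_train, y_train)
y_pred = pipe.predict(X_blind)
blind_mcc = round(matthews_corrcoef(y_blind, y_pred), 2)
blind_acc = round(accuracy_score(y_blind, y_pred), 2)
print("MCC on Blind test:", blind_mcc)
print("Accuracy on Blind test:", blind_acc)
```

A gap between the `train_*` and `test_*` means, as in the blocks in this log, gives a rough indication of how much a model overfits the training folds.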
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01473141 0.01323748 0.01226425 0.01224113 0.01230812 0.01203942
0.01149964 0.0120852 0.01192832 0.01190805]
mean value: 0.012424302101135255
key: score_time
value: [0.01256609 0.0105772 0.01097941 0.01110196 0.01087761 0.01021123
0.0105443 0.01062822 0.01049519 0.01022792]
mean value: 0.010820913314819335
key: test_mcc
value: [0.77865613 0.60079051 0.60079051 0.65604724 0.70780516 0.73320158
0.42993591 0.60000118 0.59109821 0.64613475]
mean value: 0.6344461181857074
key: train_mcc
value: [0.67799996 0.63062266 0.64891459 0.66881392 0.69328869 0.66530582
0.64321841 0.71511705 0.67817152 0.66530582]
mean value: 0.6686758440784181
key: test_accuracy
value: [0.88888889 0.8 0.8 0.82222222 0.84444444 0.86666667
0.71111111 0.8 0.77777778 0.82222222]
mean value: 0.8133333333333334
key: train_accuracy
value: [0.83703704 0.81234568 0.82222222 0.83209877 0.84444444 0.82962963
0.81234568 0.85432099 0.83703704 0.82962963]
mean value: 0.8311111111111111
key: test_fscore
value: [0.88888889 0.8 0.8 0.80952381 0.82926829 0.86363636
0.66666667 0.79069767 0.72222222 0.80952381]
mean value: 0.7980427727563293
key: train_fscore
value: [0.82722513 0.79787234 0.81052632 0.82105263 0.83464567 0.81794195
0.7877095 0.84432718 0.828125 0.81794195]
mean value: 0.8187367666976243
key: test_precision
value: [0.90909091 0.81818182 0.81818182 0.89473684 0.94444444 0.86363636
0.76470588 0.80952381 0.92857143 0.85 ]
mean value: 0.8601073316088796
key: train_precision
value: [0.87777778 0.86206897 0.86516854 0.87640449 0.88826816 0.88068182
0.90967742 0.90909091 0.87845304 0.88068182]
mean value: 0.8828272936910883
key: test_recall
value: [0.86956522 0.7826087 0.7826087 0.73913043 0.73913043 0.86363636
0.59090909 0.77272727 0.59090909 0.77272727]
mean value: 0.750395256916996
key: train_recall
value: [0.78217822 0.74257426 0.76237624 0.77227723 0.78712871 0.7635468
0.69458128 0.78817734 0.78325123 0.7635468 ]
mean value: 0.7639638101741208
key: test_roc_auc
value: [0.88932806 0.80039526 0.80039526 0.82411067 0.84683794 0.86660079
0.70849802 0.79940711 0.77371542 0.82114625]
mean value: 0.8130434782608695
key: train_roc_auc
value: [0.83690192 0.81217383 0.82207482 0.83195142 0.84430327 0.8297932
0.81263718 0.85448471 0.83717017 0.8297932 ]
mean value: 0.8311283714578355
key: test_jcc
value: [0.8 0.66666667 0.66666667 0.68 0.70833333 0.76
0.5 0.65384615 0.56521739 0.68 ]
mean value: 0.6680730211817169
key: train_jcc
value: [0.70535714 0.66371681 0.68141593 0.69642857 0.71621622 0.69196429
0.64976959 0.73059361 0.70666667 0.69196429]
mean value: 0.6934093104519392
MCC on Blind test: 0.68
Accuracy on Blind test: 0.84
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01203966 0.01195383 0.01233935 0.01260471 0.01238465 0.01232862
0.01189423 0.01254082 0.01260471 0.0126884 ]
mean value: 0.012337899208068848
key: score_time
value: [0.01046038 0.01066375 0.01018858 0.01057577 0.01045346 0.0106113
0.01064634 0.01088023 0.01084447 0.01107097]
mean value: 0.010639524459838868
key: test_mcc
value: [0.70780516 0.4229249 0.68972332 0.73559956 0.78530224 0.69583743
0.55841694 0.64426877 0.69404997 0.55841694]
mean value: 0.6492345229004394
key: train_mcc
value: [0.7284056 0.69072841 0.70964919 0.73836061 0.7234551 0.75324391
0.75343373 0.76814813 0.70374345 0.72863208]
mean value: 0.7297800213531322
key: test_accuracy
value: [0.84444444 0.71111111 0.84444444 0.86666667 0.88888889 0.84444444
0.77777778 0.82222222 0.84444444 0.77777778]
mean value: 0.8222222222222222
key: train_accuracy
value: [0.86419753 0.84444444 0.85432099 0.8691358 0.8617284 0.87654321
0.87654321 0.88395062 0.85185185 0.86419753]
mean value: 0.8646913580246913
key: test_fscore
value: [0.82926829 0.71111111 0.84444444 0.875 0.88372093 0.85106383
0.7826087 0.81818182 0.82926829 0.7826087 ]
mean value: 0.8207276110427367
key: train_fscore
value: [0.86419753 0.83804627 0.84987277 0.86977887 0.86138614 0.87562189
0.875 0.88279302 0.85148515 0.86284289]
mean value: 0.8631024534573951
key: test_precision
value: [0.94444444 0.72727273 0.86363636 0.84 0.95 0.8
0.75 0.81818182 0.89473684 0.75 ]
mean value: 0.8338272195640617
key: train_precision
value: [0.86206897 0.87165775 0.87434555 0.86341463 0.86138614 0.88442211
0.88832487 0.89393939 0.85572139 0.87373737]
mean value: 0.8729018186387163
key: test_recall
value: [0.73913043 0.69565217 0.82608696 0.91304348 0.82608696 0.90909091
0.81818182 0.81818182 0.77272727 0.81818182]
mean value: 0.8136363636363636
key: train_recall
value: [0.86633663 0.80693069 0.82673267 0.87623762 0.86138614 0.86699507
0.86206897 0.87192118 0.84729064 0.85221675]
mean value: 0.8538116373213676
key: test_roc_auc
value: [0.84683794 0.71146245 0.84486166 0.86561265 0.89031621 0.8458498
0.77865613 0.82213439 0.84288538 0.77865613]
mean value: 0.8227272727272728
key: train_roc_auc
value: [0.8642028 0.84435205 0.85425304 0.86915329 0.86172755 0.87656684
0.87657904 0.88398039 0.85186314 0.86422719]
mean value: 0.864690533092718
key: test_jcc
value: [0.70833333 0.55172414 0.73076923 0.77777778 0.79166667 0.74074074
0.64285714 0.69230769 0.70833333 0.64285714]
mean value: 0.6987367198574095
key: train_jcc
value: [0.76086957 0.72123894 0.73893805 0.76956522 0.75652174 0.77876106
0.77777778 0.79017857 0.74137931 0.75877193]
mean value: 0.7594002164212214
MCC on Blind test: 0.73
Accuracy on Blind test: 0.87
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.01207423 0.01183081 0.01203465 0.01215744 0.01158547 0.01139283
0.01193786 0.01136422 0.01127291 0.01138806]
mean value: 0.011703848838806152
key: score_time
value: [0.01589179 0.01923251 0.01931143 0.01723266 0.01773953 0.01866388
0.01476097 0.01722598 0.01753187 0.01763558]
mean value: 0.017522621154785156
key: test_mcc
value: [0.47603428 0.48698902 0.51185771 0.38799274 0.64752602 0.48698902
0.42178301 0.33402405 0.58158 0.60000118]
mean value: 0.49347770092446475
key: train_mcc
value: [0.69385167 0.70422287 0.66913791 0.71410816 0.68986411 0.70498382
0.70498382 0.6842722 0.68897398 0.70403264]
mean value: 0.6958431183577889
key: test_accuracy
value: [0.73333333 0.73333333 0.75555556 0.68888889 0.82222222 0.73333333
0.71111111 0.66666667 0.75555556 0.8 ]
mean value: 0.74
key: train_accuracy
value: [0.84691358 0.85185185 0.8345679 0.85679012 0.84444444 0.85185185
0.85185185 0.84197531 0.84444444 0.85185185]
mean value: 0.8476543209876544
key: test_fscore
value: [0.71428571 0.7 0.75555556 0.73076923 0.81818182 0.76
0.69767442 0.63414634 0.66666667 0.79069767]
mean value: 0.7267977419945656
key: train_fscore
value: [0.84577114 0.84848485 0.8337469 0.85353535 0.83969466 0.84771574
0.84771574 0.84 0.84367246 0.85 ]
mean value: 0.8450336829707287
key: test_precision
value: [0.78947368 0.82352941 0.77272727 0.65517241 0.85714286 0.67857143
0.71428571 0.68421053 1. 0.80952381]
mean value: 0.7784637118335207
key: train_precision
value: [0.85 0.86597938 0.8358209 0.87113402 0.86387435 0.87434555
0.87434555 0.85279188 0.85 0.86294416]
mean value: 0.8601235783219559
key: test_recall
value: [0.65217391 0.60869565 0.73913043 0.82608696 0.7826087 0.86363636
0.68181818 0.59090909 0.5 0.77272727]
mean value: 0.7017786561264823
key: train_recall
value: [0.84158416 0.83168317 0.83168317 0.83663366 0.81683168 0.8226601
0.8226601 0.82758621 0.83743842 0.83743842]
mean value: 0.8306199092815685
key: test_roc_auc
value: [0.73517787 0.73616601 0.75592885 0.68577075 0.82312253 0.73616601
0.71047431 0.66501976 0.75 0.79940711]
mean value: 0.7397233201581027
key: train_roc_auc
value: [0.84690045 0.85180218 0.8345608 0.85674048 0.84437643 0.85192411
0.85192411 0.84201093 0.84446179 0.85188753]
mean value: 0.8476588791884114
key: test_jcc
value: [0.55555556 0.53846154 0.60714286 0.57575758 0.69230769 0.61290323
0.53571429 0.46428571 0.5 0.65384615]
mean value: 0.5735974598877824
key: train_jcc
value: [0.73275862 0.73684211 0.71489362 0.74449339 0.72368421 0.73568282
0.73568282 0.72413793 0.72961373 0.73913043]
mean value: 0.731691968406008
MCC on Blind test: 0.41
Accuracy on Blind test: 0.71
Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.02174282 0.02407169 0.01865649 0.01932096 0.01913285 0.01908374
0.01905656 0.01899648 0.01942968 0.02316642]
mean value: 0.020265769958496094
key: score_time
value: [0.01160741 0.01241922 0.01122999 0.01308322 0.01144195 0.01153398
0.01152825 0.01177454 0.01203108 0.01304173]
mean value: 0.01196913719177246
key: test_mcc
value: [0.73663511 0.68972332 0.86732843 0.77821935 0.86758893 0.82574419
0.77865613 0.68911026 0.95652174 0.73320158]
mean value: 0.7922729043083154
key: train_mcc
value: [0.79762457 0.81737922 0.80251189 0.80246793 0.80766419 0.81237958
0.80246793 0.81736586 0.79262493 0.80741373]
mean value: 0.8059899831841102
key: test_accuracy
value: [0.86666667 0.84444444 0.93333333 0.88888889 0.93333333 0.91111111
0.88888889 0.84444444 0.97777778 0.86666667]
mean value: 0.8955555555555555
key: train_accuracy
value: [0.89876543 0.90864198 0.90123457 0.90123457 0.9037037 0.90617284
0.90123457 0.90864198 0.8962963 0.9037037 ]
mean value: 0.902962962962963
key: test_fscore
value: [0.86363636 0.84444444 0.93617021 0.89361702 0.93333333 0.91304348
0.88888889 0.8372093 0.97777778 0.86363636]
mean value: 0.8951757186346175
key: train_fscore
value: [0.8992629 0.90909091 0.90147783 0.9009901 0.90464548 0.90686275
0.90147783 0.90953545 0.89705882 0.9041769 ]
mean value: 0.903457897428805
key: test_precision
value: [0.9047619 0.86363636 0.91666667 0.875 0.95454545 0.875
0.86956522 0.85714286 0.95652174 0.86363636]
mean value: 0.893647656691135
key: train_precision
value: [0.89268293 0.90243902 0.89705882 0.9009901 0.89371981 0.90243902
0.90147783 0.90291262 0.89268293 0.90196078]
mean value: 0.8988363869926886
key: test_recall
value: [0.82608696 0.82608696 0.95652174 0.91304348 0.91304348 0.95454545
0.90909091 0.81818182 1. 0.86363636]
mean value: 0.8980237154150198
key: train_recall
value: [0.90594059 0.91584158 0.90594059 0.9009901 0.91584158 0.91133005
0.90147783 0.91625616 0.90147783 0.90640394]
mean value: 0.9081500268253426
key: test_roc_auc
value: [0.86758893 0.84486166 0.93280632 0.88833992 0.93379447 0.91205534
0.88932806 0.84387352 0.97826087 0.86660079]
mean value: 0.8957509881422925
key: train_roc_auc
value: [0.8987831 0.90865971 0.90124616 0.90123397 0.9037336 0.90616007
0.90123397 0.90862313 0.89628347 0.90369702]
mean value: 0.9029654196946788
key: test_jcc
value: [0.76 0.73076923 0.88 0.80769231 0.875 0.84
0.8 0.72 0.95652174 0.76 ]
mean value: 0.8129983277591973
key: train_jcc
value: [0.81696429 0.83333333 0.8206278 0.81981982 0.82589286 0.82959641
0.8206278 0.83408072 0.81333333 0.82511211]
mean value: 0.8239388472392957
MCC on Blind test: 0.79
Accuracy on Blind test: 0.89
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.31762528 2.59691238 1.63772154 1.77554345 1.59011173 1.17077398
2.11147809 1.44927478 0.70069718 1.59665656]
mean value: 1.5946794986724853
key: score_time
value: [0.01862407 0.02381897 0.01357055 0.01248121 0.02450323 0.02156973
0.01986384 0.02018833 0.01420259 0.01296401]
mean value: 0.018178653717041016
key: test_mcc
value: [0.73663511 0.64426877 0.86732843 0.77821935 0.86758893 0.82574419
0.77821935 0.73559956 1. 0.69404997]
mean value: 0.7927653674534292
key: train_mcc
value: [0.83247548 0.8716498 0.82799641 0.8520244 0.83086317 0.79284035
0.83012449 0.83313446 0.78773172 0.81956701]
mean value: 0.8278407283052743
key: test_accuracy
value: [0.86666667 0.82222222 0.93333333 0.88888889 0.93333333 0.91111111
0.88888889 0.86666667 1. 0.84444444]
mean value: 0.8955555555555555
key: train_accuracy
value: [0.91604938 0.93580247 0.91358025 0.92592593 0.91358025 0.89382716
0.91358025 0.91604938 0.89382716 0.90864198]
mean value: 0.9130864197530864
key: test_fscore
value: [0.86363636 0.82608696 0.93617021 0.89361702 0.93333333 0.91304348
0.88372093 0.85714286 1. 0.82926829]
mean value: 0.89360194458532
key: train_fscore
value: [0.91707317 0.93596059 0.91525424 0.92647059 0.91725768 0.88772846
0.91002571 0.91414141 0.89486553 0.90537084]
mean value: 0.9124148220877728
key: test_precision
value: [0.9047619 0.82608696 0.91666667 0.875 0.95454545 0.875
0.9047619 0.9 1. 0.89473684]
mean value: 0.9051559729362934
key: train_precision
value: [0.90384615 0.93137255 0.8957346 0.91747573 0.87782805 0.94444444
0.9516129 0.93782383 0.88834951 0.94148936]
mean value: 0.9189977140608518
key: test_recall
value: [0.82608696 0.82608696 0.95652174 0.91304348 0.91304348 0.95454545
0.86363636 0.81818182 1. 0.77272727]
mean value: 0.8843873517786561
key: train_recall
value: [0.93069307 0.94059406 0.93564356 0.93564356 0.96039604 0.83743842
0.87192118 0.89162562 0.90147783 0.87192118]
mean value: 0.9077354533482905
key: test_roc_auc
value: [0.86758893 0.82213439 0.93280632 0.88833992 0.93379447 0.91205534
0.88833992 0.86561265 1. 0.84288538]
mean value: 0.8953557312252964
key: train_roc_auc
value: [0.91608545 0.93581427 0.91363459 0.92594986 0.91369556 0.89396674
0.91368336 0.91610984 0.89380822 0.90873287]
mean value: 0.913148075891333
key: test_jcc
value: [0.76 0.7037037 0.88 0.80769231 0.875 0.84
0.79166667 0.75 1. 0.70833333]
mean value: 0.8116396011396011
key: train_jcc
value: [0.84684685 0.87962963 0.84375 0.8630137 0.84716157 0.79812207
0.83490566 0.84186047 0.80973451 0.8271028 ]
mean value: 0.8392127255393006
MCC on Blind test: 0.75
Accuracy on Blind test: 0.88
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.03762817 0.0222249 0.02336216 0.02422976 0.02328444 0.02246857
0.02320361 0.02444267 0.02557874 0.02451849]
mean value: 0.025094151496887207
key: score_time
value: [0.01054001 0.01044989 0.0104568 0.01041985 0.01052427 0.01045942
0.01032162 0.0105691 0.01049876 0.01033282]
mean value: 0.010457253456115723
key: test_mcc
value: [0.77865613 0.91106719 0.82506438 0.91106719 0.82213439 0.91485328
0.95652174 0.77821935 0.95643752 0.95643752]
mean value: 0.8810458680630867
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.88888889 0.95555556 0.91111111 0.95555556 0.91111111 0.95555556
0.97777778 0.88888889 0.97777778 0.97777778]
mean value: 0.94
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.88888889 0.95652174 0.91666667 0.95652174 0.91304348 0.95652174
0.97777778 0.88372093 0.97674419 0.97674419]
mean value: 0.9403151331311089
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.90909091 0.95652174 0.88 0.95652174 0.91304348 0.91666667
0.95652174 0.9047619 1. 1. ]
mean value: 0.9393128176171655
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.86956522 0.95652174 0.95652174 0.95652174 0.91304348 1.
1. 0.86363636 0.95454545 0.95454545]
mean value: 0.9424901185770751
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.88932806 0.9555336 0.91007905 0.9555336 0.91106719 0.95652174
0.97826087 0.88833992 0.97727273 0.97727273]
mean value: 0.9399209486166008
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.8 0.91666667 0.84615385 0.91666667 0.84 0.91666667
0.95652174 0.79166667 0.95454545 0.95454545]
mean value: 0.8893433161041857
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.91
Accuracy on Blind test: 0.96
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.14324355 0.14098144 0.14159226 0.14177227 0.14036822 0.14066744
0.13828254 0.1379981 0.14093471 0.14011884]
mean value: 0.14059593677520751
key: score_time
value: [0.02067876 0.02081728 0.02086139 0.02107191 0.02067494 0.0208261
0.01972151 0.02075052 0.02083349 0.02075028]
mean value: 0.02069861888885498
key: test_mcc
value: [0.82574419 0.73320158 0.86758893 0.68911026 0.82574419 0.78530224
0.78530224 0.64426877 0.86732843 0.82213439]
mean value: 0.784572523307409
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.91111111 0.86666667 0.93333333 0.84444444 0.91111111 0.88888889
0.88888889 0.82222222 0.93333333 0.91111111]
mean value: 0.8911111111111111
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.90909091 0.86956522 0.93333333 0.85106383 0.90909091 0.89361702
0.89361702 0.81818182 0.93023256 0.90909091]
mean value: 0.8916883526659143
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.95238095 0.86956522 0.95454545 0.83333333 0.95238095 0.84
0.84 0.81818182 0.95238095 0.90909091]
mean value: 0.8921859589685677
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.86956522 0.86956522 0.91304348 0.86956522 0.86956522 0.95454545
0.95454545 0.81818182 0.90909091 0.90909091]
mean value: 0.8936758893280632
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.91205534 0.86660079 0.93379447 0.84387352 0.91205534 0.89031621
0.89031621 0.82213439 0.93280632 0.91106719]
mean value: 0.8915019762845849
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.83333333 0.76923077 0.875 0.74074074 0.83333333 0.80769231
0.80769231 0.69230769 0.86956522 0.83333333]
mean value: 0.8062229035055122
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.8
Accuracy on Blind test: 0.9
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01231742 0.01216841 0.0121696 0.01225901 0.01252937 0.01230884
0.01246262 0.01228213 0.01244259 0.01268888]
mean value: 0.01236288547515869
key: score_time
value: [0.0103507 0.01036882 0.01041508 0.0105629 0.0106926 0.01041341
0.01040411 0.0103581 0.01045394 0.01082134]
mean value: 0.010484099388122559
key: test_mcc
value: [0.43557241 0.68972332 0.73559956 0.51185771 0.43557241 0.38019877
0.2903816 0.73663511 0.33824342 0.670374 ]
mean value: 0.5224158297692548
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.71111111 0.84444444 0.86666667 0.75555556 0.71111111 0.68888889
0.64444444 0.86666667 0.66666667 0.82222222]
mean value: 0.7577777777777778
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.68292683 0.84444444 0.875 0.75555556 0.68292683 0.69565217
0.6 0.86956522 0.61538462 0.84 ]
mean value: 0.7461455665225548
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.77777778 0.86363636 0.84 0.77272727 0.77777778 0.66666667
0.66666667 0.83333333 0.70588235 0.75 ]
mean value: 0.7654468211527035
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.60869565 0.82608696 0.91304348 0.73913043 0.60869565 0.72727273
0.54545455 0.90909091 0.54545455 0.95454545]
mean value: 0.7377470355731225
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.71343874 0.84486166 0.86561265 0.75592885 0.71343874 0.68972332
0.64229249 0.86758893 0.66403162 0.82509881]
mean value: 0.758201581027668
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.51851852 0.73076923 0.77777778 0.60714286 0.51851852 0.53333333
0.42857143 0.76923077 0.44444444 0.72413793]
mean value: 0.6052444809341361
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.55
Accuracy on Blind test: 0.77
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.9608767 1.95501304 2.26225162 1.84889507 2.05728984 1.85792685
4.34983826 2.38245821 2.61548281 2.64022255]
mean value: 2.3930254936218263
key: score_time
value: [0.10743237 0.22150254 0.09776163 0.13473463 0.10169816 0.17076182
0.25548291 0.19140983 0.15087056 0.12873602]
mean value: 0.15603904724121093
key: test_mcc
value: [0.86758893 0.91106719 0.86732843 0.95643752 0.82574419 0.95652174
0.86758893 0.77821935 1. 0.95643752]
mean value: 0.8986933809832185
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.93333333 0.95555556 0.93333333 0.97777778 0.91111111 0.97777778
0.93333333 0.88888889 1. 0.97777778]
mean value: 0.9488888888888889
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.93333333 0.95652174 0.93617021 0.9787234 0.90909091 0.97777778
0.93333333 0.88372093 1. 0.97674419]
mean value: 0.9485415825966135
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.95454545 0.95652174 0.91666667 0.95833333 0.95238095 0.95652174
0.91304348 0.9047619 1. 1. ]
mean value: 0.9512775268210051
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.91304348 0.95652174 0.95652174 1. 0.86956522 1.
0.95454545 0.86363636 1. 0.95454545]
mean value: 0.9468379446640316
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.93379447 0.9555336 0.93280632 0.97727273 0.91205534 0.97826087
0.93379447 0.88833992 1. 0.97727273]
mean value: 0.9489130434782609
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.875 0.91666667 0.88 0.95833333 0.83333333 0.95652174
0.875 0.79166667 1. 0.95454545]
mean value: 0.9041067193675889
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.89
Accuracy on Blind test: 0.95
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...05', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [1.02290845 1.19409299 1.70667768 1.77199078 1.73614359 2.20723319
1.984725 1.91829991 1.85758781 1.00323558]
mean value: 1.6402894973754882
key: score_time
value: [0.15309381 0.17785645 0.17673326 0.22488499 0.2151401 0.18284273
0.1778214 0.29034138 0.14019394 0.16655993]
mean value: 0.19054679870605468
key: test_mcc
value: [0.86758893 0.82213439 0.86732843 0.95643752 0.82574419 0.91106719
0.86758893 0.77821935 1. 0.87406293]
mean value: 0.8770171874100532
key: train_mcc
value: [0.95556748 0.95061698 0.94568955 0.94078482 0.95556639 0.95066455
0.95556748 0.97532008 0.94578446 0.93590713]
mean value: 0.951146893201595
key: test_accuracy
value: [0.93333333 0.91111111 0.93333333 0.97777778 0.91111111 0.95555556
0.93333333 0.88888889 1. 0.93333333]
mean value: 0.9377777777777778
key: train_accuracy
value: [0.97777778 0.97530864 0.97283951 0.97037037 0.97777778 0.97530864
0.97777778 0.98765432 0.97283951 0.96790123]
mean value: 0.9755555555555555
key: test_fscore
value: [0.93333333 0.91304348 0.93617021 0.9787234 0.90909091 0.95454545
0.93333333 0.88372093 1. 0.92682927]
mean value: 0.9368790324110416
key: train_fscore
value: [0.97777778 0.97524752 0.97270471 0.97014925 0.97766749 0.97524752
0.97777778 0.98771499 0.97270471 0.96774194]
mean value: 0.9754733705067631
key: test_precision
value: [0.95454545 0.91304348 0.91666667 0.95833333 0.95238095 0.95454545
0.91304348 0.9047619 1. 1. ]
mean value: 0.9467320722755506
key: train_precision
value: [0.97536946 0.97524752 0.97512438 0.975 0.9800995 0.9800995
0.98019802 0.98529412 0.98 0.975 ]
mean value: 0.978143250341417
key: test_recall
value: [0.91304348 0.91304348 0.95652174 1. 0.86956522 0.95454545
0.95454545 0.86363636 1. 0.86363636]
mean value: 0.9288537549407114
key: train_recall
value: [0.98019802 0.97524752 0.97029703 0.96534653 0.97524752 0.97044335
0.97536946 0.99014778 0.96551724 0.96059113]
mean value: 0.9728405599180607
key: test_roc_auc
value: [0.93379447 0.91106719 0.93280632 0.97727273 0.91205534 0.9555336
0.93379447 0.88833992 1. 0.93181818]
mean value: 0.9376482213438735
key: train_roc_auc
value: [0.97778374 0.97530849 0.97283324 0.970358 0.97777155 0.97532068
0.97778374 0.98764815 0.97285763 0.96791933]
mean value: 0.9755584548602644
key: test_jcc
value: [0.875 0.84 0.88 0.95833333 0.83333333 0.91304348
0.875 0.79166667 1. 0.86363636]
mean value: 0.8830013175230567
key: train_jcc
value: [0.95652174 0.95169082 0.9468599 0.94202899 0.95631068 0.95169082
0.95652174 0.97572816 0.9468599 0.9375 ]
mean value: 0.9521712747994935
MCC on Blind test: 0.93
Accuracy on Blind test: 0.96
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02517962 0.0100472 0.01011395 0.01026773 0.00996494 0.01005149
0.01018238 0.00989962 0.00991511 0.00988817]
mean value: 0.01155102252960205
key: score_time
value: [0.00965428 0.00895143 0.00905299 0.00892019 0.0089519 0.00888443
0.00896573 0.0087378 0.00882101 0.00882554]
mean value: 0.00897653102874756
key: test_mcc
value: [0.70780516 0.4229249 0.68972332 0.73559956 0.78530224 0.69583743
0.55841694 0.64426877 0.69404997 0.55841694]
mean value: 0.6492345229004394
key: train_mcc
value: [0.7284056 0.69072841 0.70964919 0.73836061 0.7234551 0.75324391
0.75343373 0.76814813 0.70374345 0.72863208]
mean value: 0.7297800213531322
key: test_accuracy
value: [0.84444444 0.71111111 0.84444444 0.86666667 0.88888889 0.84444444
0.77777778 0.82222222 0.84444444 0.77777778]
mean value: 0.8222222222222222
key: train_accuracy
value: [0.86419753 0.84444444 0.85432099 0.8691358 0.8617284 0.87654321
0.87654321 0.88395062 0.85185185 0.86419753]
mean value: 0.8646913580246913
key: test_fscore
value: [0.82926829 0.71111111 0.84444444 0.875 0.88372093 0.85106383
0.7826087 0.81818182 0.82926829 0.7826087 ]
mean value: 0.8207276110427367
key: train_fscore
value: [0.86419753 0.83804627 0.84987277 0.86977887 0.86138614 0.87562189
0.875 0.88279302 0.85148515 0.86284289]
mean value: 0.8631024534573951
key: test_precision
value: [0.94444444 0.72727273 0.86363636 0.84 0.95 0.8
0.75 0.81818182 0.89473684 0.75 ]
mean value: 0.8338272195640617
key: train_precision
value: [0.86206897 0.87165775 0.87434555 0.86341463 0.86138614 0.88442211
0.88832487 0.89393939 0.85572139 0.87373737]
mean value: 0.8729018186387163
key: test_recall
value: [0.73913043 0.69565217 0.82608696 0.91304348 0.82608696 0.90909091
0.81818182 0.81818182 0.77272727 0.81818182]
mean value: 0.8136363636363636
key: train_recall
value: [0.86633663 0.80693069 0.82673267 0.87623762 0.86138614 0.86699507
0.86206897 0.87192118 0.84729064 0.85221675]
mean value: 0.8538116373213676
key: test_roc_auc
value: [0.84683794 0.71146245 0.84486166 0.86561265 0.89031621 0.8458498
0.77865613 0.82213439 0.84288538 0.77865613]
mean value: 0.8227272727272728
key: train_roc_auc
value: [0.8642028 0.84435205 0.85425304 0.86915329 0.86172755 0.87656684
0.87657904 0.88398039 0.85186314 0.86422719]
mean value: 0.864690533092718
key: test_jcc
value: [0.70833333 0.55172414 0.73076923 0.77777778 0.79166667 0.74074074
0.64285714 0.69230769 0.70833333 0.64285714]
mean value: 0.6987367198574095
key: train_jcc
value: [0.76086957 0.72123894 0.73893805 0.76956522 0.75652174 0.77876106
0.77777778 0.79017857 0.74137931 0.75877193]
mean value: 0.7594002164212214
MCC on Blind test: 0.73
Accuracy on Blind test: 0.87
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [1.47456765 1.54497075 1.56331825 1.58203959 1.53839636 1.51352477
1.57174468 1.49755764 1.60622501 1.62361526]
mean value: 1.5515959978103637
key: score_time
value: [0.01256537 0.0133667 0.01274014 0.01221132 0.01370311 0.01287436
0.01300812 0.01307845 0.01412868 0.01365328]
mean value: 0.013132953643798828
key: test_mcc
value: [0.82213439 0.91106719 0.95643752 1. 0.86758893 0.91485328
0.95652174 0.77821935 1. 0.95643752]
mean value: 0.9163259916823262
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.91111111 0.95555556 0.97777778 1. 0.93333333 0.95555556
0.97777778 0.88888889 1. 0.97777778]
mean value: 0.9577777777777777
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.91304348 0.95652174 0.9787234 1. 0.93333333 0.95652174
0.97777778 0.88372093 1. 0.97674419]
mean value: 0.9576386588167239
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.91304348 0.95652174 0.95833333 1. 0.95454545 0.91666667
0.95652174 0.9047619 1. 1. ]
mean value: 0.9560394315829098
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.91304348 0.95652174 1. 1. 0.91304348 1.
1. 0.86363636 1. 0.95454545]
mean value: 0.9600790513833992
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.91106719 0.9555336 0.97727273 1. 0.93379447 0.95652174
0.97826087 0.88833992 1. 0.97727273]
mean value: 0.957806324110672
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.84 0.91666667 0.95833333 1. 0.875 0.91666667
0.95652174 0.79166667 1. 0.95454545]
mean value: 0.9209400527009223
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.95
Accuracy on Blind test: 0.97
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.12674332 0.10389161 0.06731343 0.08088517 0.083004 0.08519578
0.10193396 0.09383464 0.2874248 0.05935693]
mean value: 0.10895836353302002
key: score_time
value: [0.02858162 0.0240407 0.0236733 0.03531265 0.02388859 0.0231142
0.0168426 0.01304436 0.01560354 0.01269364]
mean value: 0.02167952060699463
key: test_mcc
value: [0.86758893 0.69404997 0.82213439 0.69404997 0.82213439 0.69583743
0.73663511 0.69404997 0.82213439 0.73559956]
mean value: 0.7584214113139539
key: train_mcc
value: [0.91606106 0.91129269 0.91605902 0.92103017 0.93581427 0.92117074
0.91605902 0.93126766 0.92602981 0.89139819]
mean value: 0.9186182632580657
key: test_accuracy
value: [0.93333333 0.84444444 0.91111111 0.84444444 0.91111111 0.84444444
0.86666667 0.84444444 0.91111111 0.86666667]
mean value: 0.8777777777777778
key: train_accuracy
value: [0.95802469 0.95555556 0.95802469 0.96049383 0.96790123 0.96049383
0.95802469 0.9654321 0.96296296 0.94567901]
mean value: 0.9592592592592593
key: test_fscore
value: [0.93333333 0.85714286 0.91304348 0.85714286 0.91304348 0.85106383
0.86956522 0.82926829 0.90909091 0.85714286]
mean value: 0.8789837110236018
key: train_fscore
value: [0.95802469 0.95588235 0.95781638 0.960199 0.96790123 0.960199
0.95823096 0.96601942 0.96277916 0.94607843]
mean value: 0.9593130629395346
key: test_precision
value: [0.95454545 0.80769231 0.91304348 0.80769231 0.91304348 0.8
0.83333333 0.89473684 0.90909091 0.9 ]
mean value: 0.8733178110981314
key: train_precision
value: [0.95566502 0.94660194 0.960199 0.965 0.96551724 0.96984925
0.95588235 0.95215311 0.97 0.94146341]
mean value: 0.9582331336586875
key: test_recall
value: [0.91304348 0.91304348 0.91304348 0.91304348 0.91304348 0.90909091
0.90909091 0.77272727 0.90909091 0.81818182]
mean value: 0.8883399209486166
key: train_recall
value: [0.96039604 0.96534653 0.95544554 0.95544554 0.97029703 0.95073892
0.96059113 0.98029557 0.95566502 0.95073892]
mean value: 0.9604960249719553
key: test_roc_auc
value: [0.93379447 0.84288538 0.91106719 0.84288538 0.91106719 0.8458498
0.86758893 0.84288538 0.91106719 0.86561265]
mean value: 0.8774703557312253
key: train_roc_auc
value: [0.95803053 0.95557967 0.95801834 0.96048139 0.96790714 0.96051797
0.95801834 0.96539531 0.96298103 0.94566649]
mean value: 0.9592596205433351
key: test_jcc
value: [0.875 0.75 0.84 0.75 0.84 0.74074074
0.76923077 0.70833333 0.83333333 0.75 ]
mean value: 0.7856638176638177
key: train_jcc
value: [0.91943128 0.91549296 0.91904762 0.92344498 0.93779904 0.92344498
0.91981132 0.9342723 0.92822967 0.89767442]
mean value: 0.9218648556530884
MCC on Blind test: 0.7
Accuracy on Blind test: 0.85
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.02621555 0.01159739 0.01051664 0.01107073 0.01101708 0.01104641
0.01100183 0.01075768 0.01108098 0.01129007]
mean value: 0.01255943775177002
key: score_time
value: [0.02028179 0.01012492 0.00889969 0.00954199 0.00957394 0.0095036
0.00955319 0.0093534 0.00974369 0.0098424 ]
mean value: 0.010641860961914062
key: test_mcc
value: [0.74605372 0.51089209 0.82213439 0.82213439 0.78530224 0.82574419
0.55841694 0.64752602 0.83484711 0.64426877]
mean value: 0.7197319857368281
key: train_mcc
value: [0.72859901 0.7001606 0.6847458 0.74821952 0.72358281 0.75811526
0.72914356 0.77300001 0.73836061 0.73425986]
mean value: 0.7318187035926328
key: test_accuracy
value: [0.86666667 0.75555556 0.91111111 0.91111111 0.88888889 0.91111111
0.77777778 0.82222222 0.91111111 0.82222222]
mean value: 0.8577777777777778
key: train_accuracy
value: [0.86419753 0.84938272 0.84197531 0.87407407 0.8617284 0.87901235
0.86419753 0.88641975 0.8691358 0.86666667]
mean value: 0.865679012345679
key: test_fscore
value: [0.85714286 0.76595745 0.91304348 0.91304348 0.88372093 0.91304348
0.7826087 0.82608696 0.9 0.81818182]
mean value: 0.8572829139322266
key: train_fscore
value: [0.86215539 0.84398977 0.83756345 0.87281796 0.86 0.87841191
0.86146096 0.88557214 0.86848635 0.86363636]
mean value: 0.8634094288327001
key: test_precision
value: [0.94736842 0.75 0.91304348 0.91304348 0.95 0.875
0.75 0.79166667 1. 0.81818182]
mean value: 0.8708303862422855
key: train_precision
value: [0.87309645 0.87301587 0.859375 0.87939698 0.86868687 0.885
0.8814433 0.89447236 0.875 0.88601036]
mean value: 0.877549719680029
key: test_recall
value: [0.7826087 0.7826087 0.91304348 0.91304348 0.82608696 0.95454545
0.81818182 0.86363636 0.81818182 0.81818182]
mean value: 0.8490118577075099
key: train_recall
value: [0.85148515 0.81683168 0.81683168 0.86633663 0.85148515 0.87192118
0.84236453 0.87684729 0.86206897 0.84236453]
mean value: 0.8498536799492757
key: test_roc_auc
value: [0.86857708 0.75494071 0.91106719 0.91106719 0.89031621 0.91205534
0.77865613 0.82312253 0.90909091 0.82213439]
mean value: 0.858102766798419
key: train_roc_auc
value: [0.86416622 0.84930254 0.84191338 0.87405502 0.86170317 0.8790299
0.86425157 0.88644345 0.86915329 0.86672682]
mean value: 0.8656745354338389
key: test_jcc
value: [0.75 0.62068966 0.84 0.84 0.79166667 0.84
0.64285714 0.7037037 0.81818182 0.69230769]
mean value: 0.7539406678889438
key: train_jcc
value: [0.75770925 0.7300885 0.72052402 0.77433628 0.75438596 0.78318584
0.75663717 0.79464286 0.76754386 0.76 ]
mean value: 0.7599053737883451
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.0166471 0.02165747 0.01920605 0.01865745 0.01901174 0.02208638
0.01923394 0.02118444 0.03245139 0.02038074]
mean value: 0.021051669120788576
key: score_time
value: [0.01027346 0.01216531 0.0124557 0.01228547 0.01229858 0.01237655
0.01228023 0.01235175 0.02047372 0.01226282]
mean value: 0.012922358512878419
key: test_mcc
value: [0.78530224 0.64752602 0.86732843 0.73320158 0.59725988 0.78405645
0.70780516 0.64752602 0.82213439 0.70501339]
mean value: 0.7297153566397614
key: train_mcc
value: [0.86377146 0.84895551 0.81816266 0.86902982 0.80684222 0.81282858
0.86843671 0.81827627 0.88164702 0.87785481]
mean value: 0.8465805044965292
key: test_accuracy
value: [0.88888889 0.82222222 0.93333333 0.86666667 0.77777778 0.88888889
0.84444444 0.82222222 0.91111111 0.84444444]
mean value: 0.86
key: train_accuracy
value: [0.9308642 0.92098765 0.90617284 0.93333333 0.8962963 0.89876543
0.93333333 0.9037037 0.94074074 0.9382716 ]
mean value: 0.9202469135802469
key: test_fscore
value: [0.88372093 0.81818182 0.93617021 0.86956522 0.73684211 0.87804878
0.85714286 0.82608696 0.90909091 0.82051282]
mean value: 0.8535362607590927
key: train_fscore
value: [0.92820513 0.91534392 0.91121495 0.93059126 0.8852459 0.88828338
0.93556086 0.91116173 0.94146341 0.93670886]
mean value: 0.9183779402635586
key: test_precision
value: [0.95 0.85714286 0.91666667 0.86956522 0.93333333 0.94736842
0.77777778 0.79166667 0.90909091 0.94117647]
mean value: 0.8893788319710382
key: train_precision
value: [0.96276596 0.98295455 0.86283186 0.96791444 0.98780488 0.99390244
0.90740741 0.84745763 0.93236715 0.96354167]
mean value: 0.940894796783545
key: test_recall
value: [0.82608696 0.7826087 0.95652174 0.86956522 0.60869565 0.81818182
0.95454545 0.86363636 0.90909091 0.72727273]
mean value: 0.8316205533596838
key: train_recall
value: [0.8960396 0.85643564 0.96534653 0.8960396 0.8019802 0.80295567
0.96551724 0.98522167 0.95073892 0.91133005]
mean value: 0.9031605130956446
key: test_roc_auc
value: [0.89031621 0.82312253 0.93280632 0.86660079 0.78162055 0.88735178
0.84683794 0.82312253 0.91106719 0.84189723]
mean value: 0.8604743083003953
key: train_roc_auc
value: [0.93077842 0.92082866 0.90631859 0.93324148 0.89606399 0.89900258
0.93325367 0.90350193 0.94071599 0.93833829]
mean value: 0.9202043603375115
key: test_jcc
value: [0.79166667 0.69230769 0.88 0.76923077 0.58333333 0.7826087
0.75 0.7037037 0.83333333 0.69565217]
mean value: 0.7481836368140716
key: train_jcc
value: [0.86602871 0.84390244 0.83690987 0.87019231 0.79411765 0.79901961
0.87892377 0.83682008 0.88940092 0.88095238]
mean value: 0.8496267734106784
MCC on Blind test: 0.75
Accuracy on Blind test: 0.87
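Note: each model block above follows the same "key: / value: / mean value:" layout, so the per-fold scores can be recovered from this log and checked against the logged means. The snippet below is a minimal sketch of that, not part of the original script. The function name parse_cv_block and the variable names are hypothetical, and the sample lines are copied from the Passive Aggresive block above.

```python
import re
from statistics import mean

# Matches numpy-style printed floats such as "0.88888889", "1." or "-0.12"
NUMBER = re.compile(r"-?\d+\.?\d*(?:[eE][-+]?\d+)?")

def parse_cv_block(lines):
    """Collect per-fold values and the logged mean for each 'key:' entry.

    Hypothetical helper: assumes the 'key: / value: / mean value:' layout
    seen in this log, with fold values possibly wrapped over several lines.
    """
    results = {}
    key = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("key:"):
            key = line.split(":", 1)[1].strip()
            results[key] = {"folds": [], "logged_mean": None}
        elif line.startswith("mean value:") and key is not None:
            results[key]["logged_mean"] = float(line.split(":", 1)[1])
            key = None  # block for this key is complete
        elif key is not None:
            # 'value: [...' line or one of its continuation lines
            text = line.split("value:", 1)[-1]
            results[key]["folds"].extend(float(x) for x in NUMBER.findall(text))
    return results

sample = """key: test_mcc
value: [0.78530224 0.64752602 0.86732843 0.73320158 0.59725988 0.78405645
0.70780516 0.64752602 0.82213439 0.70501339]
mean value: 0.7297153566397614
key: test_accuracy
value: [0.88888889 0.82222222 0.93333333 0.86666667 0.77777778 0.88888889
0.84444444 0.82222222 0.91111111 0.84444444]
mean value: 0.86
MCC on Blind test: 0.75
""".splitlines()

parsed = parse_cv_block(sample)
for name, entry in parsed.items():
    # Recomputed mean should agree with the logged one up to print rounding
    print(name, len(entry["folds"]), round(mean(entry["folds"]), 6), entry["logged_mean"])
```

Lines outside a key block (for example "MCC on Blind test") are ignored, so the same function can be fed a whole model block.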
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.03038692 0.02198148 0.02177858 0.01954556 0.02221322 0.02213001
0.02184844 0.02069044 0.01776695 0.01800418]
mean value: 0.021634578704833984
key: score_time
value: [0.01426578 0.01328993 0.01486778 0.01319575 0.01274991 0.01249719
0.01248622 0.01217604 0.01213098 0.01217985]
mean value: 0.012983942031860351
key: test_mcc
value: [0.78530224 0.69404997 0.82213439 0.78405645 0.78530224 0.82213439
0.69583743 0.64752602 0.87406293 0.62869461]
mean value: 0.7539100674057231
key: train_mcc
value: [0.82016416 0.89684043 0.91614635 0.88695876 0.90644294 0.89949116
0.9023231 0.81395079 0.84022048 0.79853924]
mean value: 0.868107740114024
key: test_accuracy
value: [0.88888889 0.84444444 0.91111111 0.88888889 0.88888889 0.91111111
0.84444444 0.82222222 0.93333333 0.8 ]
mean value: 0.8733333333333333
key: train_accuracy
value: [0.90617284 0.94814815 0.95802469 0.94320988 0.95308642 0.94814815
0.95061728 0.90123457 0.91604938 0.89135802]
mean value: 0.931604938271605
key: test_fscore
value: [0.88372093 0.85714286 0.91304348 0.89795918 0.88372093 0.90909091
0.85106383 0.82608696 0.92682927 0.75675676]
mean value: 0.8705415099991635
key: train_fscore
value: [0.89893617 0.94890511 0.95760599 0.94403893 0.95238095 0.94601542
0.95192308 0.90909091 0.91005291 0.87978142]
mean value: 0.9298730887557013
key: test_precision
value: [0.95 0.80769231 0.91304348 0.84615385 0.95 0.90909091
0.8 0.79166667 1. 0.93333333]
mean value: 0.8900980541197933
key: train_precision
value: [0.97126437 0.93301435 0.96482412 0.92822967 0.96446701 0.98924731
0.92957746 0.84388186 0.98285714 0.98773006]
mean value: 0.9495093349997615
key: test_recall
value: [0.82608696 0.91304348 0.91304348 0.95652174 0.82608696 0.90909091
0.90909091 0.86363636 0.86363636 0.63636364]
mean value: 0.8616600790513834
key: train_recall
value: [0.83663366 0.96534653 0.95049505 0.96039604 0.94059406 0.90640394
0.97536946 0.98522167 0.84729064 0.79310345]
mean value: 0.916085450909623
key: test_roc_auc
value: [0.89031621 0.84288538 0.91106719 0.88735178 0.89031621 0.91106719
0.8458498 0.82312253 0.93181818 0.79644269]
mean value: 0.8730237154150198
key: train_roc_auc
value: [0.90600156 0.94819051 0.95800615 0.94325221 0.95305565 0.94825148
0.95055602 0.90102668 0.91621958 0.89160123]
mean value: 0.9316161049602497
key: test_jcc
value: [0.79166667 0.75 0.84 0.81481481 0.79166667 0.83333333
0.74074074 0.7037037 0.86363636 0.60869565]
mean value: 0.7738257941736203
key: train_jcc
value: [0.81642512 0.90277778 0.91866029 0.89400922 0.90909091 0.89756098
0.90825688 0.83333333 0.83495146 0.78536585]
mean value: 0.8700431810959086
MCC on Blind test: 0.84
Accuracy on Blind test: 0.92
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.1846261 0.16588259 0.17944908 0.16845226 0.20947981 0.17672706
0.16525626 0.16993427 0.16802812 0.17575526]
mean value: 0.17635908126831054
key: score_time
value: [0.0156827 0.01687837 0.01521039 0.01514649 0.02424717 0.01539278
0.01648426 0.01540637 0.01509404 0.02262664]
mean value: 0.017216920852661133
key: test_mcc
value: [0.82213439 0.86732843 0.95643752 1. 0.86758893 0.91485328
0.91106719 0.73559956 1. 1. ]
mean value: 0.9075009309091597
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.91111111 0.93333333 0.97777778 1. 0.93333333 0.95555556
0.95555556 0.86666667 1. 1. ]
mean value: 0.9533333333333334
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.91304348 0.93617021 0.9787234 1. 0.93333333 0.95652174
0.95454545 0.85714286 1. 1. ]
mean value: 0.9529480479434226
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.91304348 0.91666667 0.95833333 1. 0.95454545 0.91666667
0.95454545 0.9 1. 1. ]
mean value: 0.9513801054018445
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.91304348 0.95652174 1. 1. 0.91304348 1.
0.95454545 0.81818182 1. 1. ]
mean value: 0.9555335968379447
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.91106719 0.93280632 0.97727273 1. 0.93379447 0.95652174
0.9555336 0.86561265 1. 1. ]
mean value: 0.9532608695652174
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.84 0.88 0.95833333 1. 0.875 0.91666667
0.91304348 0.75 1. 1. ]
mean value: 0.9133043478260869
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.93
Accuracy on Blind test: 0.96
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.05411959 0.06604648 0.08949947 0.05077219 0.0609808 0.0644958
0.07106686 0.05174732 0.07121181 0.07347536]
mean value: 0.06534156799316407
key: score_time
value: [0.02211094 0.03330112 0.0351975 0.02889776 0.02099347 0.02941012
0.02830839 0.02586412 0.02633166 0.02234125]
mean value: 0.027275633811950684
key: test_mcc
value: [0.82213439 0.91106719 0.91106719 1. 0.86758893 0.91485328
0.91485328 0.82213439 1. 0.87406293]
mean value: 0.9037761587267465
key: train_mcc
value: [0.98024679 0.98519693 0.98519693 0.99017145 0.97560447 0.98029413
0.98024679 0.99507389 0.98029509 0.98519729]
mean value: 0.9837523754632594
key: test_accuracy
value: [0.91111111 0.95555556 0.95555556 1. 0.93333333 0.95555556
0.95555556 0.91111111 1. 0.93333333]
mean value: 0.9511111111111111
key: train_accuracy
value: [0.99012346 0.99259259 0.99259259 0.99506173 0.98765432 0.99012346
0.99012346 0.99753086 0.99012346 0.99259259]
mean value: 0.9918518518518519
key: test_fscore
value: [0.91304348 0.95652174 0.95652174 1. 0.93333333 0.95652174
0.95652174 0.90909091 1. 0.92682927]
mean value: 0.9508383945499534
key: train_fscore
value: [0.99009901 0.99255583 0.99255583 0.99502488 0.98746867 0.99019608
0.99014778 0.99753086 0.99009901 0.99259259]
mean value: 0.9918270548106813
key: test_precision
value: [0.91304348 0.95652174 0.95652174 1. 0.95454545 0.91666667
0.91666667 0.90909091 1. 1. ]
mean value: 0.9523056653491436
key: train_precision
value: [0.99009901 0.99502488 0.99502488 1. 1. 0.98536585
0.99014778 1. 0.99502488 0.9950495 ]
mean value: 0.9945736778626925
key: test_recall
value: [0.91304348 0.95652174 0.95652174 1. 0.91304348 1.
1. 0.90909091 1. 0.86363636]
mean value: 0.9511857707509881
key: train_recall
value: [0.99009901 0.99009901 0.99009901 0.99009901 0.97524752 0.99507389
0.99014778 0.99507389 0.98522167 0.99014778]
mean value: 0.9891308588986978
key: test_roc_auc
value: [0.91106719 0.9555336 0.9555336 1. 0.93379447 0.95652174
0.95652174 0.91106719 1. 0.93181818]
mean value: 0.9511857707509882
key: train_roc_auc
value: [0.9901234 0.99258645 0.99258645 0.9950495 0.98762376 0.9901112
0.9901234 0.99753695 0.99013559 0.99259864]
mean value: 0.9918475345071454
key: test_jcc
value: [0.84 0.91666667 0.91666667 1. 0.875 0.91666667
0.91666667 0.83333333 1. 0.86363636]
mean value: 0.9078636363636363
key: train_jcc
value: [0.98039216 0.98522167 0.98522167 0.99009901 0.97524752 0.98058252
0.9804878 0.99507389 0.98039216 0.98529412]
mean value: 0.9838012536555218
MCC on Blind test: 0.95
Accuracy on Blind test: 0.97
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.10892105 0.16002131 0.14787793 0.20454001 0.19232368 0.16132021
0.145684 0.18226552 0.45024633 0.14779735]
mean value: 0.19009974002838134
key: score_time
value: [0.01462197 0.01497269 0.02363658 0.03207684 0.01924872 0.03103209
0.02994967 0.02701902 0.04569006 0.01469874]
mean value: 0.02529463768005371
key: test_mcc
value: [0.670374 0.55841694 0.63358389 0.6133209 0.73663511 0.69156407
0.4229249 0.55533597 0.72299881 0.64613475]
mean value: 0.625128933432823
key: train_mcc
value: [0.99017145 0.99017145 0.98529269 0.98529269 0.98529269 0.98529376
0.99017193 0.99017193 0.99507389 0.99017193]
mean value: 0.9887104432367875
key: test_accuracy
value: [0.82222222 0.77777778 0.8 0.8 0.86666667 0.82222222
0.71111111 0.77777778 0.84444444 0.82222222]
mean value: 0.8044444444444444
key: train_accuracy
value: [0.99506173 0.99506173 0.99259259 0.99259259 0.99259259 0.99259259
0.99506173 0.99506173 0.99753086 0.99506173]
mean value: 0.994320987654321
key: test_fscore
value: [0.8 0.77272727 0.76923077 0.82352941 0.86363636 0.84615385
0.71111111 0.77272727 0.81081081 0.80952381]
mean value: 0.7979450667685961
key: train_fscore
value: [0.99502488 0.99502488 0.9925187 0.9925187 0.9925187 0.99255583
0.9950495 0.9950495 0.99753086 0.9950495 ]
mean value: 0.9942841071283992
key: test_precision
value: [0.94117647 0.80952381 0.9375 0.75 0.9047619 0.73333333
0.69565217 0.77272727 1. 0.85 ]
mean value: 0.8394674964847599
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.69565217 0.73913043 0.65217391 0.91304348 0.82608696 1.
0.72727273 0.77272727 0.68181818 0.77272727]
mean value: 0.7780632411067193
key: train_recall
value: [0.99009901 0.99009901 0.98514851 0.98514851 0.98514851 0.98522167
0.99014778 0.99014778 0.99507389 0.99014778]
mean value: 0.9886382480612593
key: test_roc_auc
value: [0.82509881 0.77865613 0.80335968 0.79743083 0.86758893 0.82608696
0.71146245 0.77766798 0.84090909 0.82114625]
mean value: 0.8049407114624506
key: train_roc_auc
value: [0.9950495 0.9950495 0.99257426 0.99257426 0.99257426 0.99261084
0.99507389 0.99507389 0.99753695 0.99507389]
mean value: 0.9943191240306297
key: test_jcc
value: [0.66666667 0.62962963 0.625 0.7 0.76 0.73333333
0.55172414 0.62962963 0.68181818 0.68 ]
mean value: 0.6657801579008475
key: train_jcc
value: [0.99009901 0.99009901 0.98514851 0.98514851 0.98514851 0.98522167
0.99014778 0.99014778 0.99507389 0.99014778]
mean value: 0.9886382480612593
MCC on Blind test: 0.62
Accuracy on Blind test: 0.81
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.69311619 0.70693564 0.67376351 0.71329546 0.67440891 0.70312452
0.67457604 0.7066412 0.69384813 0.67763829]
mean value: 0.691734790802002
key: score_time
value: [0.00995064 0.00967097 0.01044416 0.00957108 0.014184 0.01022768
0.00957584 0.0112319 0.01048136 0.01039362]
mean value: 0.010573124885559082
key: test_mcc
value: [0.82213439 0.82506438 0.95643752 1. 0.82574419 0.91485328
0.91485328 0.82213439 1. 0.95643752]
mean value: 0.9037658939474779
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.91111111 0.91111111 0.97777778 1. 0.91111111 0.95555556
0.95555556 0.91111111 1. 0.97777778]
mean value: 0.9511111111111111
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.91304348 0.91666667 0.9787234 1. 0.90909091 0.95652174
0.95652174 0.90909091 1. 0.97674419]
mean value: 0.9516403031672055
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.91304348 0.88 0.95833333 1. 0.95238095 0.91666667
0.91666667 0.90909091 1. 1. ]
mean value: 0.9446182006399397
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.91304348 0.95652174 1. 1. 0.86956522 1.
1. 0.90909091 1. 0.95454545]
mean value: 0.9602766798418972
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.91106719 0.91007905 0.97727273 1. 0.91205534 0.95652174
0.95652174 0.91106719 1. 0.97727273]
mean value: 0.9511857707509881
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.84 0.84615385 0.95833333 1. 0.83333333 0.91666667
0.91666667 0.83333333 1. 0.95454545]
mean value: 0.9099032634032634
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.96
Accuracy on Blind test: 0.98
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.1388762 0.1777842 0.13639593 0.05101848 0.0506146 0.0657537
0.11698508 0.15806198 0.03034258 0.0545578 ]
mean value: 0.09803905487060546
key: score_time
value: [0.01342249 0.02337122 0.01860356 0.03219318 0.02325344 0.01318908
0.01368833 0.02427459 0.01500654 0.0172019 ]
mean value: 0.019420433044433593
key: test_mcc
value: [0.60000118 0.55666994 0.38019877 0.56261436 0.22004311 0.2903816
0.21191154 0.24356483 0.5216284 0.33797818]
mean value: 0.3924991910418209
key: train_mcc
value: [0.9901234 0.97541644 0.99017145 0.98519693 0.72864068 0.76507358
0.93772687 0.89576137 0.98529376 0.78773172]
mean value: 0.9041136193904215
key: test_accuracy
value: [0.8 0.77777778 0.68888889 0.77777778 0.6 0.64444444
0.6 0.62222222 0.75555556 0.66666667]
mean value: 0.6933333333333334
key: train_accuracy
value: [0.99506173 0.98765432 0.99506173 0.99259259 0.84691358 0.8691358
0.96790123 0.94567901 0.99259259 0.89382716]
mean value: 0.9486419753086419
key: test_fscore
value: [0.80851064 0.79166667 0.68181818 0.8 0.52631579 0.6
0.64 0.60465116 0.71794872 0.68085106]
mean value: 0.6851762220825608
key: train_fscore
value: [0.9950495 0.98771499 0.99502488 0.99255583 0.81871345 0.84985836
0.96897375 0.94300518 0.99255583 0.89486553]
mean value: 0.9438317292087527
key: test_precision
value: [0.79166667 0.76 0.71428571 0.74074074 0.66666667 0.66666667
0.57142857 0.61904762 0.82352941 0.64 ]
mean value: 0.6994032057267351
key: train_precision
value: [0.9950495 0.9804878 1. 0.99502488 1. 1.
0.93981481 0.99453552 1. 0.88834951]
mean value: 0.9793262033954039
key: test_recall
value: [0.82608696 0.82608696 0.65217391 0.86956522 0.43478261 0.54545455
0.72727273 0.59090909 0.63636364 0.72727273]
mean value: 0.6835968379446641
key: train_recall
value: [0.9950495 0.9950495 0.99009901 0.99009901 0.69306931 0.73891626
1. 0.89655172 0.98522167 0.90147783]
mean value: 0.9185533824318393
key: test_roc_auc
value: [0.79940711 0.77667984 0.68972332 0.7756917 0.60375494 0.64229249
0.6027668 0.6215415 0.75296443 0.66798419]
mean value: 0.6932806324110672
key: train_roc_auc
value: [0.9950617 0.98767254 0.9950495 0.99258645 0.84653465 0.86945813
0.96782178 0.94580061 0.99261084 0.89380822]
mean value: 0.9486404428620202
key: test_jcc
value: [0.67857143 0.65517241 0.51724138 0.66666667 0.35714286 0.42857143
0.47058824 0.43333333 0.56 0.51612903]
mean value: 0.5283416774941345
key: train_jcc
value: [0.99014778 0.97572816 0.99009901 0.98522167 0.69306931 0.73891626
0.93981481 0.89215686 0.98522167 0.80973451]
mean value: 0.90001100521683
MCC on Blind test: 0.54
Accuracy on Blind test: 0.77
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.05427146 0.05056667 0.02967739 0.04050016 0.04019976 0.04014754
0.04688692 0.03133988 0.03438783 0.03365636]
mean value: 0.040163397789001465
key: score_time
value: [0.02168655 0.02730417 0.03774905 0.0235827 0.02416897 0.02359438
0.0278523 0.02273226 0.02274179 0.02754211]
mean value: 0.02589542865753174
key: test_mcc
value: [0.82574419 0.77821935 0.86732843 0.73320158 0.82213439 0.77865613
0.73663511 0.68911026 0.95652174 0.77821935]
mean value: 0.7965770525024884
key: train_mcc
value: [0.85731376 0.86693826 0.88152664 0.86177295 0.89175679 0.87164354
0.871768 0.871768 0.86176621 0.85221434]
mean value: 0.8688468510791374
key: test_accuracy
value: [0.91111111 0.88888889 0.93333333 0.86666667 0.91111111 0.88888889
0.86666667 0.84444444 0.97777778 0.88888889]
mean value: 0.8977777777777778
key: train_accuracy
value: [0.92839506 0.93333333 0.94074074 0.9308642 0.94567901 0.93580247
0.93580247 0.93580247 0.9308642 0.92592593]
mean value: 0.934320987654321
key: test_fscore
value: [0.90909091 0.89361702 0.93617021 0.86956522 0.91304348 0.88888889
0.86956522 0.8372093 0.97777778 0.88372093]
mean value: 0.8978648955401747
key: train_fscore
value: [0.92944039 0.93398533 0.9408867 0.93103448 0.94634146 0.93627451
0.93658537 0.93658537 0.93137255 0.92718447]
mean value: 0.9349690621598662
key: test_precision
value: [0.95238095 0.875 0.91666667 0.86956522 0.91304348 0.86956522
0.83333333 0.85714286 0.95652174 0.9047619 ]
mean value: 0.8947981366459627
key: train_precision
value: [0.9138756 0.92270531 0.93627451 0.92647059 0.93269231 0.93170732
0.92753623 0.92753623 0.92682927 0.9138756 ]
mean value: 0.9259502965047404
key: test_recall
value: [0.86956522 0.91304348 0.95652174 0.86956522 0.91304348 0.90909091
0.90909091 0.81818182 1. 0.86363636]
mean value: 0.9021739130434783
key: train_recall
value: [0.94554455 0.94554455 0.94554455 0.93564356 0.96039604 0.9408867
0.94581281 0.94581281 0.93596059 0.9408867 ]
mean value: 0.9442032873238062
key: test_roc_auc
value: [0.91205534 0.88833992 0.93280632 0.86660079 0.91106719 0.88932806
0.86758893 0.84387352 0.97826087 0.88833992]
mean value: 0.8978260869565218
key: train_roc_auc
value: [0.9284373 0.93336341 0.94075257 0.93087597 0.94571526 0.93578988
0.93577769 0.93577769 0.93085158 0.92588889]
mean value: 0.934323025898649
key: test_jcc
value: [0.83333333 0.80769231 0.88 0.76923077 0.84 0.8
0.76923077 0.72 0.95652174 0.79166667]
mean value: 0.8167675585284281
key: train_jcc
value: [0.86818182 0.87614679 0.88837209 0.87096774 0.89814815 0.88018433
0.88073394 0.88073394 0.87155963 0.86425339]
mean value: 0.8779281838677705
MCC on Blind test: 0.79
Accuracy on Blind test: 0.89
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.43447208 0.54856634 0.45230627 0.30222797 0.46330261 0.58453083
0.51913071 0.37581491 0.53456688 0.40650988]
mean value: 0.4621428489685059
key: score_time
value: [0.02360606 0.03693914 0.02723217 0.01753807 0.02427649 0.02521634
0.02481031 0.03186703 0.04090238 0.03361082]
mean value: 0.028599882125854494
key: test_mcc
value: [0.82574419 0.77821935 0.86732843 0.77821935 0.82213439 0.78530224
0.73663511 0.64613475 0.95652174 0.77821935]
mean value: 0.7974458893590411
key: train_mcc
value: [0.85731376 0.86693826 0.88152664 0.90618217 0.93581427 0.91606106
0.80250226 0.92620337 0.86176621 0.85221434]
mean value: 0.8806522363523859
key: test_accuracy
value: [0.91111111 0.88888889 0.93333333 0.88888889 0.91111111 0.88888889
0.86666667 0.82222222 0.97777778 0.88888889]
mean value: 0.8977777777777778
key: train_accuracy
value: [0.92839506 0.93333333 0.94074074 0.95308642 0.96790123 0.95802469
0.90123457 0.96296296 0.9308642 0.92592593]
mean value: 0.9402469135802469
key: test_fscore
value: [0.90909091 0.89361702 0.93617021 0.89361702 0.91304348 0.89361702
0.86956522 0.80952381 0.97777778 0.88372093]
mean value: 0.8979743398872972
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_8020.py:148: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_8020.py:151: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
key: train_fscore
value: [0.92944039 0.93398533 0.9408867 0.9528536 0.96790123 0.95802469
0.90196078 0.96350365 0.93137255 0.92718447]
mean value: 0.9407113391803744
key: test_precision
value: [0.95238095 0.875 0.91666667 0.875 0.91304348 0.84
0.83333333 0.85 0.95652174 0.9047619 ]
mean value: 0.8916708074534161
key: train_precision
value: [0.9138756 0.92270531 0.93627451 0.95522388 0.96551724 0.96039604
0.89756098 0.95192308 0.92682927 0.9138756 ]
mean value: 0.9344181502391634
key: test_recall
value: [0.86956522 0.91304348 0.95652174 0.91304348 0.91304348 0.95454545
0.90909091 0.77272727 1. 0.86363636]
mean value: 0.9065217391304348
key: train_recall
value: [0.94554455 0.94554455 0.94554455 0.95049505 0.97029703 0.95566502
0.90640394 0.97536946 0.93596059 0.9408867 ]
mean value: 0.9471711456859971
key: test_roc_auc
value: [0.91205534 0.88833992 0.93280632 0.88833992 0.91106719 0.89031621
0.86758893 0.82114625 0.97826087 0.88833992]
mean value: 0.8978260869565218
key: train_roc_auc
value: [0.9284373 0.93336341 0.94075257 0.95308004 0.96790714 0.95803053
0.90122177 0.96293225 0.93085158 0.92588889]
mean value: 0.9402465492854705
key: test_jcc
value: [0.83333333 0.80769231 0.88 0.80769231 0.84 0.80769231
0.76923077 0.68 0.95652174 0.79166667]
mean value: 0.8173829431438128
key: train_jcc
value: [0.86818182 0.87614679 0.88837209 0.90995261 0.93779904 0.91943128
0.82142857 0.92957746 0.87155963 0.86425339]
mean value: 0.888670269242401
MCC on Blind test: 0.8
Accuracy on Blind test: 0.9
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.04649901 0.10190964 0.07873082 0.10802031 0.04602194 0.10666394
0.04753137 0.04401898 0.05204272 0.05098581]
mean value: 0.06824245452880859
key: score_time
value: [0.01550841 0.02776861 0.01159406 0.01329851 0.01062369 0.01794052
0.01888156 0.01782799 0.01066375 0.0106461 ]
mean value: 0.015475320816040038
key: test_mcc
value: [0.86452993 0.77352678 0.77352678 0.6882472 0.86452993 0.68252363
0.77352678 0.90909091 0.81818182 0.95553309]
mean value: 0.8103216868800538
key: train_mcc
value: [0.85858586 0.88929729 0.87383768 0.86873119 0.86391186 0.87374852
0.86373551 0.85876112 0.85354624 0.85380763]
mean value: 0.8657962897237084
key: test_accuracy
value: [0.93181818 0.88636364 0.88636364 0.84090909 0.93181818 0.84090909
0.88636364 0.95454545 0.90909091 0.97727273]
mean value: 0.9045454545454545
key: train_accuracy
value: [0.92929293 0.94444444 0.93686869 0.93434343 0.93181818 0.93686869
0.93181818 0.92929293 0.92676768 0.92676768]
mean value: 0.9328282828282828
key: test_fscore
value: [0.93333333 0.88372093 0.88372093 0.85106383 0.93023256 0.84444444
0.88372093 0.95454545 0.90909091 0.97777778]
mean value: 0.9051651097816362
key: train_fscore
value: [0.92929293 0.94527363 0.93734336 0.93467337 0.93266833 0.93702771
0.93233083 0.93 0.92695214 0.9276808 ]
mean value: 0.9333243089480099
key: test_precision
value: [0.91304348 0.9047619 0.9047619 0.8 0.95238095 0.82608696
0.9047619 0.95454545 0.90909091 0.95652174]
mean value: 0.9025955204216074
key: train_precision
value: [0.92929293 0.93137255 0.93034826 0.93 0.92118227 0.93467337
0.92537313 0.92079208 0.92462312 0.91625616]
mean value: 0.9263913856612664
key: test_recall
value: [0.95454545 0.86363636 0.86363636 0.90909091 0.90909091 0.86363636
0.86363636 0.95454545 0.90909091 1. ]
mean value: 0.9090909090909091
key: train_recall
value: [0.92929293 0.95959596 0.94444444 0.93939394 0.94444444 0.93939394
0.93939394 0.93939394 0.92929293 0.93939394]
mean value: 0.9404040404040405
key: test_roc_auc
value: [0.93181818 0.88636364 0.88636364 0.84090909 0.93181818 0.84090909
0.88636364 0.95454545 0.90909091 0.97727273]
mean value: 0.9045454545454545
key: train_roc_auc
value: [0.92929293 0.94444444 0.93686869 0.93434343 0.93181818 0.93686869
0.93181818 0.92929293 0.92676768 0.92676768]
mean value: 0.9328282828282828
key: test_jcc
value: [0.875 0.79166667 0.79166667 0.74074074 0.86956522 0.73076923
0.79166667 0.91304348 0.83333333 0.95652174]
mean value: 0.8293973739625913
key: train_jcc
value: [0.86792453 0.89622642 0.88207547 0.87735849 0.87383178 0.88151659
0.87323944 0.86915888 0.86384977 0.86511628]
mean value: 0.8750297628491411
MCC on Blind test: 0.8
Accuracy on Blind test: 0.9
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [2.08121514 2.80702353 3.87393689 2.53061247 2.8521018 2.31932235
3.42499232 2.182899 3.24924803 2.97179842]
mean value: 2.8293149948120115
key: score_time
value: [0.01505876 0.01849389 0.02364302 0.03510594 0.01178694 0.0123539
0.03271341 0.02871919 0.02154374 0.01550484]
mean value: 0.021492362022399902
key: test_mcc
value: [0.86452993 0.77352678 0.81818182 0.68252363 0.82158384 0.68252363
0.77352678 0.90909091 0.81818182 0.95553309]
mean value: 0.8099202235722511
key: train_mcc
value: [0.81822356 0.84865804 0.88393985 0.8939508 0.82866339 0.82832509
0.89903576 0.88393985 0.89903576 0.88393985]
mean value: 0.8667711957739941
key: test_accuracy
value: [0.93181818 0.88636364 0.90909091 0.84090909 0.90909091 0.84090909
0.88636364 0.95454545 0.90909091 0.97727273]
mean value: 0.9045454545454545
key: train_accuracy
value: [0.90909091 0.92424242 0.94191919 0.9469697 0.91414141 0.91414141
0.94949495 0.94191919 0.94949495 0.94191919]
mean value: 0.9333333333333333
key: test_fscore
value: [0.93333333 0.88372093 0.90909091 0.84444444 0.9047619 0.84444444
0.88372093 0.95454545 0.90909091 0.97777778]
mean value: 0.9044931037954294
key: train_fscore
value: [0.90954774 0.925 0.94235589 0.94710327 0.91542289 0.91457286
0.94974874 0.94235589 0.94974874 0.94235589]
mean value: 0.9338211919756527
key: test_precision
value: [0.91304348 0.9047619 0.90909091 0.82608696 0.95 0.82608696
0.9047619 0.95454545 0.90909091 0.95652174]
mean value: 0.9053990212685865
key: train_precision
value: [0.905 0.91584158 0.93532338 0.94472362 0.90196078 0.91
0.945 0.93532338 0.945 0.93532338]
mean value: 0.9273496135816325
key: test_recall
value: [0.95454545 0.86363636 0.90909091 0.86363636 0.86363636 0.86363636
0.86363636 0.95454545 0.90909091 1. ]
mean value: 0.9045454545454545
key: train_recall
value: [0.91414141 0.93434343 0.94949495 0.94949495 0.92929293 0.91919192
0.95454545 0.94949495 0.95454545 0.94949495]
mean value: 0.9404040404040405
key: test_roc_auc
value: [0.93181818 0.88636364 0.90909091 0.84090909 0.90909091 0.84090909
0.88636364 0.95454545 0.90909091 0.97727273]
mean value: 0.9045454545454545
key: train_roc_auc
value: [0.90909091 0.92424242 0.94191919 0.9469697 0.91414141 0.91414141
0.94949495 0.94191919 0.94949495 0.94191919]
mean value: 0.9333333333333333
key: test_jcc
value: [0.875 0.79166667 0.83333333 0.73076923 0.82608696 0.73076923
0.79166667 0.91304348 0.83333333 0.95652174]
mean value: 0.8282190635451505
key: train_jcc
value: [0.83410138 0.86046512 0.89099526 0.89952153 0.8440367 0.84259259
0.90430622 0.89099526 0.90430622 0.89099526]
mean value: 0.8762315541890235
MCC on Blind test: 0.8
Accuracy on Blind test: 0.9
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01285553 0.01226926 0.01220226 0.01206255 0.01200318 0.01196909
0.01202083 0.01223373 0.01212716 0.01206827]
mean value: 0.01218118667602539
key: score_time
value: [0.0109396 0.01060057 0.01065397 0.01066351 0.01072145 0.01063704
0.01068616 0.01052976 0.01061583 0.01066804]
mean value: 0.010671591758728028
key: test_mcc
value: [0.72727273 0.46225016 0.62330229 0.6882472 0.5547002 0.45454545
0.60678804 0.68252363 0.6882472 0.6882472 ]
mean value: 0.6176124107532563
key: train_mcc
value: [0.6873189 0.70131223 0.65677139 0.73180407 0.66882888 0.71147617
0.6771364 0.6771364 0.66144272 0.6724898 ]
mean value: 0.6845716943284235
key: test_accuracy
value: [0.86363636 0.72727273 0.79545455 0.84090909 0.77272727 0.72727273
0.79545455 0.84090909 0.84090909 0.84090909]
mean value: 0.8045454545454546
key: train_accuracy
value: [0.84090909 0.84848485 0.82575758 0.86363636 0.83080808 0.85353535
0.83585859 0.83585859 0.82828283 0.83333333]
mean value: 0.8396464646464646
key: test_fscore
value: [0.86363636 0.7 0.75675676 0.82926829 0.75 0.72727273
0.76923077 0.8372093 0.82926829 0.82926829]
mean value: 0.7891910797270979
key: train_fscore
value: [0.83018868 0.83957219 0.81401617 0.85561497 0.81743869 0.84491979
0.82479784 0.82479784 0.8172043 0.82162162]
mean value: 0.8290172105750199
key: test_precision
value: [0.86363636 0.77777778 0.93333333 0.89473684 0.83333333 0.72727273
0.88235294 0.85714286 0.89473684 0.89473684]
mean value: 0.8559059859988652
key: train_precision
value: [0.89017341 0.89204545 0.87283237 0.90909091 0.88757396 0.89772727
0.88439306 0.88439306 0.87356322 0.88372093]
mean value: 0.8875513656998492
key: test_recall
value: [0.86363636 0.63636364 0.63636364 0.77272727 0.68181818 0.72727273
0.68181818 0.81818182 0.77272727 0.77272727]
mean value: 0.7363636363636363
key: train_recall
value: [0.77777778 0.79292929 0.76262626 0.80808081 0.75757576 0.7979798
0.77272727 0.77272727 0.76767677 0.76767677]
mean value: 0.7777777777777778
key: test_roc_auc
value: [0.86363636 0.72727273 0.79545455 0.84090909 0.77272727 0.72727273
0.79545455 0.84090909 0.84090909 0.84090909]
mean value: 0.8045454545454546
key: train_roc_auc
value: [0.84090909 0.84848485 0.82575758 0.86363636 0.83080808 0.85353535
0.83585859 0.83585859 0.82828283 0.83333333]
mean value: 0.8396464646464646
key: test_jcc
value: [0.76 0.53846154 0.60869565 0.70833333 0.6 0.57142857
0.625 0.72 0.70833333 0.70833333]
mean value: 0.6548585762064023
key: train_jcc
value: [0.70967742 0.7235023 0.68636364 0.74766355 0.69124424 0.73148148
0.70183486 0.70183486 0.69090909 0.69724771]
mean value: 0.7081759154482379
MCC on Blind test: 0.68
Accuracy on Blind test: 0.84
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01238537 0.01245689 0.01242876 0.01219463 0.01234508 0.01249957
0.01238346 0.01238561 0.01246309 0.01236248]
mean value: 0.012390494346618652
key: score_time
value: [0.01070952 0.01077819 0.01080084 0.01070762 0.01050544 0.01057076
0.0105443 0.01068759 0.01069641 0.01076317]
mean value: 0.010676383972167969
key: test_mcc
value: [0.82158384 0.32118203 0.81818182 0.59152048 0.59152048 0.50051733
0.63636364 0.77352678 0.77352678 0.77352678]
mean value: 0.6601449963922943
key: train_mcc
value: [0.74250948 0.68434524 0.75299597 0.77793654 0.70837286 0.74243371
0.74243371 0.72230514 0.75253485 0.7577304 ]
mean value: 0.7383597900292419
key: test_accuracy
value: [0.90909091 0.65909091 0.90909091 0.79545455 0.79545455 0.75
0.81818182 0.88636364 0.88636364 0.88636364]
mean value: 0.8295454545454546
key: train_accuracy
value: [0.87121212 0.84090909 0.87626263 0.88888889 0.85353535 0.87121212
0.87121212 0.86111111 0.87626263 0.87878788]
mean value: 0.8689393939393939
key: test_fscore
value: [0.91304348 0.63414634 0.90909091 0.8 0.79069767 0.75555556
0.81818182 0.88372093 0.88888889 0.88888889]
mean value: 0.8282214484981508
key: train_fscore
value: [0.87218045 0.83377309 0.87841191 0.89 0.84895833 0.87088608
0.87088608 0.86005089 0.87657431 0.88 ]
mean value: 0.868172113199113
key: test_precision
value: [0.875 0.68421053 0.90909091 0.7826087 0.80952381 0.73913043
0.81818182 0.9047619 0.86956522 0.86956522]
mean value: 0.8261638533091622
key: train_precision
value: [0.86567164 0.87292818 0.86341463 0.88118812 0.87634409 0.87309645
0.87309645 0.86666667 0.87437186 0.87128713]
mean value: 0.8718065205643388
key: test_recall
value: [0.95454545 0.59090909 0.90909091 0.81818182 0.77272727 0.77272727
0.81818182 0.86363636 0.90909091 0.90909091]
mean value: 0.8318181818181818
key: train_recall
value: [0.87878788 0.7979798 0.89393939 0.8989899 0.82323232 0.86868687
0.86868687 0.85353535 0.87878788 0.88888889]
mean value: 0.8651515151515151
key: test_roc_auc
value: [0.90909091 0.65909091 0.90909091 0.79545455 0.79545455 0.75
0.81818182 0.88636364 0.88636364 0.88636364]
mean value: 0.8295454545454546
key: train_roc_auc
value: [0.87121212 0.84090909 0.87626263 0.88888889 0.85353535 0.87121212
0.87121212 0.86111111 0.87626263 0.87878788]
mean value: 0.8689393939393939
key: test_jcc
value: [0.84 0.46428571 0.83333333 0.66666667 0.65384615 0.60714286
0.69230769 0.79166667 0.8 0.8 ]
mean value: 0.7149249084249084
key: train_jcc
value: [0.77333333 0.71493213 0.78318584 0.8018018 0.73755656 0.77130045
0.77130045 0.75446429 0.78026906 0.78571429]
mean value: 0.7673858190211428
MCC on Blind test: 0.73
Accuracy on Blind test: 0.87
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.01081586 0.01136518 0.01138735 0.01129532 0.01170659 0.01099205
0.01889086 0.01352763 0.03923583 0.01395297]
mean value: 0.015316963195800781
key: score_time
value: [0.01401186 0.03292465 0.02555895 0.02621388 0.0230813 0.01509142
0.07618952 0.06447315 0.09006906 0.05066252]
mean value: 0.0418276309967041
key: test_mcc
value: [0.59648091 0.50051733 0.54545455 0.50471461 0.27386128 0.47245559
0.60678804 0.50051733 0.54772256 0.5547002 ]
mean value: 0.510321238961513
key: train_mcc
value: [0.68718427 0.69199863 0.68700889 0.66697297 0.70710678 0.71366109
0.70739557 0.67275618 0.66182722 0.66670068]
mean value: 0.6862612279819491
key: test_accuracy
value: [0.79545455 0.75 0.77272727 0.75 0.63636364 0.72727273
0.79545455 0.75 0.77272727 0.77272727]
mean value: 0.7522727272727272
key: train_accuracy
value: [0.84343434 0.8459596 0.84343434 0.83333333 0.85353535 0.85606061
0.85353535 0.83585859 0.83080808 0.83333333]
mean value: 0.8429292929292929
key: test_fscore
value: [0.80851064 0.74418605 0.77272727 0.73170732 0.61904762 0.68421053
0.76923077 0.75555556 0.7826087 0.75 ]
mean value: 0.7417784440411851
key: train_fscore
value: [0.84102564 0.84711779 0.84183673 0.83076923 0.85279188 0.85117493
0.85128205 0.83116883 0.8286445 0.83248731]
mean value: 0.8408298907247728
key: test_precision
value: [0.76 0.76190476 0.77272727 0.78947368 0.65 0.8125
0.88235294 0.73913043 0.75 0.83333333]
mean value: 0.7751422428134973
key: train_precision
value: [0.85416667 0.84079602 0.85051546 0.84375 0.85714286 0.88108108
0.86458333 0.85561497 0.83937824 0.83673469]
mean value: 0.8523763327523514
key: test_recall
value: [0.86363636 0.72727273 0.77272727 0.68181818 0.59090909 0.59090909
0.68181818 0.77272727 0.81818182 0.68181818]
mean value: 0.7181818181818181
key: train_recall
value: [0.82828283 0.85353535 0.83333333 0.81818182 0.84848485 0.82323232
0.83838384 0.80808081 0.81818182 0.82828283]
mean value: 0.8297979797979798
key: test_roc_auc
value: [0.79545455 0.75 0.77272727 0.75 0.63636364 0.72727273
0.79545455 0.75 0.77272727 0.77272727]
mean value: 0.7522727272727273
key: train_roc_auc
value: [0.84343434 0.8459596 0.84343434 0.83333333 0.85353535 0.85606061
0.85353535 0.83585859 0.83080808 0.83333333]
mean value: 0.842929292929293
key: test_jcc
value: [0.67857143 0.59259259 0.62962963 0.57692308 0.44827586 0.52
0.625 0.60714286 0.64285714 0.6 ]
mean value: 0.5920992589785693
key: train_jcc
value: [0.72566372 0.73478261 0.72687225 0.71052632 0.74336283 0.74090909
0.74107143 0.71111111 0.70742358 0.71304348]
mean value: 0.7254766409492254
MCC on Blind test: 0.41
Accuracy on Blind test: 0.71
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.0494976 0.01854587 0.01976705 0.01910186 0.01809573 0.01886225
0.01888537 0.0258832 0.02618337 0.02646255]
mean value: 0.024128484725952148
key: score_time
value: [0.03666711 0.01181078 0.01129031 0.0110836 0.01145768 0.01238108
0.01312828 0.015697 0.01585555 0.01609397]
mean value: 0.015546536445617676
key: test_mcc
value: [0.86452993 0.77352678 0.81818182 0.6882472 0.7800135 0.68252363
0.73029674 0.90909091 0.81818182 0.77352678]
mean value: 0.7838119120880734
key: train_mcc
value: [0.7979798 0.81322466 0.81322466 0.8133907 0.80824576 0.81314168
0.80812204 0.79814268 0.80812204 0.7979798 ]
mean value: 0.8071573819260374
key: test_accuracy
value: [0.93181818 0.88636364 0.90909091 0.84090909 0.88636364 0.84090909
0.86363636 0.95454545 0.90909091 0.88636364]
mean value: 0.8909090909090909
key: train_accuracy
value: [0.8989899 0.90656566 0.90656566 0.90656566 0.9040404 0.90656566
0.9040404 0.8989899 0.9040404 0.8989899 ]
mean value: 0.9035353535353535
key: test_fscore
value: [0.93333333 0.88372093 0.90909091 0.85106383 0.87804878 0.84444444
0.85714286 0.95454545 0.90909091 0.88888889]
mean value: 0.8909370337044393
key: train_fscore
value: [0.8989899 0.90726817 0.90726817 0.90537084 0.905 0.90632911
0.90452261 0.9 0.90452261 0.8989899 ]
mean value: 0.9038261322876402
key: test_precision
value: [0.91304348 0.9047619 0.90909091 0.8 0.94736842 0.82608696
0.9 0.95454545 0.90909091 0.86956522]
mean value: 0.8933553250715722
key: train_precision
value: [0.8989899 0.90049751 0.90049751 0.91709845 0.8960396 0.90862944
0.9 0.89108911 0.9 0.8989899 ]
mean value: 0.9011831422946928
key: test_recall
value: [0.95454545 0.86363636 0.90909091 0.90909091 0.81818182 0.86363636
0.81818182 0.95454545 0.90909091 0.90909091]
mean value: 0.8909090909090909
key: train_recall
value: [0.8989899 0.91414141 0.91414141 0.89393939 0.91414141 0.9040404
0.90909091 0.90909091 0.90909091 0.8989899 ]
mean value: 0.9065656565656566
key: test_roc_auc
value: [0.93181818 0.88636364 0.90909091 0.84090909 0.88636364 0.84090909
0.86363636 0.95454545 0.90909091 0.88636364]
mean value: 0.890909090909091
key: train_roc_auc
value: [0.8989899 0.90656566 0.90656566 0.90656566 0.9040404 0.90656566
0.9040404 0.8989899 0.9040404 0.8989899 ]
mean value: 0.9035353535353535
key: test_jcc
value: [0.875 0.79166667 0.83333333 0.74074074 0.7826087 0.73076923
0.75 0.91304348 0.83333333 0.8 ]
mean value: 0.8050495478756349
key: train_jcc
value: [0.81651376 0.83027523 0.83027523 0.8271028 0.82648402 0.8287037
0.82568807 0.81818182 0.82568807 0.81651376]
mean value: 0.8245426472329047
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [2.06339478 2.06864357 2.89429951 1.91644716 1.80405879 2.09717655
4.19174433 4.01769924 1.65347528 3.00509381]
mean value: 2.5712033033370973
key: score_time
value: [0.01534939 0.01772881 0.0145216 0.01324105 0.01330876 0.02885771
0.02873945 0.01542616 0.02645946 0.02501845]
mean value: 0.019865083694458007
key: test_mcc
value: [0.82158384 0.7800135 0.77352678 0.73029674 0.86452993 0.63900965
0.81818182 0.86452993 0.81818182 0.86452993]
mean value: 0.7974383949974557
key: train_mcc
value: [1. 0.99496218 1. 1. 0.99496218 1.
0.99496218 1. 1. 1. ]
mean value: 0.9984886553739265
key: test_accuracy
value: [0.90909091 0.88636364 0.88636364 0.86363636 0.93181818 0.81818182
0.90909091 0.93181818 0.90909091 0.93181818]
mean value: 0.8977272727272727
key: train_accuracy
value: [1. 0.99747475 1. 1. 0.99747475 1.
0.99747475 1. 1. 1. ]
mean value: 0.9992424242424243
key: test_fscore
value: [0.91304348 0.87804878 0.88888889 0.85714286 0.93023256 0.82608696
0.90909091 0.93023256 0.90909091 0.93023256]
mean value: 0.8972090453902583
key: train_fscore
value: [1. 0.99746835 1. 1. 0.99746835 1.
0.99746835 1. 1. 1. ]
mean value: 0.9992405063291139
key: test_precision
value: [0.875 0.94736842 0.86956522 0.9 0.95238095 0.79166667
0.90909091 0.95238095 0.90909091 0.95238095]
mean value: 0.9058924980435278
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.95454545 0.81818182 0.90909091 0.81818182 0.90909091 0.86363636
0.90909091 0.90909091 0.90909091 0.90909091]
mean value: 0.8909090909090909
key: train_recall
value: [1. 0.99494949 1. 1. 0.99494949 1.
0.99494949 1. 1. 1. ]
mean value: 0.9984848484848485
key: test_roc_auc
value: [0.90909091 0.88636364 0.88636364 0.86363636 0.93181818 0.81818182
0.90909091 0.93181818 0.90909091 0.93181818]
mean value: 0.8977272727272728
key: train_roc_auc
value: [1. 0.99747475 1. 1. 0.99747475 1.
0.99747475 1. 1. 1. ]
mean value: 0.9992424242424243
key: test_jcc
value: [0.84 0.7826087 0.8 0.75 0.86956522 0.7037037
0.83333333 0.86956522 0.83333333 0.86956522]
mean value: 0.8151674718196458
key: train_jcc
value: [1. 0.99494949 1. 1. 0.99494949 1.
0.99494949 1. 1. 1. ]
mean value: 0.9984848484848485
MCC on Blind test: 0.73
Accuracy on Blind test: 0.87
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.03948331 0.0421505 0.02812696 0.02692723 0.0463903 0.02427006
0.02560449 0.04395103 0.02709556 0.02732158]
mean value: 0.033132100105285646
key: score_time
value: [0.01248169 0.01416612 0.01280332 0.01256847 0.01260996 0.0124898
0.01261306 0.01800823 0.01289749 0.01290941]
mean value: 0.013354754447937012
key: test_mcc
value: [0.91287093 0.81818182 0.81818182 0.82158384 0.95553309 0.77352678
0.81818182 0.77352678 0.86452993 0.87177979]
mean value: 0.8427896597116962
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.95454545 0.90909091 0.90909091 0.90909091 0.97727273 0.88636364
0.90909091 0.88636364 0.93181818 0.93181818]
mean value: 0.9204545454545454
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.95238095 0.90909091 0.90909091 0.9047619 0.97674419 0.88888889
0.90909091 0.88372093 0.93023256 0.93617021]
mean value: 0.9200172360489035
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.90909091 0.90909091 0.95 1. 0.86956522
0.90909091 0.9047619 0.95238095 0.88 ]
mean value: 0.9283980801806888
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.90909091 0.90909091 0.90909091 0.86363636 0.95454545 0.90909091
0.90909091 0.86363636 0.90909091 1. ]
mean value: 0.9136363636363636
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.95454545 0.90909091 0.90909091 0.90909091 0.97727273 0.88636364
0.90909091 0.88636364 0.93181818 0.93181818]
mean value: 0.9204545454545455
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.90909091 0.83333333 0.83333333 0.82608696 0.95454545 0.8
0.83333333 0.79166667 0.86956522 0.88 ]
mean value: 0.8530955204216074
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.91
Accuracy on Blind test: 0.96
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.20858884 0.16574883 0.16682291 0.16697645 0.16598296 0.16459179
0.17711449 0.17074442 0.16750717 0.21235633]
mean value: 0.17664341926574706
key: score_time
value: [0.02438045 0.0246768 0.02469969 0.02466249 0.02455401 0.02451372
0.02488565 0.02486062 0.02491927 0.02723241]
mean value: 0.024938511848449706
key: test_mcc
value: [0.77352678 0.7800135 0.81818182 0.6882472 0.77352678 0.73960026
0.77352678 0.90909091 0.81818182 0.77352678]
mean value: 0.7847422639174152
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.88636364 0.88636364 0.90909091 0.84090909 0.88636364 0.86363636
0.88636364 0.95454545 0.90909091 0.88636364]
mean value: 0.8909090909090909
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.88888889 0.87804878 0.90909091 0.85106383 0.88372093 0.875
0.88372093 0.95454545 0.90909091 0.88888889]
mean value: 0.8922059521245206
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.86956522 0.94736842 0.90909091 0.8 0.9047619 0.80769231
0.9047619 0.95454545 0.90909091 0.86956522]
mean value: 0.8876442245778631
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.90909091 0.81818182 0.90909091 0.90909091 0.86363636 0.95454545
0.86363636 0.95454545 0.90909091 0.90909091]
mean value: 0.9
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.88636364 0.88636364 0.90909091 0.84090909 0.88636364 0.86363636
0.88636364 0.95454545 0.90909091 0.88636364]
mean value: 0.890909090909091
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.8 0.7826087 0.83333333 0.74074074 0.79166667 0.77777778
0.79166667 0.91304348 0.83333333 0.8 ]
mean value: 0.8064170692431563
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.82
Accuracy on Blind test: 0.91
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.0146656 0.02988982 0.01498175 0.01482368 0.01461172 0.01457262
0.01456785 0.01468992 0.01473713 0.01458812]
mean value: 0.016212821006774902
key: score_time
value: [0.01279306 0.0220902 0.02778959 0.02868342 0.01232696 0.01224971
0.01235223 0.01237464 0.01246667 0.01231074]
mean value: 0.01654372215270996
key: test_mcc
value: [0.45643546 0.36363636 0.50051733 0.31851103 0.41294832 0.36980013
0.63900965 0.63636364 0.50471461 0.54772256]
mean value: 0.4749659098162487
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.72727273 0.68181818 0.75 0.65909091 0.70454545 0.68181818
0.81818182 0.81818182 0.75 0.77272727]
mean value: 0.7363636363636363
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.71428571 0.68181818 0.74418605 0.65116279 0.68292683 0.70833333
0.80952381 0.81818182 0.76595745 0.7826087 ]
mean value: 0.7358984666081136
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.75 0.68181818 0.76190476 0.66666667 0.73684211 0.65384615
0.85 0.81818182 0.72 0.75 ]
mean value: 0.738925968768074
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.68181818 0.68181818 0.72727273 0.63636364 0.63636364 0.77272727
0.77272727 0.81818182 0.81818182 0.81818182]
mean value: 0.7363636363636363
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.72727273 0.68181818 0.75 0.65909091 0.70454545 0.68181818
0.81818182 0.81818182 0.75 0.77272727]
mean value: 0.7363636363636363
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.55555556 0.51724138 0.59259259 0.48275862 0.51851852 0.5483871
0.68 0.69230769 0.62068966 0.64285714]
mean value: 0.5850908253778109
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.52
Accuracy on Blind test: 0.76
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [2.39341021 2.50459886 1.97631717 2.46608806 2.39209723 2.58320475
2.78075337 2.49290299 3.35915875 2.54243302]
mean value: 2.549096441268921
key: score_time
value: [0.12756467 0.15043545 0.09386826 0.12860036 0.1271193 0.13112164
0.22609282 0.12711215 0.23179317 0.12944078]
mean value: 0.14731485843658448
key: test_mcc
value: [1. 0.91287093 0.90909091 0.82158384 0.86452993 0.82158384
0.86452993 1. 0.95553309 0.91287093]
mean value: 0.9062593395597373
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.95454545 0.95454545 0.90909091 0.93181818 0.90909091
0.93181818 1. 0.97727273 0.95454545]
mean value: 0.9522727272727273
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.95238095 0.95454545 0.91304348 0.93023256 0.91304348
0.93333333 1. 0.97674419 0.95652174]
mean value: 0.952984518009796
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 0.95454545 0.875 0.95238095 0.875
0.91304348 1. 1. 0.91666667]
mean value: 0.9486636551853943
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.90909091 0.95454545 0.95454545 0.90909091 0.95454545
0.95454545 1. 0.95454545 1. ]
mean value: 0.9590909090909091
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.95454545 0.95454545 0.90909091 0.93181818 0.90909091
0.93181818 1. 0.97727273 0.95454545]
mean value: 0.9522727272727273
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.90909091 0.91304348 0.84 0.86956522 0.84
0.875 1. 0.95454545 0.91666667]
mean value: 0.9117911725955204
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.91
Accuracy on Blind test: 0.96
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...05', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.94391346 0.95903611 0.93460608 0.94266438 0.93002152 0.95997882
1.03751016 0.9221313 0.98442721 0.95550966]
mean value: 0.9569798707962036
key: score_time
value: [0.15100384 0.11824775 0.21430612 0.18143821 0.22737098 0.19880962
0.22887874 0.20321155 0.23416042 0.13647127]
mean value: 0.18938984870910644
key: test_mcc
value: [1. 0.87177979 0.82158384 0.7800135 0.81818182 0.82158384
0.86452993 0.95553309 0.95553309 0.86452993]
mean value: 0.8753268816111682
key: train_mcc
value: [0.94445649 0.95465504 0.94949495 0.94954339 0.94954339 0.95465504
0.94954339 0.94445649 0.94949495 0.94949495]
mean value: 0.9495338084295475
key: test_accuracy
value: [1. 0.93181818 0.90909091 0.88636364 0.90909091 0.90909091
0.93181818 0.97727273 0.97727273 0.93181818]
mean value: 0.9363636363636363
key: train_accuracy
value: [0.97222222 0.97727273 0.97474747 0.97474747 0.97474747 0.97727273
0.97474747 0.97222222 0.97474747 0.97474747]
mean value: 0.9747474747474747
key: test_fscore
value: [1. 0.92682927 0.9047619 0.89361702 0.90909091 0.91304348
0.93333333 0.97777778 0.97674419 0.93333333]
mean value: 0.9368531212173918
key: train_fscore
value: [0.9721519 0.97744361 0.97474747 0.97461929 0.97461929 0.97709924
0.97461929 0.9721519 0.97474747 0.97474747]
mean value: 0.9746946935394861
key: test_precision
value: [1. 1. 0.95 0.84 0.90909091 0.875
0.91304348 0.95652174 1. 0.91304348]
mean value: 0.9356699604743083
key: train_precision
value: [0.97461929 0.97014925 0.97474747 0.97959184 0.97959184 0.98461538
0.97959184 0.97461929 0.97474747 0.97474747]
mean value: 0.9767021151473436
key: test_recall
value: [1. 0.86363636 0.86363636 0.95454545 0.90909091 0.95454545
0.95454545 1. 0.95454545 0.95454545]
mean value: 0.9409090909090909
key: train_recall
value: [0.96969697 0.98484848 0.97474747 0.96969697 0.96969697 0.96969697
0.96969697 0.96969697 0.97474747 0.97474747]
mean value: 0.9727272727272728
key: test_roc_auc
value: [1. 0.93181818 0.90909091 0.88636364 0.90909091 0.90909091
0.93181818 0.97727273 0.97727273 0.93181818]
mean value: 0.9363636363636364
key: train_roc_auc
value: [0.97222222 0.97727273 0.97474747 0.97474747 0.97474747 0.97727273
0.97474747 0.97222222 0.97474747 0.97474747]
mean value: 0.9747474747474747
key: test_jcc
value: [1. 0.86363636 0.82608696 0.80769231 0.83333333 0.84
0.875 0.95652174 0.95454545 0.875 ]
mean value: 0.8831816154859633
key: train_jcc
value: [0.94581281 0.95588235 0.95073892 0.95049505 0.95049505 0.95522388
0.95049505 0.94581281 0.95073892 0.95073892]
mean value: 0.9506433746585062
MCC on Blind test: 0.89
Accuracy on Blind test: 0.95
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01132703 0.01045918 0.01127529 0.01106215 0.01119161 0.01149607
0.01135707 0.01120472 0.01115823 0.01116991]
mean value: 0.011170125007629395
key: score_time
value: [0.00996137 0.0095737 0.01019239 0.00985885 0.00994515 0.01000214
0.01003671 0.01025009 0.01001143 0.0100646 ]
mean value: 0.009989643096923828
key: test_mcc
value: [0.82158384 0.32118203 0.81818182 0.59152048 0.59152048 0.50051733
0.63636364 0.77352678 0.77352678 0.77352678]
mean value: 0.6601449963922943
key: train_mcc
value: [0.74250948 0.68434524 0.75299597 0.77793654 0.70837286 0.74243371
0.74243371 0.72230514 0.75253485 0.7577304 ]
mean value: 0.7383597900292419
key: test_accuracy
value: [0.90909091 0.65909091 0.90909091 0.79545455 0.79545455 0.75
0.81818182 0.88636364 0.88636364 0.88636364]
mean value: 0.8295454545454546
key: train_accuracy
value: [0.87121212 0.84090909 0.87626263 0.88888889 0.85353535 0.87121212
0.87121212 0.86111111 0.87626263 0.87878788]
mean value: 0.8689393939393939
key: test_fscore
value: [0.91304348 0.63414634 0.90909091 0.8 0.79069767 0.75555556
0.81818182 0.88372093 0.88888889 0.88888889]
mean value: 0.8282214484981508
key: train_fscore
value: [0.87218045 0.83377309 0.87841191 0.89 0.84895833 0.87088608
0.87088608 0.86005089 0.87657431 0.88 ]
mean value: 0.868172113199113
key: test_precision
value: [0.875 0.68421053 0.90909091 0.7826087 0.80952381 0.73913043
0.81818182 0.9047619 0.86956522 0.86956522]
mean value: 0.8261638533091622
key: train_precision
value: [0.86567164 0.87292818 0.86341463 0.88118812 0.87634409 0.87309645
0.87309645 0.86666667 0.87437186 0.87128713]
mean value: 0.8718065205643388
key: test_recall
value: [0.95454545 0.59090909 0.90909091 0.81818182 0.77272727 0.77272727
0.81818182 0.86363636 0.90909091 0.90909091]
mean value: 0.8318181818181818
key: train_recall
value: [0.87878788 0.7979798 0.89393939 0.8989899 0.82323232 0.86868687
0.86868687 0.85353535 0.87878788 0.88888889]
mean value: 0.8651515151515151
key: test_roc_auc
value: [0.90909091 0.65909091 0.90909091 0.79545455 0.79545455 0.75
0.81818182 0.88636364 0.88636364 0.88636364]
mean value: 0.8295454545454546
key: train_roc_auc
value: [0.87121212 0.84090909 0.87626263 0.88888889 0.85353535 0.87121212
0.87121212 0.86111111 0.87626263 0.87878788]
mean value: 0.8689393939393939
key: test_jcc
value: [0.84 0.46428571 0.83333333 0.66666667 0.65384615 0.60714286
0.69230769 0.79166667 0.8 0.8 ]
mean value: 0.7149249084249084
key: train_jcc
value: [0.77333333 0.71493213 0.78318584 0.8018018 0.73755656 0.77130045
0.77130045 0.75446429 0.78026906 0.78571429]
mean value: 0.7673858190211428
MCC on Blind test: 0.73
Accuracy on Blind test: 0.87
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [1.60504842 1.49270439 1.54242682 1.5466218 2.61647606 0.22439003
1.24004292 1.27762127 0.6597116 1.25325251]
mean value: 1.3458295822143556
key: score_time
value: [0.01259804 0.01326776 0.0133779 0.01286626 0.01313877 0.01219416
0.01778555 0.01312637 0.01357436 0.01212931]
mean value: 0.013405847549438476
key: test_mcc
value: [1. 0.86452993 0.86452993 0.86452993 0.95553309 0.82158384
0.90909091 0.95553309 0.95553309 1. ]
mean value: 0.9190863807668139
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.93181818 0.93181818 0.93181818 0.97727273 0.90909091
0.95454545 0.97727273 0.97727273 1. ]
mean value: 0.9590909090909091
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.93023256 0.93333333 0.93023256 0.97674419 0.91304348
0.95454545 0.97674419 0.97674419 1. ]
mean value: 0.9591619940558263
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.95238095 0.91304348 0.95238095 1. 0.875
0.95454545 1. 1. 1. ]
mean value: 0.9647350837568229
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.90909091 0.95454545 0.90909091 0.95454545 0.95454545
0.95454545 0.95454545 0.95454545 1. ]
mean value: 0.9545454545454546
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.93181818 0.93181818 0.93181818 0.97727273 0.90909091
0.95454545 0.97727273 0.97727273 1. ]
mean value: 0.9590909090909091
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.86956522 0.875 0.86956522 0.95454545 0.84
0.91304348 0.95454545 0.95454545 1. ]
mean value: 0.9230810276679842
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.95
Accuracy on Blind test: 0.97
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.0410018 0.07751536 0.08132076 0.07177877 0.071033 0.06880021
0.0640676 0.07675624 0.0497086 0.07524252]
mean value: 0.067722487449646
key: score_time
value: [0.01247358 0.02159452 0.02081919 0.01254892 0.02184677 0.01257586
0.03190947 0.01236224 0.0167625 0.01244116]
mean value: 0.017533421516418457
key: test_mcc
value: [0.68252363 0.87177979 0.77352678 0.68252363 0.87177979 0.6882472
0.72727273 0.86452993 0.81818182 0.90909091]
mean value: 0.7889456217849123
key: train_mcc
value: [0.92434853 0.91937955 0.89903576 0.91415307 0.92434853 0.93435535
0.92948262 0.90909091 0.90414419 0.90913729]
mean value: 0.9167475808639807
key: test_accuracy
value: [0.84090909 0.93181818 0.88636364 0.84090909 0.93181818 0.84090909
0.86363636 0.93181818 0.90909091 0.95454545]
mean value: 0.8931818181818182
key: train_accuracy
value: [0.96212121 0.95959596 0.94949495 0.95707071 0.96212121 0.96717172
0.96464646 0.95454545 0.9520202 0.95454545]
mean value: 0.9583333333333334
key: test_fscore
value: [0.84444444 0.92682927 0.88888889 0.84444444 0.92682927 0.85106383
0.86363636 0.93333333 0.90909091 0.95454545]
mean value: 0.8943106204756438
key: train_fscore
value: [0.96240602 0.96 0.94974874 0.95717884 0.96240602 0.96708861
0.965 0.95454545 0.95238095 0.95477387]
mean value: 0.9585528498971682
key: test_precision
value: [0.82608696 1. 0.86956522 0.82608696 1. 0.8
0.86363636 0.91304348 0.90909091 0.95454545]
mean value: 0.896205533596838
key: train_precision
value: [0.95522388 0.95049505 0.945 0.95477387 0.95522388 0.96954315
0.95544554 0.95454545 0.94527363 0.95 ]
mean value: 0.9535524458194542
key: test_recall
value: [0.86363636 0.86363636 0.90909091 0.86363636 0.86363636 0.90909091
0.86363636 0.95454545 0.90909091 0.95454545]
mean value: 0.8954545454545455
key: train_recall
value: [0.96969697 0.96969697 0.95454545 0.95959596 0.96969697 0.96464646
0.97474747 0.95454545 0.95959596 0.95959596]
mean value: 0.9636363636363636
key: test_roc_auc
value: [0.84090909 0.93181818 0.88636364 0.84090909 0.93181818 0.84090909
0.86363636 0.93181818 0.90909091 0.95454545]
mean value: 0.8931818181818182
key: train_roc_auc
value: [0.96212121 0.95959596 0.94949495 0.95707071 0.96212121 0.96717172
0.96464646 0.95454545 0.9520202 0.95454545]
mean value: 0.9583333333333334
key: test_jcc
value: [0.73076923 0.86363636 0.8 0.73076923 0.86363636 0.74074074
0.76 0.875 0.83333333 0.91304348]
mean value: 0.8110928741146133
key: train_jcc
value: [0.92753623 0.92307692 0.90430622 0.9178744 0.92753623 0.93627451
0.93236715 0.91304348 0.90909091 0.91346154]
mean value: 0.9204567588451691
MCC on Blind test: 0.7
Accuracy on Blind test: 0.85
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01386571 0.01546836 0.01592708 0.01628208 0.01617885 0.0131793
0.01606131 0.0107379 0.01122975 0.01280212]
mean value: 0.014173245429992676
key: score_time
value: [0.01187468 0.01450491 0.01421571 0.01432848 0.01400518 0.01220989
0.0135181 0.00963759 0.00962281 0.01071024]
mean value: 0.012462759017944336
key: test_mcc
value: [0.81818182 0.50051733 0.81818182 0.63900965 0.50471461 0.54772256
0.6882472 0.81818182 0.86452993 0.86452993]
mean value: 0.7063816679047357
key: train_mcc
value: [0.74751288 0.68774638 0.76286954 0.77297377 0.67365307 0.73771253
0.77281598 0.72786709 0.73308094 0.72786709]
mean value: 0.7344099272046745
key: test_accuracy
value: [0.90909091 0.75 0.90909091 0.81818182 0.75 0.77272727
0.84090909 0.90909091 0.93181818 0.93181818]
mean value: 0.8522727272727273
key: train_accuracy
value: [0.87373737 0.84343434 0.88131313 0.88636364 0.83585859 0.86868687
0.88636364 0.86363636 0.86616162 0.86363636]
mean value: 0.8669191919191919
key: test_fscore
value: [0.90909091 0.74418605 0.90909091 0.82608696 0.73170732 0.76190476
0.82926829 0.90909091 0.93333333 0.93333333]
mean value: 0.8487092768633621
key: train_fscore
value: [0.87309645 0.83937824 0.8797954 0.88491049 0.82939633 0.86666667
0.88549618 0.86082474 0.8630491 0.86082474]
mean value: 0.8643438322870827
key: test_precision
value: [0.90909091 0.76190476 0.90909091 0.79166667 0.78947368 0.8
0.89473684 0.90909091 0.91304348 0.91304348]
mean value: 0.8591141638681684
key: train_precision
value: [0.87755102 0.86170213 0.89119171 0.89637306 0.86338798 0.88020833
0.89230769 0.87894737 0.88359788 0.87894737]
mean value: 0.8804214539130206
key: test_recall
value: [0.90909091 0.72727273 0.90909091 0.86363636 0.68181818 0.72727273
0.77272727 0.90909091 0.95454545 0.95454545]
mean value: 0.8409090909090909
key: train_recall
value: [0.86868687 0.81818182 0.86868687 0.87373737 0.7979798 0.85353535
0.87878788 0.84343434 0.84343434 0.84343434]
mean value: 0.848989898989899
key: test_roc_auc
value: [0.90909091 0.75 0.90909091 0.81818182 0.75 0.77272727
0.84090909 0.90909091 0.93181818 0.93181818]
mean value: 0.8522727272727273
key: train_roc_auc
value: [0.87373737 0.84343434 0.88131313 0.88636364 0.83585859 0.86868687
0.88636364 0.86363636 0.86616162 0.86363636]
mean value: 0.8669191919191919
key: test_jcc
value: [0.83333333 0.59259259 0.83333333 0.7037037 0.57692308 0.61538462
0.70833333 0.83333333 0.875 0.875 ]
mean value: 0.7446937321937322
key: train_jcc
value: [0.77477477 0.72321429 0.78538813 0.79357798 0.70852018 0.76470588
0.79452055 0.75565611 0.75909091 0.75565611]
mean value: 0.7615104905950141
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01290369 0.02171993 0.0183773 0.02184057 0.02121806 0.01895308
0.01816034 0.01787543 0.02349734 0.01725245]
mean value: 0.019179821014404297
key: score_time
value: [0.00926185 0.01158953 0.0118134 0.01216078 0.01294613 0.01253366
0.01261091 0.01218534 0.01236916 0.01217318]
mean value: 0.01196439266204834
key: test_mcc
value: [0.54794903 0.77352678 0.62330229 0.64715023 0.58554004 0.32539569
0.77352678 0.91287093 0.77352678 0.79349205]
mean value: 0.6756280609159855
key: train_mcc
value: [0.73305263 0.92036649 0.81060226 0.8786935 0.5976219 0.57346234
0.87374852 0.82790197 0.86140292 0.73125738]
mean value: 0.7808109898653016
key: test_accuracy
value: [0.75 0.88636364 0.79545455 0.81818182 0.77272727 0.63636364
0.88636364 0.95454545 0.88636364 0.88636364]
mean value: 0.8272727272727273
key: train_accuracy
value: [0.8510101 0.95959596 0.90151515 0.93686869 0.76767677 0.74747475
0.93686869 0.91161616 0.92929293 0.85606061]
mean value: 0.8797979797979798
key: test_fscore
value: [0.68571429 0.88372093 0.82352941 0.8 0.80769231 0.5
0.88372093 0.95238095 0.88888889 0.89795918]
mean value: 0.8123606890579727
key: train_fscore
value: [0.8259587 0.95854922 0.90780142 0.93333333 0.80991736 0.66216216
0.93670886 0.90666667 0.93203883 0.8707483 ]
mean value: 0.8743884855867281
key: test_precision
value: [0.92307692 0.9047619 0.72413793 0.88888889 0.7 0.8
0.9047619 1. 0.86956522 0.81481481]
mean value: 0.8530007584730224
key: train_precision
value: [0.9929078 0.98404255 0.85333333 0.98870056 0.68531469 1.
0.93908629 0.96045198 0.89719626 0.79012346]
mean value: 0.9091156928519439
key: test_recall
value: [0.54545455 0.86363636 0.95454545 0.72727273 0.95454545 0.36363636
0.86363636 0.90909091 0.90909091 1. ]
mean value: 0.8090909090909091
key: train_recall
value: [0.70707071 0.93434343 0.96969697 0.88383838 0.98989899 0.49494949
0.93434343 0.85858586 0.96969697 0.96969697]
mean value: 0.8712121212121212
key: test_roc_auc
value: [0.75 0.88636364 0.79545455 0.81818182 0.77272727 0.63636364
0.88636364 0.95454545 0.88636364 0.88636364]
mean value: 0.8272727272727273
key: train_roc_auc
value: [0.8510101 0.95959596 0.90151515 0.93686869 0.76767677 0.74747475
0.93686869 0.91161616 0.92929293 0.85606061]
mean value: 0.8797979797979798
key: test_jcc
value: [0.52173913 0.79166667 0.7 0.66666667 0.67741935 0.33333333
0.79166667 0.90909091 0.8 0.81481481]
mean value: 0.7006397542512549
key: train_jcc
value: [0.70351759 0.92039801 0.83116883 0.875 0.68055556 0.49494949
0.88095238 0.82926829 0.87272727 0.77108434]
mean value: 0.7859621763275807
MCC on Blind test: 0.73
Accuracy on Blind test: 0.86
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.02216911 0.01927686 0.01920557 0.02066255 0.01956725 0.01928711
0.01948738 0.02105021 0.02236152 0.0231297 ]
mean value: 0.020619726181030272
key: score_time
value: [0.01260042 0.01255798 0.01213145 0.01228118 0.01218748 0.01229119
0.01216459 0.01225376 0.0122602 0.01242185]
mean value: 0.012315011024475098
key: test_mcc
value: [0.70014004 0.79349205 0.47140452 0.68252363 0.82158384 0.73029674
0.60678804 0.87177979 0.75592895 0.66143783]
mean value: 0.7095375421713689
key: train_mcc
value: [0.81587826 0.79415212 0.62017367 0.91471323 0.89002473 0.89940294
0.72894554 0.89180538 0.77045723 0.78127257]
mean value: 0.8106825677849934
key: test_accuracy
value: [0.84090909 0.88636364 0.68181818 0.84090909 0.90909091 0.86363636
0.79545455 0.93181818 0.86363636 0.81818182]
mean value: 0.8431818181818181
key: train_accuracy
value: [0.90151515 0.88888889 0.77777778 0.95707071 0.94444444 0.94949495
0.84848485 0.94444444 0.87373737 0.88131313]
mean value: 0.8967171717171717
key: test_fscore
value: [0.82051282 0.87179487 0.53333333 0.8372093 0.9047619 0.86956522
0.76923077 0.92682927 0.84210526 0.78947368]
mean value: 0.8164816435011689
key: train_fscore
value: [0.89196676 0.87640449 0.71428571 0.9562982 0.94300518 0.94871795
0.82248521 0.94210526 0.85632184 0.86685552]
mean value: 0.8818446131668011
key: test_precision
value: [0.94117647 1. 1. 0.85714286 0.95 0.83333333
0.88235294 1. 1. 0.9375 ]
mean value: 0.9401505602240896
key: train_precision
value: [0.98773006 0.98734177 1. 0.97382199 0.96808511 0.96354167
0.99285714 0.98351648 0.99333333 0.98709677]
mean value: 0.9837324329980541
key: test_recall
value: [0.72727273 0.77272727 0.36363636 0.81818182 0.86363636 0.90909091
0.68181818 0.86363636 0.72727273 0.68181818]
mean value: 0.740909090909091
key: train_recall
value: [0.81313131 0.78787879 0.55555556 0.93939394 0.91919192 0.93434343
0.7020202 0.9040404 0.75252525 0.77272727]
mean value: 0.8080808080808081
key: test_roc_auc
value: [0.84090909 0.88636364 0.68181818 0.84090909 0.90909091 0.86363636
0.79545455 0.93181818 0.86363636 0.81818182]
mean value: 0.8431818181818181
key: train_roc_auc
value: [0.90151515 0.88888889 0.77777778 0.95707071 0.94444444 0.94949495
0.84848485 0.94444444 0.87373737 0.88131313]
mean value: 0.8967171717171717
key: test_jcc
value: [0.69565217 0.77272727 0.36363636 0.72 0.82608696 0.76923077
0.625 0.86363636 0.72727273 0.65217391]
mean value: 0.7015416539981757
key: train_jcc
value: [0.805 0.78 0.55555556 0.91625616 0.89215686 0.90243902
0.69849246 0.89054726 0.74874372 0.765 ]
mean value: 0.7954191044912481
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.23656082 0.17917037 0.17021155 0.16980386 0.17501688 0.16724229
0.17781687 0.22106504 0.16424084 0.16352606]
mean value: 0.18246545791625976
key: score_time
value: [0.02395177 0.01514435 0.01610875 0.01673985 0.01671529 0.01640296
0.02107787 0.01673555 0.01646042 0.01520896]
mean value: 0.01745457649230957
key: test_mcc
value: [1. 0.86452993 0.77352678 0.77352678 0.95553309 0.86452993
0.90909091 0.95553309 0.95553309 0.95553309]
mean value: 0.9007336690106234
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.93181818 0.88636364 0.88636364 0.97727273 0.93181818
0.95454545 0.97727273 0.97727273 0.97727273]
mean value: 0.95
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.93023256 0.88372093 0.88888889 0.97674419 0.93333333
0.95454545 0.97674419 0.97674419 0.97777778]
mean value: 0.9498731501057083
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.95238095 0.9047619 0.86956522 1. 0.91304348
0.95454545 1. 1. 0.95652174]
mean value: 0.955081874647092
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.90909091 0.86363636 0.90909091 0.95454545 0.95454545
0.95454545 0.95454545 0.95454545 1. ]
mean value: 0.9454545454545454
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.93181818 0.88636364 0.88636364 0.97727273 0.93181818
0.95454545 0.97727273 0.97727273 0.97727273]
mean value: 0.9500000000000001
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.86956522 0.79166667 0.8 0.95454545 0.875
0.91304348 0.95454545 0.95454545 0.95652174]
mean value: 0.9069433465085639
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.95
Accuracy on Blind test: 0.97
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.05210972 0.05436349 0.05617356 0.05209541 0.07101512 0.03992128
0.05424643 0.06785297 0.05630231 0.06146455]
mean value: 0.05655448436737061
key: score_time
value: [0.01932168 0.02250862 0.01773334 0.02535009 0.02827978 0.01752615
0.0307796 0.03320765 0.02754951 0.02019429]
mean value: 0.024245071411132812
key: test_mcc
value: [1. 0.86452993 0.86452993 0.86452993 0.95553309 0.82158384
0.90909091 0.91287093 0.95553309 0.95553309]
mean value: 0.9103734736843416
key: train_mcc
value: [0.98496155 1. 0.98994949 1. 0.98994949 0.99496218
1. 0.99496218 1. 0.96974644]
mean value: 0.9924531348618916
key: test_accuracy
value: [1. 0.93181818 0.93181818 0.93181818 0.97727273 0.90909091
0.95454545 0.95454545 0.97727273 0.97727273]
mean value: 0.9545454545454546
key: train_accuracy
value: [0.99242424 1. 0.99494949 1. 0.99494949 0.99747475
1. 0.99747475 1. 0.98484848]
mean value: 0.9962121212121212
key: test_fscore
value: [1. 0.93023256 0.93333333 0.93333333 0.97777778 0.91304348
0.95454545 0.95238095 0.97674419 0.97777778]
mean value: 0.9549168851595545
key: train_fscore
value: [0.99236641 1. 0.99497487 1. 0.99492386 0.99746835
1. 0.99746835 1. 0.98477157]
mean value: 0.996197342691844
key: test_precision
value: [1. 0.95238095 0.91304348 0.91304348 0.95652174 0.875
0.95454545 1. 1. 0.95652174]
mean value: 0.9521056841709016
key: train_precision
value: [1. 1. 0.99 1. 1. 1.
1. 1. 1. 0.98979592]
mean value: 0.9979795918367347
key: test_recall
value: [1. 0.90909091 0.95454545 0.95454545 1. 0.95454545
0.95454545 0.90909091 0.95454545 1. ]
mean value: 0.9590909090909091
key: train_recall
value: [0.98484848 1. 1. 1. 0.98989899 0.99494949
1. 0.99494949 1. 0.97979798]
mean value: 0.9944444444444445
key: test_roc_auc
value: [1. 0.93181818 0.93181818 0.93181818 0.97727273 0.90909091
0.95454545 0.95454545 0.97727273 0.97727273]
mean value: 0.9545454545454546
key: train_roc_auc
value: [0.99242424 1. 0.99494949 1. 0.99494949 0.99747475
1. 0.99747475 1. 0.98484848]
mean value: 0.9962121212121212
key: test_jcc
value: [1. 0.86956522 0.875 0.875 0.95652174 0.84
0.91304348 0.90909091 0.95454545 0.95652174]
mean value: 0.9149288537549407
key: train_jcc
value: [0.98484848 1. 0.99 1. 0.98989899 0.99494949
1. 0.99494949 1. 0.97 ]
mean value: 0.9924646464646465
MCC on Blind test: 0.96
Accuracy on Blind test: 0.98
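The key / value / mean value records above have the shape of the dictionary returned by scikit-learn's cross_validate when return_train_score=True. A minimal sketch of how such a block could be produced, assuming 10-fold CV and an MCC scorer. The toy data, the two-column frame and the print loop are hypothetical and are not taken from ml_data_8020.py; n_jobs and oob_score from the logged estimator are omitted to keep the sketch small.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy data: one numeric and one categorical feature, balanced labels.
rng = np.random.RandomState(42)
y = np.array([0, 1] * 30)
X = pd.DataFrame({
    "ligand_distance": y + rng.normal(0.0, 0.3, size=y.size),
    "ss_class": [["a", "b", "c"][i % 3] for i in range(y.size)],
})

# Same layout as the logged pipeline: scale numeric columns, one-hot the rest.
prep = ColumnTransformer(
    remainder="passthrough",
    transformers=[
        ("num", MinMaxScaler(), ["ligand_distance"]),
        ("cat", OneHotEncoder(), ["ss_class"]),
    ],
)
pipe = Pipeline(steps=[
    ("prep", prep),
    ("model", BaggingClassifier(random_state=42)),
])

# Scorer names become the 'test_<name>' / 'train_<name>' keys in the result.
scoring = {"mcc": make_scorer(matthews_corrcoef), "accuracy": "accuracy"}
scores = cross_validate(pipe, X, y, cv=10, scoring=scoring,
                        return_train_score=True)

for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```

Each array holds one entry per fold, which is why every value line in this log lists ten numbers.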
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.07493091 0.11247015 0.18619466 0.14949703 0.53921151 0.20738435
0.13005686 0.14361334 0.1468308 0.12836099]
mean value: 0.18185505867004395
key: score_time
value: [0.01473236 0.01457143 0.02343249 0.02896667 0.05819082 0.03038406
0.02774858 0.02561402 0.01513457 0.02963424]
mean value: 0.026840925216674805
key: test_mcc
value: [0.83205029 0.60678804 0.63636364 0.45643546 0.45454545 0.45643546
0.77352678 0.72727273 0.63900965 0.59152048]
mean value: 0.617394799410964
key: train_mcc
value: [0.98496155 0.98496155 0.98496155 0.98496155 0.99496218 0.98994949
0.98994949 0.98496155 0.98994949 0.98994949]
mean value: 0.9879567906059172
key: test_accuracy
value: [0.90909091 0.79545455 0.81818182 0.72727273 0.72727273 0.72727273
0.88636364 0.86363636 0.81818182 0.79545455]
mean value: 0.8068181818181819
key: train_accuracy
value: [0.99242424 0.99242424 0.99242424 0.99242424 0.99747475 0.99494949
0.99494949 0.99242424 0.99494949 0.99494949]
mean value: 0.9939393939393939
key: test_fscore
value: [0.91666667 0.76923077 0.81818182 0.71428571 0.72727273 0.71428571
0.88888889 0.86363636 0.80952381 0.79069767]
mean value: 0.8012670146391077
key: train_fscore
value: [0.99236641 0.99236641 0.99236641 0.99236641 0.99746835 0.99492386
0.99492386 0.99236641 0.99492386 0.99492386]
mean value: 0.9938995846971164
key: test_precision
value: [0.84615385 0.88235294 0.81818182 0.75 0.72727273 0.75
0.86956522 0.86363636 0.85 0.80952381]
mean value: 0.8166686723336339
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.68181818 0.81818182 0.68181818 0.72727273 0.68181818
0.90909091 0.86363636 0.77272727 0.77272727]
mean value: 0.7909090909090909
key: train_recall
value: [0.98484848 0.98484848 0.98484848 0.98484848 0.99494949 0.98989899
0.98989899 0.98484848 0.98989899 0.98989899]
mean value: 0.9878787878787879
key: test_roc_auc
value: [0.90909091 0.79545455 0.81818182 0.72727273 0.72727273 0.72727273
0.88636364 0.86363636 0.81818182 0.79545455]
mean value: 0.8068181818181818
key: train_roc_auc
value: [0.99242424 0.99242424 0.99242424 0.99242424 0.99747475 0.99494949
0.99494949 0.99242424 0.99494949 0.99494949]
mean value: 0.993939393939394
key: test_jcc
value: [0.84615385 0.625 0.69230769 0.55555556 0.57142857 0.55555556
0.8 0.76 0.68 0.65384615]
mean value: 0.6739847374847375
key: train_jcc
value: [0.98484848 0.98484848 0.98484848 0.98484848 0.99494949 0.98989899
0.98989899 0.98484848 0.98989899 0.98989899]
mean value: 0.9878787878787879
MCC on Blind test: 0.59
Accuracy on Blind test: 0.79
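The first Gaussian Process test fold above can be reproduced by hand. Assuming that fold holds 22 positive and 22 negative samples (an inference from the recall values moving in steps of 1/22, not something the log states), the counts TP=22, FN=0, FP=4, TN=18 give back the logged first-fold values. A small sketch; the helper name fold_metrics is hypothetical:

```python
import math

def fold_metrics(tp, fn, fp, tn):
    """Binary classification metrics from confusion counts (hypothetical helper)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    fscore = 2 * precision * recall / (precision + recall)
    jcc = tp / (tp + fp + fn)  # Jaccard index of the positive class
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "test_mcc": mcc,
        "test_accuracy": accuracy,
        "test_fscore": fscore,
        "test_precision": precision,
        "test_recall": recall,
        "test_jcc": jcc,
    }

# Inferred counts for fold 1 of the Gaussian Process run (an assumption).
m = fold_metrics(tp=22, fn=0, fp=4, tn=18)
for name, value in m.items():
    print(f"{name}: {value:.8f}")
```

With these counts the balanced accuracy, 0.5 * (22/22 + 18/22), is also 0.90909091, which matches the logged test_roc_auc for that fold. That would be consistent with ROC AUC being scored on hard class labels, though the log does not confirm it.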
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.72530508 0.64669752 0.66160345 0.69841456 0.70706224 0.64407086
0.68081927 0.73172593 0.72187114 0.73497915]
mean value: 0.695254921913147
key: score_time
value: [0.00961423 0.0095017 0.0108285 0.01087284 0.0114634 0.00980496
0.01087284 0.01086545 0.01108456 0.01084566]
mean value: 0.010575413703918457
key: test_mcc
value: [1. 0.86452993 0.86452993 0.81818182 0.95553309 0.82158384
0.90909091 1. 0.95553309 0.91287093]
mean value: 0.9101853534252073
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.93181818 0.93181818 0.90909091 0.97727273 0.90909091
0.95454545 1. 0.97727273 0.95454545]
mean value: 0.9545454545454546
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.93023256 0.93333333 0.90909091 0.97674419 0.91304348
0.95454545 1. 0.97674419 0.95652174]
mean value: 0.9550255844593559
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.95238095 0.91304348 0.90909091 1. 0.875
0.95454545 1. 1. 0.91666667]
mean value: 0.9520727460944852
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.90909091 0.95454545 0.90909091 0.95454545 0.95454545
0.95454545 1. 0.95454545 1. ]
mean value: 0.9590909090909091
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.93181818 0.93181818 0.90909091 0.97727273 0.90909091
0.95454545 1. 0.97727273 0.95454545]
mean value: 0.9545454545454546
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.86956522 0.875 0.83333333 0.95454545 0.84
0.91304348 1. 0.95454545 0.91666667]
mean value: 0.9156699604743083
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.95
Accuracy on Blind test: 0.97
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.06525111 0.04378963 0.03224754 0.06311655 0.06546879 0.1236515
0.13866115 0.07320833 0.10163593 0.10575056]
mean value: 0.08127810955047607
key: score_time
value: [0.02526879 0.01164603 0.01404119 0.01201868 0.01778102 0.02148128
0.01306319 0.03238225 0.0200932 0.01537132]
mean value: 0.018314695358276366
key: test_mcc
value: [ 0.41294832 0.30618622 0.40951418 0.59152048 0.2773501 0.37796447
0.29277002 -0.05634362 0.33562431 0.22750788]
mean value: 0.3175042364901061
key: train_mcc
value: [0.97485938 0.6751906 0.97984797 0.97984797 0.84241805 0.8407714
0.66332496 0.73135745 0.9459053 0.97984797]
mean value: 0.8613371053146376
key: test_accuracy
value: [0.70454545 0.63636364 0.70454545 0.79545455 0.63636364 0.68181818
0.63636364 0.47727273 0.65909091 0.61363636]
mean value: 0.6545454545454545
key: train_accuracy
value: [0.98737374 0.81313131 0.98989899 0.98989899 0.91666667 0.91414141
0.80555556 0.84848485 0.97222222 0.98989899]
mean value: 0.9227272727272727
key: test_fscore
value: [0.72340426 0.52941176 0.69767442 0.8 0.66666667 0.63157895
0.55555556 0.25806452 0.70588235 0.62222222]
mean value: 0.6190460699512758
key: train_fscore
value: [0.98746867 0.77018634 0.98994975 0.98994975 0.91008174 0.90607735
0.75862069 0.82142857 0.97297297 0.98994975]
mean value: 0.9096685579306306
key: test_precision
value: [0.68 0.75 0.71428571 0.7826087 0.61538462 0.75
0.71428571 0.44444444 0.62068966 0.60869565]
mean value: 0.6680394491398989
key: train_precision
value: [0.9800995 1. 0.985 0.985 0.98816568 1.
1. 1. 0.94736842 0.985 ]
mean value: 0.9870633604013567
key: test_recall
value: [0.77272727 0.40909091 0.68181818 0.81818182 0.72727273 0.54545455
0.45454545 0.18181818 0.81818182 0.63636364]
mean value: 0.6045454545454545
key: train_recall
value: [0.99494949 0.62626263 0.99494949 0.99494949 0.84343434 0.82828283
0.61111111 0.6969697 1. 0.99494949]
mean value: 0.8585858585858586
key: test_roc_auc
value: [0.70454545 0.63636364 0.70454545 0.79545455 0.63636364 0.68181818
0.63636364 0.47727273 0.65909091 0.61363636]
mean value: 0.6545454545454545
key: train_roc_auc
value: [0.98737374 0.81313131 0.98989899 0.98989899 0.91666667 0.91414141
0.80555556 0.84848485 0.97222222 0.98989899]
mean value: 0.9227272727272727
key: test_jcc
value: [0.56666667 0.36 0.53571429 0.66666667 0.5 0.46153846
0.38461538 0.14814815 0.54545455 0.4516129 ]
mean value: 0.4620417062029965
key: train_jcc
value: [0.97524752 0.62626263 0.9800995 0.9800995 0.835 0.82828283
0.61111111 0.6969697 0.94736842 0.9800995 ]
mean value: 0.8460540715894056
MCC on Blind test: 0.52
Accuracy on Blind test: 0.76
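The "Variables are collinear" warning in the QDA run above means at least one per-class covariance estimate was rank deficient. One possible contributor, offered as an assumption rather than a diagnosis, is that OneHotEncoder() with default settings emits one indicator column per category, and those columns always sum to 1. The pure-Python sketch below uses made-up category values to show the dependency:

```python
# Hypothetical category values for a column such as 'ss_class'.
values = ["helix", "sheet", "loop", "helix", "loop", "sheet", "helix"]
categories = sorted(set(values))

# Full one-hot encoding: one indicator column per category.
one_hot = [[1 if v == c else 0 for c in categories] for v in values]

# Every row sums to exactly 1, so any one column equals 1 minus the others.
row_sums = [sum(row) for row in one_hot]
print(row_sums)

# After centring each column (as a covariance estimate does), each row of
# indicators sums to 0: the centred columns are linearly dependent.
n = len(one_hot)
k = len(categories)
col_means = [sum(row[j] for row in one_hot) / n for j in range(k)]
centred_row_sums = [
    sum(row[j] - col_means[j] for j in range(k)) for row in one_hot
]
print(max(abs(s) for s in centred_row_sums))
```

Dropping one level per feature with OneHotEncoder(drop='first'), or giving QuadraticDiscriminantAnalysis a small positive reg_param, are common remedies. Whether either changes the QDA scores in this run is untested, and strongly correlated numeric features could trigger the same warning.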
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.0181272 0.01742435 0.03097105 0.05920911 0.03365254 0.03182578
0.04450655 0.05021977 0.03421092 0.03448963]
mean value: 0.035463690757751465
key: score_time
value: [0.01237655 0.01228476 0.01283073 0.03200722 0.0200367 0.01936555
0.03299618 0.03163648 0.03088164 0.03133059]
mean value: 0.023574638366699218
key: test_mcc
value: [0.81818182 0.77352678 0.77352678 0.68252363 0.86452993 0.73029674
0.77352678 0.95553309 0.81818182 0.91287093]
mean value: 0.810269831392801
key: train_mcc
value: [0.87374852 0.8693968 0.86391186 0.86873119 0.85876112 0.89404202
0.86886419 0.86373551 0.87896726 0.85363334]
mean value: 0.8693791805866737
key: test_accuracy
value: [0.90909091 0.88636364 0.88636364 0.84090909 0.93181818 0.86363636
0.88636364 0.97727273 0.90909091 0.95454545]
mean value: 0.9045454545454545
key: train_accuracy
value: [0.93686869 0.93434343 0.93181818 0.93434343 0.92929293 0.9469697
0.93434343 0.93181818 0.93939394 0.92676768]
mean value: 0.9345959595959596
key: test_fscore
value: [0.90909091 0.88372093 0.88888889 0.84444444 0.93023256 0.86956522
0.88372093 0.97674419 0.90909091 0.95652174]
mean value: 0.9052020712688054
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_8020.py:168: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_8020.py:171: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: train_fscore
value: [0.93670886 0.93564356 0.93266833 0.93467337 0.93 0.94656489
0.935 0.93233083 0.94 0.9273183 ]
mean value: 0.9350908129430359
key: test_precision
value: [0.90909091 0.9047619 0.86956522 0.82608696 0.95238095 0.83333333
0.9047619 1. 0.90909091 0.91666667]
mean value: 0.9025738753999624
key: train_precision
value: [0.93908629 0.91747573 0.92118227 0.93 0.92079208 0.95384615
0.92574257 0.92537313 0.93069307 0.92039801]
mean value: 0.9284589309478474
key: test_recall
value: [0.90909091 0.86363636 0.90909091 0.86363636 0.90909091 0.90909091
0.86363636 0.95454545 0.90909091 1. ]
mean value: 0.9090909090909091
key: train_recall
value: [0.93434343 0.95454545 0.94444444 0.93939394 0.93939394 0.93939394
0.94444444 0.93939394 0.94949495 0.93434343]
mean value: 0.9419191919191919
key: test_roc_auc
value: [0.90909091 0.88636364 0.88636364 0.84090909 0.93181818 0.86363636
0.88636364 0.97727273 0.90909091 0.95454545]
mean value: 0.9045454545454545
key: train_roc_auc
value: [0.93686869 0.93434343 0.93181818 0.93434343 0.92929293 0.9469697
0.93434343 0.93181818 0.93939394 0.92676768]
mean value: 0.9345959595959596
key: test_jcc
value: [0.83333333 0.79166667 0.8 0.73076923 0.86956522 0.76923077
0.79166667 0.95454545 0.83333333 0.91666667]
mean value: 0.8290777338603426
key: train_jcc
value: [0.88095238 0.87906977 0.87383178 0.87735849 0.86915888 0.89855072
0.87793427 0.87323944 0.88679245 0.86448598]
mean value: 0.8781374160862355
MCC on Blind test: 0.79
Accuracy on Blind test: 0.89
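The SettingWithCopyWarning messages in this block come from sort_values(..., inplace = True) being called on rus_CT and rus_BT at rpob_8020.py lines 168 and 171, which suggests both are slices of a larger DataFrame. A sketch of the usual fix, assuming a parent frame called scores_df with a resampling column. Both names and all values are hypothetical; the log does not show how rus_CT is built.

```python
import pandas as pd

# Hypothetical parent frame of per-model scores.
scores_df = pd.DataFrame({
    "model": ["A", "B", "C", "D"],
    "resampling": ["rus", "none", "rus", "rus"],
    "test_mcc": [0.81, 0.95, 0.91, 0.32],
})

# Take an explicit copy of the slice so later operations own their data,
# then assign the sorted result instead of sorting in place.
rus_CT = scores_df[scores_df["resampling"] == "rus"].copy()
rus_CT = rus_CT.sort_values(by=["test_mcc"], ascending=False)

print(rus_CT["model"].tolist())
```

Assigning the result of sort_values back to the name avoids inplace=True altogether, and the parent frame is left untouched.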
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.40553069 0.27984118 0.32102966 0.60245919 0.34661484 0.62287879
0.34234953 0.59961581 0.49291015 0.4358778 ]
mean value: 0.44491076469421387
key: score_time
value: [0.01224136 0.02761769 0.02338171 0.01102257 0.03149271 0.0273664
0.01604629 0.02581143 0.03535819 0.01974392]
mean value: 0.023008227348327637
key: test_mcc
value: [0.81818182 0.77352678 0.77352678 0.64715023 0.86452993 0.73029674
0.77352678 0.95553309 0.81818182 0.91287093]
mean value: 0.8067324910067508
key: train_mcc
value: [0.87374852 0.8693968 0.86391186 0.82332683 0.85876112 0.89404202
0.86886419 0.86373551 0.87896726 0.85363334]
mean value: 0.8648387451125236
key: test_accuracy
value: [0.90909091 0.88636364 0.88636364 0.81818182 0.93181818 0.86363636
0.88636364 0.97727273 0.90909091 0.95454545]
mean value: 0.9022727272727272
key: train_accuracy
value: [0.93686869 0.93434343 0.93181818 0.91161616 0.92929293 0.9469697
0.93434343 0.93181818 0.93939394 0.92676768]
mean value: 0.9323232323232323
key: test_fscore
value: [0.90909091 0.88372093 0.88888889 0.83333333 0.93023256 0.86956522
0.88372093 0.97674419 0.90909091 0.95652174]
mean value: 0.9040909601576942
key: train_fscore
value: [0.93670886 0.93564356 0.93266833 0.91094148 0.93 0.94656489
0.935 0.93233083 0.94 0.9273183 ]
mean value: 0.9327176238423159
key: test_precision
value: [0.90909091 0.9047619 0.86956522 0.76923077 0.95238095 0.83333333
0.9047619 1. 0.90909091 0.91666667]
mean value: 0.8968882566708654
key: train_precision
value: [0.93908629 0.91747573 0.92118227 0.91794872 0.92079208 0.95384615
0.92574257 0.92537313 0.93069307 0.92039801]
mean value: 0.9272538027427192
key: test_recall
value: [0.90909091 0.86363636 0.90909091 0.90909091 0.90909091 0.90909091
0.86363636 0.95454545 0.90909091 1. ]
mean value: 0.9136363636363636
key: train_recall
value: [0.93434343 0.95454545 0.94444444 0.9040404 0.93939394 0.93939394
0.94444444 0.93939394 0.94949495 0.93434343]
mean value: 0.9383838383838384
key: test_roc_auc
value: [0.90909091 0.88636364 0.88636364 0.81818182 0.93181818 0.86363636
0.88636364 0.97727273 0.90909091 0.95454545]
mean value: 0.9022727272727273
key: train_roc_auc
value: [0.93686869 0.93434343 0.93181818 0.91161616 0.92929293 0.9469697
0.93434343 0.93181818 0.93939394 0.92676768]
mean value: 0.9323232323232323
key: test_jcc
value: [0.83333333 0.79166667 0.8 0.71428571 0.86956522 0.76923077
0.79166667 0.95454545 0.83333333 0.91666667]
mean value: 0.8274293822119909
key: train_jcc
value: [0.88095238 0.87906977 0.87383178 0.8364486 0.86915888 0.89855072
0.87793427 0.87323944 0.88679245 0.86448598]
mean value: 0.8740464268427159
MCC on Blind test: 0.79
Accuracy on Blind test: 0.89
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.17021847 0.0488503 0.04700518 0.10988641 0.11036181 0.18732977
0.1769886 0.13736916 0.13100767 0.08153677]
mean value: 0.12005541324615479
key: score_time
value: [0.01927829 0.01233315 0.01237607 0.02285433 0.01247287 0.02293801
0.01881552 0.01801705 0.01239252 0.02268243]
mean value: 0.01741602420806885
key: test_mcc
value: [0.91452919 0.91106719 1. 0.95652174 0.77865613 0.82506438
0.68911026 0.64426877 0.74410286 0.68972332]
mean value: 0.8153043839779892
key: train_mcc
value: [0.862096 0.86750864 0.84716163 0.85188889 0.87164354 0.85185095
0.85680144 0.88165855 0.87664317 0.86676585]
mean value: 0.8634018662617092
key: test_accuracy
value: [0.95555556 0.95555556 1. 0.97777778 0.88888889 0.91111111
0.84444444 0.82222222 0.86666667 0.84444444]
mean value: 0.9066666666666666
key: train_accuracy
value: [0.9308642 0.93333333 0.92345679 0.92592593 0.93580247 0.92592593
0.92839506 0.94074074 0.9382716 0.93333333]
mean value: 0.931604938271605
key: test_fscore
value: [0.95238095 0.95454545 1. 0.97777778 0.88888889 0.91666667
0.85106383 0.82608696 0.88 0.84444444]
mean value: 0.9091854971013158
key: train_fscore
value: [0.93203883 0.93493976 0.92457421 0.92647059 0.93627451 0.92574257
0.92839506 0.94117647 0.93857494 0.93366093]
mean value: 0.9321847880082487
key: test_precision
value: [1. 0.95454545 1. 0.95652174 0.86956522 0.88
0.83333333 0.82608696 0.81481481 0.86363636]
mean value: 0.8998503879373445
key: train_precision
value: [0.91866029 0.91509434 0.91346154 0.92195122 0.93170732 0.92574257
0.92610837 0.93203883 0.93170732 0.92682927]
mean value: 0.9243301070709858
key: test_recall
value: [0.90909091 0.95454545 1. 1. 0.90909091 0.95652174
0.86956522 0.82608696 0.95652174 0.82608696]
mean value: 0.9207509881422925
key: train_recall
value: [0.94581281 0.95566502 0.93596059 0.93103448 0.9408867 0.92574257
0.93069307 0.95049505 0.94554455 0.94059406]
mean value: 0.9402428912842022
key: test_roc_auc
value: [0.95454545 0.9555336 1. 0.97826087 0.88932806 0.91007905
0.84387352 0.82213439 0.86462451 0.84486166]
mean value: 0.9063241106719367
key: train_roc_auc
value: [0.9308272 0.93327806 0.92342584 0.92591328 0.93578988 0.92592547
0.92840072 0.94076477 0.93828952 0.93335122]
mean value: 0.9315965956201532
key: test_jcc
value: [0.90909091 0.91304348 1. 0.95652174 0.8 0.84615385
0.74074074 0.7037037 0.78571429 0.73076923]
mean value: 0.838573793356402
key: train_jcc
value: [0.87272727 0.87782805 0.85972851 0.8630137 0.88018433 0.86175115
0.86635945 0.88888889 0.88425926 0.87557604]
mean value: 0.8730316648333466
MCC on Blind test: 0.8
Accuracy on Blind test: 0.9
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [2.99451971 2.03133249 2.58804703 2.36343884 2.9534328 3.19177508
2.325104 2.89092827 3.53216934 2.62339878]
mean value: 2.74941463470459
key: score_time
value: [0.02300715 0.01974368 0.01187634 0.01202106 0.01932669 0.02432084
0.02360487 0.04886913 0.01317501 0.03754377]
mean value: 0.02334885597229004
key: test_mcc
value: [0.91452919 0.86732843 1. 0.95652174 0.77865613 0.82506438
0.73559956 0.68972332 0.74410286 0.77865613]
mean value: 0.8290181733629296
key: train_mcc
value: [0.89656272 0.89152603 0.81736586 0.87655164 0.90127552 0.88152087
0.90618446 0.95556639 0.89140349 0.89639783]
mean value: 0.8914354807454282
key: test_accuracy
value: [0.95555556 0.93333333 1. 0.97777778 0.88888889 0.91111111
0.86666667 0.84444444 0.86666667 0.88888889]
mean value: 0.9133333333333333
key: train_accuracy
value: [0.94814815 0.94567901 0.90864198 0.9382716 0.95061728 0.94074074
0.95308642 0.97777778 0.94567901 0.94814815]
mean value: 0.945679012345679
key: test_fscore
value: [0.95238095 0.93023256 1. 0.97777778 0.88888889 0.91666667
0.875 0.84444444 0.88 0.88888889]
mean value: 0.9154280177187154
key: train_fscore
value: [0.94890511 0.94634146 0.90953545 0.93857494 0.95098039 0.94029851
0.95308642 0.97766749 0.94581281 0.94840295]
mean value: 0.9459605533255245
key: test_precision
value: [1. 0.95238095 1. 0.95652174 0.86956522 0.88
0.84 0.86363636 0.81481481 0.90909091]
mean value: 0.9086009996444779
key: train_precision
value: [0.9375 0.93719807 0.90291262 0.93627451 0.94634146 0.945
0.95073892 0.9800995 0.94117647 0.94146341]
mean value: 0.9418704966176731
key: test_recall
value: [0.90909091 0.90909091 1. 1. 0.90909091 0.95652174
0.91304348 0.82608696 0.95652174 0.86956522]
mean value: 0.924901185770751
key: train_recall
value: [0.96059113 0.95566502 0.91625616 0.9408867 0.95566502 0.93564356
0.95544554 0.97524752 0.95049505 0.95544554]
mean value: 0.9501341267131639
key: test_roc_auc
value: [0.95454545 0.93280632 1. 0.97826087 0.88932806 0.91007905
0.86561265 0.84486166 0.86462451 0.88932806]
mean value: 0.9129446640316206
key: train_roc_auc
value: [0.94811735 0.94565429 0.90862313 0.93826513 0.95060479 0.94072819
0.95309223 0.97777155 0.94569087 0.94816612]
mean value: 0.9456713651660732
key: test_jcc
value: [0.90909091 0.86956522 1. 0.95652174 0.8 0.84615385
0.77777778 0.73076923 0.78571429 0.8 ]
mean value: 0.8475593006027788
key: train_jcc
value: [0.90277778 0.89814815 0.83408072 0.88425926 0.90654206 0.88732394
0.91037736 0.95631068 0.89719626 0.90186916]
mean value: 0.8978885361073676
MCC on Blind test: 0.79
Accuracy on Blind test: 0.89
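Note: the per-fold scores above can be cross-checked by hand. The sketch below recomputes the first fold's test metrics for LogisticRegressionCV from a single confusion matrix. The counts (TP=20, FP=0, FN=2, TN=23) are not printed in this log; they are inferred from the logged accuracy, precision and recall for that fold and should be treated as an assumption. The function name binary_metrics is hypothetical. The "roc_auc" line assumes the score was computed on hard class predictions, in which case it reduces to balanced accuracy; this matches the logged value but is not confirmed by the log.

```python
from math import sqrt


def binary_metrics(tp, fp, fn, tn):
    """Recompute common binary-classification scores from confusion counts.

    No guard against zero denominators: this is a sketch for checking
    logged values, not a replacement for a metrics library.
    """
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),
        "recall": recall,
        "fscore": 2 * tp / (2 * tp + fp + fn),
        # Matthews correlation coefficient
        "mcc": (tp * tn - fp * fn)
        / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        # Jaccard index of the positive class
        "jcc": tp / (tp + fp + fn),
        # Equals ROC AUC only when scoring hard 0/1 predictions (assumption)
        "roc_auc": (recall + specificity) / 2,
    }


# Inferred counts for fold 1 (45 test samples: 22 positive, 23 negative)
fold1 = binary_metrics(tp=20, fp=0, fn=2, tn=23)
for name, value in fold1.items():
    print(f"{name}: {value:.8f}")
```

With these counts the recomputed values agree with the first entries of test_accuracy, test_precision, test_recall, test_fscore, test_mcc, test_jcc and test_roc_auc logged above.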
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01388836 0.01219487 0.01202846 0.01193786 0.01194143 0.01207137
0.01265931 0.01293683 0.01277041 0.01270103]
mean value: 0.012512993812561036
key: score_time
value: [0.01055479 0.01051068 0.01049137 0.01057005 0.01054406 0.01070571
0.01106882 0.01103139 0.01090407 0.01119542]
mean value: 0.010757637023925782
key: test_mcc
value: [0.73320158 0.70501339 0.55533597 0.72299881 0.62869461 0.77821935
0.3860278 0.60637261 0.64426877 0.60637261]
mean value: 0.6366505501355761
key: train_mcc
value: [0.69394577 0.68810424 0.64177606 0.65988684 0.66444098 0.69047787
0.69047787 0.69787618 0.68793807 0.68334493]
mean value: 0.6798268830360812
key: test_accuracy
value: [0.86666667 0.84444444 0.77777778 0.84444444 0.8 0.88888889
0.68888889 0.8 0.82222222 0.8 ]
mean value: 0.8133333333333334
key: train_accuracy
value: [0.84691358 0.84197531 0.81728395 0.82716049 0.82962963 0.84197531
0.84197531 0.84691358 0.84197531 0.83950617]
mean value: 0.8375308641975309
key: test_fscore
value: [0.86363636 0.82051282 0.77272727 0.81081081 0.75675676 0.89361702
0.66666667 0.79069767 0.82608696 0.79069767]
mean value: 0.7992210017746235
key: train_fscore
value: [0.84878049 0.83333333 0.80319149 0.81578947 0.81889764 0.82978723
0.82978723 0.83769634 0.83246073 0.82939633]
mean value: 0.8279120283586651
key: test_precision
value: [0.86363636 0.94117647 0.77272727 1. 0.93333333 0.875
0.73684211 0.85 0.82608696 0.85 ]
mean value: 0.8648802502070102
key: train_precision
value: [0.84057971 0.8839779 0.87283237 0.87570621 0.87640449 0.89655172
0.89655172 0.88888889 0.88333333 0.88268156]
mean value: 0.8797507924454793
key: test_recall
value: [0.86363636 0.72727273 0.77272727 0.68181818 0.63636364 0.91304348
0.60869565 0.73913043 0.82608696 0.73913043]
mean value: 0.7507905138339921
key: train_recall
value: [0.85714286 0.78817734 0.74384236 0.7635468 0.76847291 0.77227723
0.77227723 0.79207921 0.78712871 0.78217822]
mean value: 0.7827122860069258
key: test_roc_auc
value: [0.86660079 0.84189723 0.77766798 0.84090909 0.79644269 0.88833992
0.69071146 0.8013834 0.82213439 0.8013834 ]
mean value: 0.8127470355731226
key: train_roc_auc
value: [0.84688826 0.84210847 0.81746574 0.82731795 0.82978101 0.84180364
0.84180364 0.84677852 0.84184022 0.83936497]
mean value: 0.8375152416719505
key: test_jcc
value: [0.76 0.69565217 0.62962963 0.68181818 0.60869565 0.80769231
0.5 0.65384615 0.7037037 0.65384615]
mean value: 0.6694883956623087
key: train_jcc
value: [0.73728814 0.71428571 0.67111111 0.68888889 0.69333333 0.70909091
0.70909091 0.72072072 0.71300448 0.70852018]
mean value: 0.7065334385791937
MCC on Blind test: 0.68
Accuracy on Blind test: 0.84
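Note: each "key: / value: / mean value:" triple in this log can be read back programmatically. The sketch below is one possible way to do it, assuming the layout seen above: a "key:" line, then a bracketed, whitespace-separated array that may wrap across lines. The names PATTERN and parse_metrics are hypothetical and not part of the original scripts. The sample text is the Gaussian NB test_mcc entry copied from above.

```python
import re

# A "key:" line followed by a bracketed array that may span several lines
PATTERN = re.compile(r"key:\s*(\S+)\s+value:\s*\[([^\]]*)\]")


def parse_metrics(text):
    """Return {metric_name: [per-fold floats]} for every key/value pair found."""
    return {
        key: [float(token) for token in body.split()]
        for key, body in PATTERN.findall(text)
    }


sample = """key: test_mcc
value: [0.73320158 0.70501339 0.55533597 0.72299881 0.62869461 0.77821935
0.3860278 0.60637261 0.64426877 0.60637261]
mean value: 0.6366505501355761
"""

metrics = parse_metrics(sample)
folds = metrics["test_mcc"]
mean_mcc = sum(folds) / len(folds)
print(len(folds), mean_mcc)
```

The recomputed mean differs from the logged "mean value" only by the rounding of the printed per-fold values.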
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01217747 0.01220322 0.01207113 0.01213813 0.01209164 0.01210999
0.01217937 0.01207447 0.01208973 0.01212859]
mean value: 0.01212637424468994
key: score_time
value: [0.01052928 0.01042199 0.01046658 0.01045585 0.01039958 0.01040077
0.01052475 0.01056433 0.01042843 0.01073074]
mean value: 0.010492229461669922
key: test_mcc
value: [0.64613475 0.78405645 0.82213439 0.86758893 0.5169078 0.64613475
0.68911026 0.51089209 0.73320158 0.51185771]
mean value: 0.6728018722919369
key: train_mcc
value: [0.71373171 0.71391286 0.70403264 0.72839898 0.76296152 0.72399345
0.73363435 0.73363435 0.718529 0.75311563]
mean value: 0.7285944480486638
key: test_accuracy
value: [0.82222222 0.88888889 0.91111111 0.93333333 0.75555556 0.82222222
0.84444444 0.75555556 0.86666667 0.75555556]
mean value: 0.8355555555555555
key: train_accuracy
value: [0.85679012 0.85679012 0.85185185 0.86419753 0.88148148 0.8617284
0.86666667 0.86666667 0.85925926 0.87654321]
mean value: 0.8641975308641976
key: test_fscore
value: [0.80952381 0.87804878 0.90909091 0.93333333 0.76595745 0.83333333
0.85106383 0.76595745 0.86956522 0.75555556]
mean value: 0.8371429662120305
key: train_fscore
value: [0.85572139 0.855 0.85 0.86486486 0.8817734 0.85858586
0.86432161 0.86432161 0.85925926 0.87562189]
mean value: 0.8629469881387253
key: test_precision
value: [0.85 0.94736842 0.90909091 0.91304348 0.72 0.8
0.83333333 0.75 0.86956522 0.77272727]
mean value: 0.836512863185632
key: train_precision
value: [0.86432161 0.8680203 0.86294416 0.8627451 0.8817734 0.87628866
0.87755102 0.87755102 0.85714286 0.88 ]
mean value: 0.8708338129852269
key: test_recall
value: [0.77272727 0.81818182 0.90909091 0.95454545 0.81818182 0.86956522
0.86956522 0.7826087 0.86956522 0.73913043]
mean value: 0.8403162055335969
key: train_recall
value: [0.84729064 0.84236453 0.83743842 0.86699507 0.8817734 0.84158416
0.85148515 0.85148515 0.86138614 0.87128713]
mean value: 0.8553089791737795
key: test_roc_auc
value: [0.82114625 0.88735178 0.91106719 0.93379447 0.756917 0.82114625
0.84387352 0.75494071 0.86660079 0.75592885]
mean value: 0.8352766798418972
key: train_roc_auc
value: [0.85681364 0.85682583 0.85188753 0.86419061 0.88148076 0.86167878
0.86662927 0.86662927 0.8592645 0.87653026]
mean value: 0.8641930449202556
key: test_jcc
value: [0.68 0.7826087 0.83333333 0.875 0.62068966 0.71428571
0.74074074 0.62068966 0.76923077 0.60714286]
mean value: 0.7243721420730417
key: train_jcc
value: [0.74782609 0.74672489 0.73913043 0.76190476 0.78854626 0.75221239
0.76106195 0.76106195 0.75324675 0.77876106]
mean value: 0.7590476528359691
MCC on Blind test: 0.73
Accuracy on Blind test: 0.87
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.01150799 0.01243472 0.01137662 0.01154757 0.01141047 0.01107907
0.01314235 0.01315427 0.01413846 0.03310156]
mean value: 0.014289307594299316
key: score_time
value: [0.02538848 0.02856398 0.02112675 0.02574253 0.02009082 0.01918626
0.05908275 0.04989839 0.05723858 0.07202363]
mean value: 0.03783421516418457
key: test_mcc
value: [0.46640316 0.68972332 0.60079051 0.60000118 0.24655092 0.3860278
0.55666994 0.33399209 0.55533597 0.42744299]
mean value: 0.4862937886203532
key: train_mcc
value: [0.68398976 0.68932545 0.68482256 0.65994656 0.70428051 0.68422603
0.72399345 0.67997157 0.65931708 0.72375269]
mean value: 0.6893625650465149
key: test_accuracy
value: [0.73333333 0.84444444 0.8 0.8 0.62222222 0.68888889
0.77777778 0.66666667 0.77777778 0.71111111]
mean value: 0.7422222222222222
key: train_accuracy
value: [0.84197531 0.84444444 0.84197531 0.82962963 0.85185185 0.84197531
0.8617284 0.83950617 0.82962963 0.8617284 ]
mean value: 0.8444444444444444
key: test_fscore
value: [0.72727273 0.84444444 0.8 0.79069767 0.56410256 0.66666667
0.79166667 0.66666667 0.7826087 0.69767442]
mean value: 0.7331800524495166
key: train_fscore
value: [0.84158416 0.84210526 0.83838384 0.82619647 0.84924623 0.83919598
0.85858586 0.8346056 0.82793017 0.85929648]
mean value: 0.8417130058090375
key: test_precision
value: [0.72727273 0.82608696 0.7826087 0.80952381 0.64705882 0.73684211
0.76 0.68181818 0.7826087 0.75 ]
mean value: 0.7503819995233375
key: train_precision
value: [0.84577114 0.85714286 0.86010363 0.84536082 0.86666667 0.85204082
0.87628866 0.85863874 0.83417085 0.87244898]
mean value: 0.856863317321244
key: test_recall
value: [0.72727273 0.86363636 0.81818182 0.77272727 0.5 0.60869565
0.82608696 0.65217391 0.7826087 0.65217391]
mean value: 0.7203557312252965
key: train_recall
value: [0.83743842 0.82758621 0.81773399 0.80788177 0.83251232 0.82673267
0.84158416 0.81188119 0.82178218 0.84653465]
mean value: 0.8271667560844754
key: test_roc_auc
value: [0.73320158 0.84486166 0.80039526 0.79940711 0.61956522 0.69071146
0.77667984 0.66699605 0.77766798 0.71245059]
mean value: 0.742193675889328
key: train_roc_auc
value: [0.84198654 0.84448617 0.84203531 0.82968346 0.85189972 0.84193777
0.86167878 0.83943813 0.8296103 0.86169097]
mean value: 0.8444447154075013
key: test_jcc
value: [0.57142857 0.73076923 0.66666667 0.65384615 0.39285714 0.5
0.65517241 0.5 0.64285714 0.53571429]
mean value: 0.5849311607932297
key: train_jcc
value: [0.72649573 0.72727273 0.72173913 0.70386266 0.73799127 0.72294372
0.75221239 0.71615721 0.70638298 0.75330396]
mean value: 0.726836177256853
MCC on Blind test: 0.41
Accuracy on Blind test: 0.71
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.04429531 0.05560946 0.05552363 0.02630138 0.01901889 0.01773357
0.0175724 0.01781559 0.01794982 0.01811028]
mean value: 0.02899303436279297
key: score_time
value: [0.04254222 0.02727342 0.02701807 0.0157671 0.01172519 0.01093841
0.01104927 0.01114202 0.01128674 0.01135397]
mean value: 0.018009638786315917
key: test_mcc
value: [0.83484711 0.91452919 1. 0.91106719 0.73663511 0.82506438
0.68972332 0.64426877 0.78405645 0.64426877]
mean value: 0.7984460299700171
key: train_mcc
value: [0.7927359 0.80280601 0.78773172 0.78766004 0.80741843 0.80741843
0.81238873 0.81234453 0.81234453 0.81234453]
mean value: 0.8035192858986034
key: test_accuracy
value: [0.91111111 0.95555556 1. 0.95555556 0.86666667 0.91111111
0.84444444 0.82222222 0.88888889 0.82222222]
mean value: 0.8977777777777778
key: train_accuracy
value: [0.8962963 0.90123457 0.89382716 0.89382716 0.9037037 0.9037037
0.90617284 0.90617284 0.90617284 0.90617284]
mean value: 0.9017283950617284
key: test_fscore
value: [0.9 0.95238095 1. 0.95454545 0.86956522 0.91666667
0.84444444 0.82608696 0.89795918 0.82608696]
mean value: 0.898773583214577
key: train_fscore
value: [0.89756098 0.90291262 0.89486553 0.89434889 0.9037037 0.9037037
0.90640394 0.90594059 0.90594059 0.90594059]
mean value: 0.9021321147462571
key: test_precision
value: [1. 1. 1. 0.95454545 0.83333333 0.88
0.86363636 0.82608696 0.84615385 0.82608696]
mean value: 0.9029842910712476
key: train_precision
value: [0.88888889 0.88995215 0.88834951 0.89215686 0.90594059 0.90147783
0.90196078 0.90594059 0.90594059 0.90594059]
mean value: 0.8986548412370806
key: test_recall
value: [0.81818182 0.90909091 1. 0.95454545 0.90909091 0.95652174
0.82608696 0.82608696 0.95652174 0.82608696]
mean value: 0.8982213438735178
key: train_recall
value: [0.90640394 0.91625616 0.90147783 0.89655172 0.90147783 0.90594059
0.91089109 0.90594059 0.90594059 0.90594059]
mean value: 0.9056820953031264
key: test_roc_auc
value: [0.90909091 0.95454545 1. 0.9555336 0.86758893 0.91007905
0.84486166 0.82213439 0.88735178 0.82213439]
mean value: 0.8973320158102767
key: train_roc_auc
value: [0.89627128 0.90119739 0.89380822 0.89382042 0.90370921 0.90370921
0.90618446 0.90617227 0.90617227 0.90617227]
mean value: 0.9017216992635224
key: test_jcc
value: [0.81818182 0.90909091 1. 0.91304348 0.76923077 0.84615385
0.73076923 0.7037037 0.81481481 0.7037037 ]
mean value: 0.8208692273909666
key: train_jcc
value: [0.81415929 0.82300885 0.80973451 0.80888889 0.82432432 0.82432432
0.82882883 0.8280543 0.8280543 0.8280543 ]
mean value: 0.8217431917161224
MCC on Blind test: 0.79
Accuracy on Blind test: 0.89
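The per-fold scores in the SVM block above are consistent with a single confusion matrix per fold. As a worked check, the first test fold (accuracy 0.91111111, precision 1.0, recall 0.81818182) matches a 45-sample fold with 22 positives and counts TP=18, FN=4, FP=0, TN=23. These counts are inferred from the logged ratios; the script does not print them, and the function name below is hypothetical. The logged test_roc_auc for this fold equals the mean of recall and specificity, which suggests it was computed from hard class predictions, but that is also an inference.

```python
from math import sqrt

def metrics_from_counts(tp, fn, fp, tn):
    """Compute the metrics reported in the log from confusion-matrix counts."""
    total = tp + fn + fp + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Denominator of the Matthews correlation coefficient.
    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "fscore": 2 * tp / (2 * tp + fp + fn),
        "jcc": tp / (tp + fp + fn),              # Jaccard index of the positive class
        "roc_auc": (recall + specificity) / 2,   # AUC of a hard 0/1 prediction
        "mcc": (tp * tn - fp * fn) / mcc_den,
    }

# Counts inferred (not logged) for the first SVM test fold.
svm_fold1 = metrics_from_counts(tp=18, fn=4, fp=0, tn=23)
for name, value in svm_fold1.items():
    print(f"{name}: {value:.8f}")
```

Each printed value can be compared with the first entry of the matching test_* array in the SVM block.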
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.16594219 1.53831148 0.54525065 0.66455746 0.58896971 1.97863793
1.58019543 2.03711033 1.07067323 0.89870596]
mean value: 1.206835436820984
key: score_time
value: [0.02138186 0.01350737 0.01318359 0.01283979 0.02119589 0.02027488
0.02557015 0.01267719 0.0126822 0.0126617 ]
mean value: 0.016597461700439454
key: test_mcc
value: [0.83484711 0.91106719 0.95652174 0.95652174 0.68911026 0.86732843
0.60079051 0.68911026 0.61657545 0.56604076]
mean value: 0.7687913462185909
key: train_mcc
value: [0.81736586 0.86188899 0.80263415 0.81331421 0.81271657 0.80904514
0.83406549 0.86381736 0.84818518 0.86717283]
mean value: 0.8330205780457784
key: test_accuracy
value: [0.91111111 0.95555556 0.97777778 0.97777778 0.84444444 0.93333333
0.8 0.84444444 0.77777778 0.77777778]
mean value: 0.88
key: train_accuracy
value: [0.90864198 0.9308642 0.90123457 0.90617284 0.9037037 0.9037037
0.91604938 0.9308642 0.92345679 0.93333333]
mean value: 0.9158024691358024
key: test_fscore
value: [0.9 0.95454545 0.97777778 0.97777778 0.8372093 0.93617021
0.8 0.85106383 0.82142857 0.76190476]
mean value: 0.8817877688313116
key: train_fscore
value: [0.90953545 0.93170732 0.90049751 0.90865385 0.89817232 0.90025575
0.91282051 0.93301435 0.9253012 0.93198992]
mean value: 0.9151948202363086
key: test_precision
value: [1. 0.95454545 0.95652174 0.95652174 0.85714286 0.91666667
0.81818182 0.83333333 0.6969697 0.84210526]
mean value: 0.8831988568258591
key: train_precision
value: [0.90291262 0.92270531 0.90954774 0.88732394 0.95555556 0.93121693
0.94680851 0.90277778 0.90140845 0.94871795]
mean value: 0.9208974792335061
key: test_recall
value: [0.81818182 0.95454545 1. 1. 0.81818182 0.95652174
0.7826087 0.86956522 1. 0.69565217]
mean value: 0.8895256916996047
key: train_recall
value: [0.91625616 0.9408867 0.89162562 0.93103448 0.84729064 0.87128713
0.88118812 0.96534653 0.95049505 0.91584158]
mean value: 0.9111252011900697
key: test_roc_auc
value: [0.90909091 0.9555336 0.97826087 0.97826087 0.84387352 0.93280632
0.80039526 0.84387352 0.77272727 0.77964427]
mean value: 0.8794466403162056
key: train_roc_auc
value: [0.90862313 0.93083939 0.90125835 0.9061113 0.90384334 0.90362386
0.91596352 0.93094913 0.92352339 0.93329025]
mean value: 0.9158025654782227
key: test_jcc
value: [0.81818182 0.91304348 0.95652174 0.95652174 0.72 0.88
0.66666667 0.74074074 0.6969697 0.61538462]
mean value: 0.7964030494465277
key: train_jcc
value: [0.83408072 0.87214612 0.81900452 0.83259912 0.81516588 0.81860465
0.83962264 0.87443946 0.86098655 0.87264151]
mean value: 0.8439291167891907
MCC on Blind test: 0.75
Accuracy on Blind test: 0.88
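Each "mean value" line is the arithmetic mean of the ten per-fold values printed above it. The log does not report how much the folds differ, so the sketch below recomputes the MLP test_mcc mean and adds the fold-to-fold standard deviation as an extra, unlogged quantity. The variable names are hypothetical.

```python
from statistics import fmean, stdev

# Per-fold test_mcc values copied from the MLP block above.
mlp_test_mcc = [0.83484711, 0.91106719, 0.95652174, 0.95652174, 0.68911026,
                0.86732843, 0.60079051, 0.68911026, 0.61657545, 0.56604076]

mean_mcc = fmean(mlp_test_mcc)   # should reproduce the logged "mean value"
spread = stdev(mlp_test_mcc)     # sample standard deviation across the 10 folds

print(f"mean test_mcc = {mean_mcc:.4f}, fold-to-fold sd = {spread:.4f}")
```

With only about 45 samples per test fold, a wide spread between folds is expected, so small differences between models' mean scores should be read with care.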
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.02708101 0.02197504 0.0223949 0.02293539 0.02220106 0.01959419
0.02083325 0.01980853 0.01847649 0.02072978]
mean value: 0.021602964401245116
key: score_time
value: [0.01235509 0.00963283 0.01003337 0.01024127 0.00966859 0.0089643
0.00935817 0.00911379 0.00966048 0.00908685]
mean value: 0.009811472892761231
key: test_mcc
value: [0.82506438 0.82213439 0.91106719 0.95643752 0.87476705 0.95643752
0.91106719 0.86732843 0.74605372 0.78530224]
mean value: 0.8655659627181006
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.91111111 0.91111111 0.95555556 0.97777778 0.93333333 0.97777778
0.95555556 0.93333333 0.86666667 0.88888889]
mean value: 0.9311111111111111
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.9047619 0.90909091 0.95454545 0.97674419 0.93617021 0.9787234
0.95652174 0.93617021 0.85714286 0.88372093]
mean value: 0.9293591810737865
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.95 0.90909091 0.95454545 1. 0.88 0.95833333
0.95652174 0.91666667 0.94736842 0.95 ]
mean value: 0.942252652381943
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.86363636 0.90909091 0.95454545 0.95454545 1. 1.
0.95652174 0.95652174 0.7826087 0.82608696]
mean value: 0.9203557312252965
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.91007905 0.91106719 0.9555336 0.97727273 0.93478261 0.97727273
0.9555336 0.93280632 0.86857708 0.89031621]
mean value: 0.9313241106719368
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.82608696 0.83333333 0.91304348 0.95454545 0.88 0.95833333
0.91666667 0.88 0.75 0.79166667]
mean value: 0.8703675889328063
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.91
Accuracy on Blind test: 0.96
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.11638784 0.12223196 0.12821341 0.12665248 0.12380767 0.12398219
0.12610722 0.1174705 0.12105989 0.12040401]
mean value: 0.12263171672821045
key: score_time
value: [0.01924944 0.01850915 0.01916957 0.01890802 0.01916337 0.01787496
0.01930618 0.0202477 0.01806045 0.01934838]
mean value: 0.01898372173309326
key: test_mcc
value: [0.86732843 0.95643752 0.91106719 0.95652174 0.73320158 0.73559956
0.86732843 0.68972332 0.83484711 0.55841694]
mean value: 0.8110471829947312
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.93333333 0.97777778 0.95555556 0.97777778 0.86666667 0.86666667
0.93333333 0.84444444 0.91111111 0.77777778]
mean value: 0.9044444444444445
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.93023256 0.97674419 0.95454545 0.97777778 0.86363636 0.875
0.93617021 0.84444444 0.92 0.77272727]
mean value: 0.9051278270083317
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.95238095 1. 0.95454545 0.95652174 0.86363636 0.84
0.91666667 0.86363636 0.85185185 0.80952381]
mean value: 0.9008763201371897
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.90909091 0.95454545 0.95454545 1. 0.86363636 0.91304348
0.95652174 0.82608696 1. 0.73913043]
mean value: 0.9116600790513834
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.93280632 0.97727273 0.9555336 0.97826087 0.86660079 0.86561265
0.93280632 0.84486166 0.90909091 0.77865613]
mean value: 0.9041501976284585
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.86956522 0.95454545 0.91304348 0.95652174 0.76 0.77777778
0.88 0.73076923 0.85185185 0.62962963]
mean value: 0.8323704379356553
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.8
Accuracy on Blind test: 0.9
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01029634 0.01115966 0.01019788 0.01129127 0.01026201 0.0103786
0.01133752 0.01088929 0.01124072 0.01106381]
mean value: 0.010811710357666015
key: score_time
value: [0.00888944 0.00963187 0.00929451 0.00969863 0.0097034 0.00970888
0.00967789 0.0094142 0.00970745 0.00901723]
mean value: 0.009474349021911622
key: test_mcc
value: [0.52631666 0.60000118 0.77865613 0.60000118 0.24356483 0.64613475
0.33402405 0.43557241 0.77821935 0.51185771]
mean value: 0.5454348232923768
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.75555556 0.8 0.88888889 0.8 0.62222222 0.82222222
0.66666667 0.71111111 0.88888889 0.75555556]
mean value: 0.7711111111111111
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.7755102 0.79069767 0.88888889 0.79069767 0.60465116 0.83333333
0.69387755 0.68292683 0.89361702 0.75555556]
mean value: 0.7709755895052613
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.7037037 0.80952381 0.86956522 0.80952381 0.61904762 0.8
0.65384615 0.77777778 0.875 0.77272727]
mean value: 0.769071536354145
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.86363636 0.77272727 0.90909091 0.77272727 0.59090909 0.86956522
0.73913043 0.60869565 0.91304348 0.73913043]
mean value: 0.7778656126482213
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.75790514 0.79940711 0.88932806 0.79940711 0.6215415 0.82114625
0.66501976 0.71343874 0.88833992 0.75592885]
mean value: 0.7711462450592885
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.63333333 0.65384615 0.8 0.65384615 0.43333333 0.71428571
0.53125 0.51851852 0.80769231 0.60714286]
mean value: 0.6353248371998372
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.55
Accuracy on Blind test: 0.77
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [2.70288324 4.30727792 2.46852684 2.61855292 2.55521131 2.43811393
2.36803675 1.75058436 2.72210979 2.69686818]
mean value: 2.6628165245056152
key: score_time
value: [0.25447559 0.19142795 0.16145873 0.14909005 0.13192844 0.12770748
0.09348536 0.12782025 0.173311 0.17790127]
mean value: 0.15886061191558837
key: test_mcc
value: [0.91452919 0.95643752 0.95652174 1. 0.86758893 0.91452919
0.82506438 0.82213439 0.95643752 0.82574419]
mean value: 0.9038987043172344
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.95555556 0.97777778 0.97777778 1. 0.93333333 0.95555556
0.91111111 0.91111111 0.97777778 0.91111111]
mean value: 0.9511111111111111
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.95238095 0.97674419 0.97777778 1. 0.93333333 0.95833333
0.91666667 0.91304348 0.9787234 0.90909091]
mean value: 0.9516094041145673
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 0.95652174 1. 0.91304348 0.92
0.88 0.91304348 0.95833333 0.95238095]
mean value: 0.949332298136646
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.90909091 0.95454545 1. 1. 0.95454545 1.
0.95652174 0.91304348 1. 0.86956522]
mean value: 0.9557312252964427
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.95454545 0.97727273 0.97826087 1. 0.93379447 0.95454545
0.91007905 0.91106719 0.97727273 0.91205534]
mean value: 0.9508893280632411
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.90909091 0.95454545 0.95652174 1. 0.875 0.92
0.84615385 0.84 0.95833333 0.83333333]
mean value: 0.9092978615587312
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.89
Accuracy on Blind test: 0.95
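The six models logged from SVM through Random Forest can be compared side by side. The sketch below is not part of the original script: it copies the mean cross-validation test_mcc and the blind-test MCC from each block above and sorts by the cross-validation score. The tuple layout and variable names are hypothetical.

```python
# (model, mean CV test_mcc, blind-test MCC), copied from the blocks above.
results = [
    ("SVM", 0.7984460299700171, 0.79),
    ("MLP", 0.7687913462185909, 0.75),
    ("Decision Tree", 0.8655659627181006, 0.91),
    ("Extra Trees", 0.8110471829947312, 0.80),
    ("Extra Tree", 0.5454348232923768, 0.55),
    ("Random Forest", 0.9038987043172344, 0.89),
]

# Highest mean cross-validation MCC first.
ranked = sorted(results, key=lambda row: row[1], reverse=True)

print(f"{'model':<15}{'cv_mcc':>8}{'blind_mcc':>11}")
for name, cv_mcc, blind_mcc in ranked:
    print(f"{name:<15}{cv_mcc:>8.3f}{blind_mcc:>11.2f}")
```

All four tree-based models here have a train_mcc of 1.0, so the training scores do not separate them. The cross-validation and blind-test columns are the informative ones.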
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...05', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [1.58991671 1.86406493 2.03459716 1.74722981 2.07112956 2.12889957
1.83267713 1.98935246 1.70792747 2.10284638]
mean value: 1.9068641185760498
key: score_time
value: [0.21701336 0.21829891 0.22647643 0.18004084 0.16667223 0.17558956
0.19052815 0.21188164 0.21953034 0.15258312]
mean value: 0.19586145877838135
key: test_mcc
value: [0.91452919 0.95643752 0.95652174 1. 0.86758893 0.91452919
0.77821935 0.82213439 0.91452919 0.69583743]
mean value: 0.8820326916601581
key: train_mcc
value: [0.95556639 0.95061698 0.95061698 0.94569087 0.95556748 0.9457805
0.94568955 0.96053948 0.95066215 0.96049359]
mean value: 0.9521223971905592
key: test_accuracy
value: [0.95555556 0.97777778 0.97777778 1. 0.93333333 0.95555556
0.88888889 0.91111111 0.95555556 0.84444444]
mean value: 0.94
key: train_accuracy
value: [0.97777778 0.97530864 0.97530864 0.97283951 0.97777778 0.97283951
0.97283951 0.98024691 0.97530864 0.98024691]
mean value: 0.9760493827160494
key: test_fscore
value: [0.95238095 0.97674419 0.97777778 1. 0.93333333 0.95833333
0.89361702 0.91304348 0.95833333 0.8372093 ]
mean value: 0.9400772718068289
key: train_fscore
value: [0.97788698 0.97536946 0.97536946 0.97283951 0.97777778 0.97256858
0.97270471 0.9800995 0.97512438 0.98019802]
mean value: 0.9759938371686563
key: test_precision
value: [1. 1. 0.95652174 1. 0.91304348 0.92
0.875 0.91304348 0.92 0.9 ]
mean value: 0.9397608695652174
key: train_precision
value: [0.9754902 0.97536946 0.97536946 0.97524752 0.98019802 0.9798995
0.97512438 0.985 0.98 0.98019802]
mean value: 0.9781896552287914
key: test_recall
value: [0.90909091 0.95454545 1. 1. 0.95454545 1.
0.91304348 0.91304348 1. 0.7826087 ]
mean value: 0.9426877470355731
key: train_recall
value: [0.98029557 0.97536946 0.97536946 0.97044335 0.97536946 0.96534653
0.97029703 0.97524752 0.97029703 0.98019802]
mean value: 0.9738233429254255
key: test_roc_auc
value: [0.95454545 0.97727273 0.97826087 1. 0.93379447 0.95454545
0.88833992 0.91106719 0.95454545 0.8458498 ]
mean value: 0.9398221343873517
key: train_roc_auc
value: [0.97777155 0.97530849 0.97530849 0.97284544 0.97778374 0.97282105
0.97283324 0.9802346 0.9752963 0.98024679]
mean value: 0.9760449690289226
key: test_jcc
value: [0.90909091 0.95454545 0.95652174 1. 0.875 0.92
0.80769231 0.84 0.92 0.72 ]
mean value: 0.8902850410459107
key: train_jcc
value: [0.95673077 0.95192308 0.95192308 0.94711538 0.95652174 0.94660194
0.9468599 0.96097561 0.95145631 0.96116505]
mean value: 0.9531272860931357
MCC on Blind test: 0.88
Accuracy on Blind test: 0.94
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.03431368 0.02331161 0.01544642 0.01508474 0.01589894 0.01519084
0.01523089 0.02336693 0.02334094 0.02675939]
mean value: 0.020794439315795898
key: score_time
value: [0.02263975 0.015486 0.01317906 0.02044368 0.01301241 0.02103996
0.01305509 0.01366353 0.01381397 0.01300645]
mean value: 0.015933990478515625
key: test_mcc
value: [0.64613475 0.78405645 0.82213439 0.86758893 0.5169078 0.64613475
0.68911026 0.51089209 0.73320158 0.51185771]
mean value: 0.6728018722919369
key: train_mcc
value: [0.71373171 0.71391286 0.70403264 0.72839898 0.76296152 0.72399345
0.73363435 0.73363435 0.718529 0.75311563]
mean value: 0.7285944480486638
key: test_accuracy
value: [0.82222222 0.88888889 0.91111111 0.93333333 0.75555556 0.82222222
0.84444444 0.75555556 0.86666667 0.75555556]
mean value: 0.8355555555555555
key: train_accuracy
value: [0.85679012 0.85679012 0.85185185 0.86419753 0.88148148 0.8617284
0.86666667 0.86666667 0.85925926 0.87654321]
mean value: 0.8641975308641976
key: test_fscore
value: [0.80952381 0.87804878 0.90909091 0.93333333 0.76595745 0.83333333
0.85106383 0.76595745 0.86956522 0.75555556]
mean value: 0.8371429662120305
key: train_fscore
value: [0.85572139 0.855 0.85 0.86486486 0.8817734 0.85858586
0.86432161 0.86432161 0.85925926 0.87562189]
mean value: 0.8629469881387253
key: test_precision
value: [0.85 0.94736842 0.90909091 0.91304348 0.72 0.8
0.83333333 0.75 0.86956522 0.77272727]
mean value: 0.836512863185632
key: train_precision
value: [0.86432161 0.8680203 0.86294416 0.8627451 0.8817734 0.87628866
0.87755102 0.87755102 0.85714286 0.88 ]
mean value: 0.8708338129852269
key: test_recall
value: [0.77272727 0.81818182 0.90909091 0.95454545 0.81818182 0.86956522
0.86956522 0.7826087 0.86956522 0.73913043]
mean value: 0.8403162055335969
key: train_recall
value: [0.84729064 0.84236453 0.83743842 0.86699507 0.8817734 0.84158416
0.85148515 0.85148515 0.86138614 0.87128713]
mean value: 0.8553089791737795
key: test_roc_auc
value: [0.82114625 0.88735178 0.91106719 0.93379447 0.756917 0.82114625
0.84387352 0.75494071 0.86660079 0.75592885]
mean value: 0.8352766798418972
key: train_roc_auc
value: [0.85681364 0.85682583 0.85188753 0.86419061 0.88148076 0.86167878
0.86662927 0.86662927 0.8592645 0.87653026]
mean value: 0.8641930449202556
key: test_jcc
value: [0.68 0.7826087 0.83333333 0.875 0.62068966 0.71428571
0.74074074 0.62068966 0.76923077 0.60714286]
mean value: 0.7243721420730417
key: train_jcc
value: [0.74782609 0.74672489 0.73913043 0.76190476 0.78854626 0.75221239
0.76106195 0.76106195 0.75324675 0.77876106]
mean value: 0.7590476528359691
MCC on Blind test: 0.73
Accuracy on Blind test: 0.87
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [4.27274823 1.59566879 1.63563442 1.6189487 1.60230374 1.59007549
1.5283258 1.495116 1.52452278 1.55737209]
mean value: 1.8420716047286987
key: score_time
value: [0.01275396 0.01314974 0.0131228 0.01300788 0.01266623 0.01288438
0.01313043 0.01349545 0.01266217 0.01410031]
mean value: 0.013097333908081054
key: test_mcc
value: [0.87406293 0.95643752 1. 0.95643752 0.91485328 0.95643752
0.86732843 0.82213439 0.95643752 0.77865613]
mean value: 0.908278523558777
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.93333333 0.97777778 1. 0.97777778 0.95555556 0.97777778
0.93333333 0.91111111 0.97777778 0.88888889]
mean value: 0.9533333333333334
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.92682927 0.97674419 1. 0.97674419 0.95652174 0.9787234
0.93617021 0.91304348 0.9787234 0.88888889]
mean value: 0.9532388767942495
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 1. 1. 0.91666667 0.95833333
0.91666667 0.91304348 0.95833333 0.90909091]
mean value: 0.9572134387351778
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.86363636 0.95454545 1. 0.95454545 1. 1.
0.95652174 0.91304348 1. 0.86956522]
mean value: 0.9511857707509881
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.93181818 0.97727273 1. 0.97727273 0.95652174 0.97727273
0.93280632 0.91106719 0.97727273 0.88932806]
mean value: 0.9530632411067194
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.86363636 0.95454545 1. 0.95454545 0.91666667 0.95833333
0.88 0.84 0.95833333 0.8 ]
mean value: 0.9126060606060606
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.95
Accuracy on Blind test: 0.97
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.05823278 0.07864523 0.08365369 0.06540322 0.09378934 0.07705927
0.10100937 0.11681724 0.09124517 0.08683014]
mean value: 0.08526854515075684
key: score_time
value: [0.02650523 0.02493405 0.01329923 0.02620196 0.03271937 0.02158356
0.02505922 0.02166438 0.02495146 0.02136326]
mean value: 0.02382817268371582
key: test_mcc
value: [0.82506438 0.73320158 0.91106719 0.86732843 0.73663511 0.77821935
0.68911026 0.73663511 0.55666994 0.64426877]
mean value: 0.7478200124484804
key: train_mcc
value: [0.93608359 0.91614635 0.90123397 0.91614635 0.9062683 0.92620337
0.92593586 0.91606106 0.91115718 0.91615248]
mean value: 0.9171388520151313
key: test_accuracy
value: [0.91111111 0.86666667 0.95555556 0.93333333 0.86666667 0.88888889
0.84444444 0.86666667 0.77777778 0.82222222]
mean value: 0.8733333333333333
key: train_accuracy
value: [0.96790123 0.95802469 0.95061728 0.95802469 0.95308642 0.96296296
0.96296296 0.95802469 0.95555556 0.95802469]
mean value: 0.9585185185185185
key: test_fscore
value: [0.9047619 0.86363636 0.95454545 0.93023256 0.86956522 0.89361702
0.85106383 0.86363636 0.79166667 0.82608696]
mean value: 0.8748812336363161
key: train_fscore
value: [0.96836983 0.95843521 0.95073892 0.95843521 0.95354523 0.96240602
0.96277916 0.95802469 0.95566502 0.95823096]
mean value: 0.9586630239446279
key: test_precision
value: [0.95 0.86363636 0.95454545 0.95238095 0.83333333 0.875
0.83333333 0.9047619 0.76 0.82608696]
mean value: 0.8753078298513082
key: train_precision
value: [0.95673077 0.95145631 0.95073892 0.95145631 0.94660194 0.97461929
0.96517413 0.95566502 0.95098039 0.95121951]
mean value: 0.9554642596269585
key: test_recall
value: [0.86363636 0.86363636 0.95454545 0.90909091 0.90909091 0.91304348
0.86956522 0.82608696 0.82608696 0.82608696]
mean value: 0.8760869565217391
key: train_recall
value: [0.98029557 0.96551724 0.95073892 0.96551724 0.96059113 0.95049505
0.96039604 0.96039604 0.96039604 0.96534653]
mean value: 0.9619689801492465
key: test_roc_auc
value: [0.91007905 0.86660079 0.9555336 0.93280632 0.86758893 0.88833992
0.84387352 0.86758893 0.77667984 0.82213439]
mean value: 0.8731225296442688
key: train_roc_auc
value: [0.96787056 0.95800615 0.95061698 0.95800615 0.95306784 0.96293225
0.96295664 0.95803053 0.95556748 0.95804273]
mean value: 0.9585097302833732
key: test_jcc
value: [0.82608696 0.76 0.91304348 0.86956522 0.76923077 0.80769231
0.74074074 0.76 0.65517241 0.7037037 ]
mean value: 0.7805235587334538
key: train_jcc
value: [0.93867925 0.92018779 0.90610329 0.92018779 0.91121495 0.92753623
0.92822967 0.91943128 0.91509434 0.91981132]
mean value: 0.9206475908747523
MCC on Blind test: 0.7
Accuracy on Blind test: 0.85
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01486421 0.01032543 0.01009607 0.01081514 0.01028252 0.01089168
0.01108813 0.01036739 0.01119781 0.01117921]
mean value: 0.011110758781433106
key: score_time
value: [0.011132 0.00922537 0.00892806 0.00906157 0.00955486 0.00952983
0.00965738 0.00968456 0.00975657 0.00976205]
mean value: 0.009629225730895996
key: test_mcc
value: [0.79670588 0.77821935 0.73663511 0.86732843 0.77865613 0.87406293
0.46930785 0.64426877 0.73320158 0.51185771]
mean value: 0.7190243743973286
key: train_mcc
value: [0.6994877 0.68482256 0.68960241 0.7385111 0.74355351 0.76791201
0.70964919 0.75845593 0.74835945 0.78285689]
mean value: 0.7323210752164393
key: test_accuracy
value: [0.88888889 0.88888889 0.86666667 0.93333333 0.88888889 0.93333333
0.73333333 0.82222222 0.86666667 0.75555556]
mean value: 0.8577777777777778
key: train_accuracy
value: [0.84938272 0.84197531 0.84444444 0.8691358 0.87160494 0.88395062
0.85432099 0.87901235 0.87407407 0.89135802]
mean value: 0.865925925925926
key: test_fscore
value: [0.87179487 0.88372093 0.86956522 0.93023256 0.88888889 0.93877551
0.72727273 0.82608696 0.86956522 0.75555556]
mean value: 0.8561458433392566
key: train_fscore
value: [0.84634761 0.83838384 0.84130982 0.86783042 0.87 0.88395062
0.84987277 0.87657431 0.87218045 0.89 ]
mean value: 0.8636449842307918
key: test_precision
value: [1. 0.9047619 0.83333333 0.95238095 0.86956522 0.88461538
0.76190476 0.82608696 0.86956522 0.77272727]
mean value: 0.8674941001027957
key: train_precision
value: [0.86597938 0.86010363 0.86082474 0.87878788 0.88324873 0.8817734
0.87434555 0.89230769 0.88324873 0.8989899 ]
mean value: 0.8779609631421748
key: test_recall
value: [0.77272727 0.86363636 0.90909091 0.90909091 0.90909091 1.
0.69565217 0.82608696 0.86956522 0.73913043]
mean value: 0.8494071146245059
key: train_recall
value: [0.82758621 0.81773399 0.8226601 0.85714286 0.85714286 0.88613861
0.82673267 0.86138614 0.86138614 0.88118812]
mean value: 0.8499097693020533
key: test_roc_auc
value: [0.88636364 0.88833992 0.86758893 0.93280632 0.88932806 0.93181818
0.73418972 0.82213439 0.86660079 0.75592885]
mean value: 0.857509881422925
key: train_roc_auc
value: [0.84943667 0.84203531 0.84449837 0.86916549 0.87164074 0.88395601
0.85425304 0.87896893 0.87404282 0.89133298]
mean value: 0.8659330341901185
key: test_jcc
value: [0.77272727 0.79166667 0.76923077 0.86956522 0.8 0.88461538
0.57142857 0.7037037 0.76923077 0.60714286]
mean value: 0.7539311212137298
key: train_jcc
value: [0.73362445 0.72173913 0.72608696 0.76651982 0.7699115 0.7920354
0.73893805 0.78026906 0.77333333 0.8018018 ]
mean value: 0.7604259514076851
MCC on Blind test: 0.77
Accuracy on Blind test: 0.88
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01319623 0.02258158 0.01745749 0.02201819 0.02212381 0.02121663
0.02074075 0.01941252 0.02034569 0.02611756]
mean value: 0.020521044731140137
key: score_time
value: [0.00947046 0.01199651 0.01194668 0.01197267 0.01198769 0.01198697
0.01204848 0.0119555 0.0119822 0.01206636]
mean value: 0.011741352081298829
key: test_mcc
value: [0.79670588 0.82213439 0.87406293 0.95643752 0.59725988 0.73663511
0.73559956 0.55362003 0.77865613 0.77865613]
mean value: 0.7629767557040962
key: train_mcc
value: [0.77582446 0.8644041 0.81918005 0.88888095 0.61326848 0.85131769
0.88257176 0.69882885 0.85568499 0.92648542]
mean value: 0.8176446738630777
key: test_accuracy
value: [0.88888889 0.91111111 0.93333333 0.97777778 0.77777778 0.86666667
0.86666667 0.75555556 0.88888889 0.88888889]
mean value: 0.8755555555555555
key: train_accuracy
value: [0.87901235 0.9308642 0.90617284 0.94320988 0.77777778 0.92098765
0.94074074 0.82962963 0.92345679 0.96296296]
mean value: 0.9014814814814814
key: test_fscore
value: [0.87179487 0.90909091 0.92682927 0.97674419 0.80769231 0.86363636
0.875 0.8 0.88888889 0.88888889]
mean value: 0.8808565684331424
key: train_fscore
value: [0.86501377 0.93364929 0.9 0.94117647 0.81707317 0.9144385
0.94202899 0.85350318 0.91733333 0.96350365]
mean value: 0.904772036038694
key: test_precision
value: [1. 0.90909091 1. 1. 0.7 0.9047619
0.84 0.6875 0.90909091 0.90909091]
mean value: 0.8859534632034631
key: train_precision
value: [0.98125 0.89954338 0.96610169 0.9787234 0.69550173 0.99418605
0.91981132 0.7472119 0.99421965 0.94736842]
mean value: 0.9123917545678761
key: test_recall
value: [0.77272727 0.90909091 0.86363636 0.95454545 0.95454545 0.82608696
0.91304348 0.95652174 0.86956522 0.86956522]
mean value: 0.8889328063241106
key: train_recall
value: [0.77339901 0.97044335 0.84236453 0.90640394 0.99014778 0.84653465
0.96534653 0.9950495 0.85148515 0.98019802]
mean value: 0.9121372482075794
key: test_roc_auc
value: [0.88636364 0.91106719 0.93181818 0.97727273 0.78162055 0.86758893
0.86561265 0.75098814 0.88932806 0.88932806]
mean value: 0.875098814229249
key: train_roc_auc
value: [0.87927376 0.93076623 0.90633078 0.94330098 0.77725211 0.92080427
0.94080135 0.83003707 0.92327952 0.96300541]
mean value: 0.9014851485148515
key: test_jcc
value: [0.77272727 0.83333333 0.86363636 0.95454545 0.67741935 0.76
0.77777778 0.66666667 0.8 0.8 ]
mean value: 0.7906106223525579
key: train_jcc
value: [0.76213592 0.87555556 0.81818182 0.88888889 0.69072165 0.84236453
0.89041096 0.74444444 0.84729064 0.92957746]
mean value: 0.8289571874991976
MCC on Blind test: 0.81
Accuracy on Blind test: 0.9
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01864982 0.01952457 0.01760387 0.018224 0.01664209 0.01940393
0.01955223 0.03549552 0.02013016 0.0204246 ]
mean value: 0.020565080642700195
key: score_time
value: [0.01202798 0.01201439 0.012398 0.01194167 0.01194906 0.01242661
0.01245832 0.02830505 0.01237369 0.01226449]
mean value: 0.013815927505493163
key: test_mcc
value: [0.91452919 0.87406293 0.87476705 0.72645449 0.57868151 0.58158
0.73559956 0.57373395 0.69404997 0.55362003]
mean value: 0.7107078686833688
key: train_mcc
value: [0.80550226 0.82411192 0.82353111 0.62805778 0.7612786 0.64806439
0.9019476 0.77341987 0.90127552 0.66265175]
mean value: 0.7729840794536423
key: test_accuracy
value: [0.95555556 0.93333333 0.93333333 0.84444444 0.77777778 0.75555556
0.86666667 0.77777778 0.84444444 0.75555556]
mean value: 0.8444444444444444
key: train_accuracy
value: [0.8962963 0.90617284 0.90864198 0.78518519 0.87160494 0.79753086
0.95061728 0.87654321 0.95061728 0.80493827]
mean value: 0.8748148148148148
key: test_fscore
value: [0.95238095 0.92682927 0.93617021 0.8627451 0.8 0.80701754
0.875 0.80769231 0.85714286 0.8 ]
mean value: 0.8624978240173622
key: train_fscore
value: [0.88648649 0.89784946 0.91415313 0.82281059 0.88444444 0.83057851
0.95145631 0.88888889 0.95024876 0.83643892]
mean value: 0.8863355507758013
key: test_precision
value: [1. 1. 0.88 0.75862069 0.71428571 0.67647059
0.84 0.72413793 0.80769231 0.6875 ]
mean value: 0.8088707230902972
key: train_precision
value: [0.98203593 0.98816568 0.86403509 0.70138889 0.80566802 0.71276596
0.93333333 0.80645161 0.955 0.71886121]
mean value: 0.8467705715067385
key: test_recall
value: [0.90909091 0.86363636 1. 1. 0.90909091 1.
0.91304348 0.91304348 0.91304348 0.95652174]
mean value: 0.9377470355731226
key: train_recall
value: [0.80788177 0.8226601 0.97044335 0.99507389 0.98029557 0.9950495
0.97029703 0.99009901 0.94554455 1. ]
mean value: 0.9477344778812856
key: test_roc_auc
value: [0.95454545 0.93181818 0.93478261 0.84782609 0.78063241 0.75
0.86561265 0.77470356 0.84288538 0.75098814]
mean value: 0.8433794466403162
key: train_roc_auc
value: [0.89651514 0.90637955 0.908489 0.78466566 0.8713359 0.79801736
0.95066576 0.8768229 0.95060479 0.80541872]
mean value: 0.8748914792957128
key: test_jcc
value: [0.90909091 0.86363636 0.88 0.75862069 0.66666667 0.67647059
0.77777778 0.67741935 0.75 0.66666667]
mean value: 0.762634901656756
key: train_jcc
value: [0.7961165 0.81463415 0.84188034 0.69896194 0.79282869 0.71024735
0.90740741 0.8 0.90521327 0.71886121]
mean value: 0.7986150853388724
MCC on Blind test: 0.75
Accuracy on Blind test: 0.87
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.18614674 0.18338513 0.18672371 0.18425417 0.18633842 0.17906499
0.16986871 0.1727941 0.16615391 0.17229795]
mean value: 0.17870278358459474
key: score_time
value: [0.01630163 0.0177443 0.01695061 0.01684165 0.01702309 0.01606345
0.01537633 0.01550055 0.015517 0.01609135]
mean value: 0.016340994834899904
key: test_mcc
value: [0.91452919 0.91452919 1. 0.91106719 0.91485328 0.91452919
0.91106719 0.86732843 0.95643752 0.77865613]
mean value: 0.9082997309622647
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.95555556 0.95555556 1. 0.95555556 0.95555556 0.95555556
0.95555556 0.93333333 0.97777778 0.88888889]
mean value: 0.9533333333333334
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.95238095 0.95238095 1. 0.95454545 0.95652174 0.95833333
0.95652174 0.93617021 0.9787234 0.88888889]
mean value: 0.9534466676811728
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 1. 0.95454545 0.91666667 0.92
0.95652174 0.91666667 0.95833333 0.90909091]
mean value: 0.9531824769433466
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.90909091 0.90909091 1. 0.95454545 1. 1.
0.95652174 0.95652174 1. 0.86956522]
mean value: 0.9555335968379447
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.95454545 0.95454545 1. 0.9555336 0.95652174 0.95454545
0.9555336 0.93280632 0.97727273 0.88932806]
mean value: 0.9530632411067194
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.90909091 0.90909091 1. 0.91304348 0.91666667 0.92
0.91666667 0.88 0.95833333 0.8 ]
mean value: 0.9122891963109354
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.93
Accuracy on Blind test: 0.96
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.0584321 0.06014061 0.07051253 0.0679574 0.06842804 0.0796442
0.0581665 0.08605957 0.07957792 0.07208705]
mean value: 0.07010059356689453
key: score_time
value: [0.03084612 0.02727365 0.02405024 0.02450156 0.03062487 0.02380657
0.0292778 0.04444814 0.02564001 0.03785896]
mean value: 0.029832792282104493
key: test_mcc
value: [0.91452919 0.95643752 0.95643752 1. 0.86758893 0.95643752
0.91452919 0.86732843 0.95643752 0.77865613]
mean value: 0.9168381944162244
key: train_mcc
value: [0.98519729 0.98029509 0.99507377 0.9901234 0.98029509 0.97532008
0.98024679 0.99507377 0.9704168 0.98024679]
mean value: 0.9832288871744899
key: test_accuracy
value: [0.95555556 0.97777778 0.97777778 1. 0.93333333 0.97777778
0.95555556 0.93333333 0.97777778 0.88888889]
mean value: 0.9577777777777777
key: train_accuracy
value: [0.99259259 0.99012346 0.99753086 0.99506173 0.99012346 0.98765432
0.99012346 0.99753086 0.98518519 0.99012346]
mean value: 0.9916049382716049
key: test_fscore
value: [0.95238095 0.97674419 0.97674419 1. 0.93333333 0.9787234
0.95833333 0.93617021 0.9787234 0.88888889]
mean value: 0.9580041901306127
key: train_fscore
value: [0.99259259 0.99009901 0.997543 0.99507389 0.99009901 0.98759305
0.99009901 0.99751861 0.98507463 0.99009901]
mean value: 0.9915791810761856
key: test_precision
value: [1. 1. 1. 1. 0.91304348 0.95833333
0.92 0.91666667 0.95833333 0.90909091]
mean value: 0.9575467720685112
key: train_precision
value: [0.9950495 0.99502488 0.99509804 0.99507389 0.99502488 0.99004975
0.99009901 1. 0.99 0.99009901]
mean value: 0.993551895808134
key: test_recall
value: [0.90909091 0.95454545 0.95454545 1. 0.95454545 1.
1. 0.95652174 1. 0.86956522]
mean value: 0.9598814229249012
key: train_recall
value: [0.99014778 0.98522167 1. 0.99507389 0.98522167 0.98514851
0.99009901 0.9950495 0.98019802 0.99009901]
mean value: 0.9896259084036483
key: test_roc_auc
value: [0.95454545 0.97727273 0.97727273 1. 0.93379447 0.97727273
0.95454545 0.93280632 0.97727273 0.88932806]
mean value: 0.9574110671936759
key: train_roc_auc
value: [0.99259864 0.99013559 0.99752475 0.9950617 0.99013559 0.98764815
0.9901234 0.99752475 0.9851729 0.9901234 ]
mean value: 0.9916048870896942
key: test_jcc
value: [0.90909091 0.95454545 0.95454545 1. 0.875 0.95833333
0.92 0.88 0.95833333 0.8 ]
mean value: 0.9209848484848485
key: train_jcc
value: [0.98529412 0.98039216 0.99509804 0.99019608 0.98039216 0.9754902
0.98039216 0.9950495 0.97058824 0.98039216]
mean value: 0.9833284799068142
MCC on Blind test: 0.93
Accuracy on Blind test: 0.96
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.11871982 0.17861295 0.23517871 0.15984797 0.21449161 0.23632669
0.18187809 0.18064666 0.17393255 0.1803596 ]
mean value: 0.1859994649887085
key: score_time
value: [0.02222514 0.02561641 0.023772 0.02483273 0.02410769 0.02581143
0.02375984 0.02371955 0.02334571 0.03471303]
mean value: 0.025190353393554688
key: test_mcc
value: [0.65335861 0.73320158 0.77821935 0.86758893 0.46720513 0.38019877
0.65604724 0.46640316 0.73559956 0.42744299]
mean value: 0.616526531713881
key: train_mcc
value: [0.98529376 0.98529376 0.99017193 0.99017193 0.99507389 0.98529269
0.99017145 1. 0.98529269 0.98529269]
mean value: 0.9892054809931761
key: test_accuracy
value: [0.82222222 0.86666667 0.88888889 0.93333333 0.73333333 0.68888889
0.82222222 0.73333333 0.86666667 0.71111111]
mean value: 0.8066666666666666
key: train_accuracy
value: [0.99259259 0.99259259 0.99506173 0.99506173 0.99753086 0.99259259
0.99506173 1. 0.99259259 0.99259259]
mean value: 0.9945679012345678
key: test_fscore
value: [0.8 0.86363636 0.88372093 0.93333333 0.71428571 0.68181818
0.80952381 0.73913043 0.875 0.69767442]
mean value: 0.7998123186217221
key: train_fscore
value: [0.99255583 0.99255583 0.9950495 0.9950495 0.99753086 0.9925187
0.99502488 1. 0.9925187 0.9925187 ]
mean value: 0.9945322521977115
key: test_precision
value: [0.88888889 0.86363636 0.9047619 0.91304348 0.75 0.71428571
0.89473684 0.73913043 0.84 0.75 ]
mean value: 0.8258483626721613
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.72727273 0.86363636 0.86363636 0.95454545 0.68181818 0.65217391
0.73913043 0.73913043 0.91304348 0.65217391]
mean value: 0.7786561264822134
key: train_recall
value: [0.98522167 0.98522167 0.99014778 0.99014778 0.99507389 0.98514851
0.99009901 1. 0.98514851 0.98514851]
mean value: 0.9891357362337219
key: test_roc_auc
value: [0.8201581 0.86660079 0.88833992 0.93379447 0.73221344 0.68972332
0.82411067 0.73320158 0.86561265 0.71245059]
mean value: 0.8066205533596839
key: train_roc_auc
value: [0.99261084 0.99261084 0.99507389 0.99507389 0.99753695 0.99257426
0.9950495 1. 0.99257426 0.99257426]
mean value: 0.9945678681168609
key: test_jcc
value: [0.66666667 0.76 0.79166667 0.875 0.55555556 0.51724138
0.68 0.5862069 0.77777778 0.53571429]
mean value: 0.6745829228243021
key: train_jcc
value: [0.98522167 0.98522167 0.99014778 0.99014778 0.99507389 0.98514851
0.99009901 1. 0.98514851 0.98514851]
mean value: 0.9891357362337219
MCC on Blind test: 0.62
Accuracy on Blind test: 0.81
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.67379546 0.64568853 0.6526382 0.67437029 0.65233517 0.66356587
0.65809655 0.66884041 0.65953517 0.64904428]
mean value: 0.6597909927368164
key: score_time
value: [0.00955462 0.00969028 0.0093205 0.0094049 0.00958753 0.00942659
0.00944066 0.01032162 0.00953197 0.00981259]
mean value: 0.009609127044677734
key: test_mcc
value: [0.91452919 0.95643752 0.95643752 1. 0.91485328 0.91452919
0.86732843 0.82506438 0.78530224 0.82213439]
mean value: 0.8956616127817072
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.95555556 0.97777778 0.97777778 1. 0.95555556 0.95555556
0.93333333 0.91111111 0.88888889 0.91111111]
mean value: 0.9466666666666667
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.95238095 0.97674419 0.97674419 1. 0.95652174 0.95833333
0.93617021 0.91666667 0.88372093 0.91304348]
mean value: 0.9470325684863796
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 1. 1. 0.91666667 0.92
0.91666667 0.88 0.95 0.91304348]
mean value: 0.9496376811594203
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.90909091 0.95454545 0.95454545 1. 1. 1.
0.95652174 0.95652174 0.82608696 0.91304348]
mean value: 0.9470355731225296
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.95454545 0.97727273 0.97727273 1. 0.95652174 0.95454545
0.93280632 0.91007905 0.89031621 0.91106719]
mean value: 0.9464426877470355
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.90909091 0.95454545 0.95454545 1. 0.91666667 0.92
0.88 0.84615385 0.79166667 0.84 ]
mean value: 0.9012668997668998
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.96
Accuracy on Blind test: 0.98
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.03017378 0.05092001 0.03264356 0.0517385 0.03246832 0.07408309
0.03232288 0.03246665 0.05935621 0.03307438]
mean value: 0.04292473793029785
key: score_time
value: [0.02918839 0.02444148 0.01487088 0.01399708 0.02390194 0.01415896
0.0149672 0.01498842 0.02188349 0.01332688]
mean value: 0.018572473526000978
key: test_mcc
value: [0.5216284 0.46720513 0.51089209 0.43557241 0.38112585 0.55841694
0.19960474 0.44784269 0.2903816 0.46640316]
mean value: 0.4279073012043456
key: train_mcc
value: [0.77727216 0.81448302 0.98519729 0.95177249 0.878915 0.8700435
0.94707011 0.94707011 0.96124772 0.98519693]
mean value: 0.9118268331436081
key: test_accuracy
value: [0.75555556 0.73333333 0.75555556 0.71111111 0.68888889 0.77777778
0.6 0.71111111 0.64444444 0.73333333]
mean value: 0.7111111111111111
key: train_accuracy
value: [0.87654321 0.89876543 0.99259259 0.97530864 0.93580247 0.9308642
0.97283951 0.97283951 0.98024691 0.99259259]
mean value: 0.9528395061728395
key: test_fscore
value: [0.71794872 0.71428571 0.74418605 0.73469388 0.65 0.77272727
0.60869565 0.66666667 0.68 0.73913043]
mean value: 0.7028334382647542
key: train_fscore
value: [0.85955056 0.88767123 0.99259259 0.97596154 0.93157895 0.92553191
0.97201018 0.97201018 0.98058252 0.99255583]
mean value: 0.9490045499762084
key: test_precision
value: [0.82352941 0.75 0.76190476 0.66666667 0.72222222 0.80952381
0.60869565 0.8125 0.62962963 0.73913043]
mean value: 0.7323802588668318
key: train_precision
value: [1. 1. 0.9950495 0.95305164 1. 1.
1. 1. 0.96190476 0.99502488]
mean value: 0.9905030785669636
key: test_recall
value: [0.63636364 0.68181818 0.72727273 0.81818182 0.59090909 0.73913043
0.60869565 0.56521739 0.73913043 0.73913043]
mean value: 0.6845849802371542
key: train_recall
value: [0.75369458 0.79802956 0.99014778 1. 0.87192118 0.86138614
0.94554455 0.94554455 1. 0.99009901]
mean value: 0.9156367360874018
key: test_roc_auc
value: [0.75296443 0.73221344 0.75494071 0.71343874 0.68675889 0.77865613
0.59980237 0.71442688 0.64229249 0.73320158]
mean value: 0.7108695652173913
key: train_roc_auc
value: [0.87684729 0.89901478 0.99259864 0.97524752 0.93596059 0.93069307
0.97277228 0.97277228 0.98029557 0.99258645]
mean value: 0.9528788469980003
key: test_jcc
value: [0.56 0.55555556 0.59259259 0.58064516 0.48148148 0.62962963
0.4375 0.5 0.51515152 0.5862069 ]
mean value: 0.5438762832252821
key: train_jcc
value: [0.75369458 0.79802956 0.98529412 0.95305164 0.87192118 0.86138614
0.94554455 0.94554455 0.96190476 0.98522167]
mean value: 0.9061592765342953
MCC on Blind test: 0.54
Accuracy on Blind test: 0.77
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.0267837 0.04602504 0.04733205 0.03782296 0.05901909 0.02543259
0.04678321 0.01800036 0.06643128 0.03821254]
mean value: 0.041184282302856444
key: score_time
value: [0.02365017 0.02335858 0.03110051 0.02066255 0.0384798 0.02338219
0.01257777 0.02225137 0.02346373 0.02367878]
mean value: 0.024260544776916505
key: test_mcc
value: [0.91452919 0.95643752 0.91106719 0.91106719 0.77865613 0.82506438
0.73559956 0.64752602 0.70501339 0.64426877]
mean value: 0.802922934636812
key: train_mcc
value: [0.85762118 0.86692207 0.84700001 0.85704185 0.86692207 0.87160416
0.87199635 0.86211613 0.88643125 0.88165855]
mean value: 0.8669313605558044
key: test_accuracy
value: [0.95555556 0.97777778 0.95555556 0.95555556 0.88888889 0.91111111
0.86666667 0.82222222 0.84444444 0.82222222]
mean value: 0.9
key: train_accuracy
value: [0.92839506 0.93333333 0.92345679 0.92839506 0.93333333 0.93580247
0.93580247 0.9308642 0.94320988 0.94074074]
mean value: 0.9333333333333333
key: test_fscore
value: [0.95238095 0.97674419 0.95454545 0.95454545 0.88888889 0.91666667
0.875 0.81818182 0.8627451 0.82608696]
mean value: 0.9025785475816701
key: train_fscore
value: [0.93012048 0.93430657 0.92420538 0.92944039 0.93430657 0.93564356
0.93658537 0.93170732 0.94320988 0.94117647]
mean value: 0.9340701983296061
key: test_precision
value: [1. 1. 0.95454545 0.95454545 0.86956522 0.88
0.84 0.85714286 0.78571429 0.82608696]
mean value: 0.8967600225861095
key: train_precision
value: [0.91037736 0.92307692 0.91747573 0.91826923 0.92307692 0.93564356
0.92307692 0.91826923 0.9408867 0.93203883]
mean value: 0.9242191416230418
key: test_recall
value: [0.90909091 0.95454545 0.95454545 0.95454545 0.90909091 0.95652174
0.91304348 0.7826087 0.95652174 0.82608696]
mean value: 0.9116600790513834
key: train_recall
value: [0.95073892 0.94581281 0.93103448 0.9408867 0.94581281 0.93564356
0.95049505 0.94554455 0.94554455 0.95049505]
mean value: 0.9442008486562942
key: test_roc_auc
value: [0.95454545 0.97727273 0.9555336 0.9555336 0.88932806 0.91007905
0.86561265 0.82312253 0.84189723 0.82213439]
mean value: 0.8995059288537549
key: train_roc_auc
value: [0.92833976 0.93330244 0.92343803 0.92836414 0.93330244 0.93580208
0.93583866 0.93090036 0.94321563 0.94076477]
mean value: 0.9333268302199678
key: test_jcc
value: [0.90909091 0.95454545 0.91304348 0.91304348 0.8 0.84615385
0.77777778 0.69230769 0.75862069 0.7037037 ]
mean value: 0.8268287029756295
key: train_jcc
value: [0.86936937 0.87671233 0.85909091 0.86818182 0.87671233 0.87906977
0.88073394 0.87214612 0.89252336 0.88888889]
mean value: 0.8763428838668663
MCC on Blind test: 0.79
Accuracy on Blind test: 0.89
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.47064686 0.37768698 0.39077091 0.46480179 0.4833498 0.96630669
0.25368643 0.32124949 0.53351617 0.37344742]
mean value: 0.46354625225067136
key: score_time
value: [0.03063655 0.02306867 0.02072167 0.02873063 0.02459288 0.01240277
0.01261353 0.03182149 0.0360291 0.02511287]
mean value: 0.02457301616668701
key: test_mcc
value: [0.83484711 0.95643752 0.91106719 0.91106719 0.77865613 0.82506438
0.68911026 0.64752602 0.70501339 0.64426877]
mean value: 0.7903057965349736
key: train_mcc
value: [0.79798935 0.86692207 0.84700001 0.85704185 0.90127552 0.87160416
0.92098717 0.86211613 0.88643125 0.88165855]
mean value: 0.869302605153845
key: test_accuracy
value: [0.91111111 0.97777778 0.95555556 0.95555556 0.88888889 0.91111111
0.84444444 0.82222222 0.84444444 0.82222222]
mean value: 0.8933333333333333
key: train_accuracy
value: [0.89876543 0.93333333 0.92345679 0.92839506 0.95061728 0.93580247
0.96049383 0.9308642 0.94320988 0.94074074]
mean value: 0.9345679012345679
key: test_fscore
value: [0.9 0.97674419 0.95454545 0.95454545 0.88888889 0.91666667
0.85106383 0.81818182 0.8627451 0.82608696]
mean value: 0.8949468353222984
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_8020.py:188: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_8020.py:191: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: train_fscore
value: [0.90072639 0.93430657 0.92420538 0.92944039 0.95098039 0.93564356
0.96039604 0.93170732 0.94320988 0.94117647]
mean value: 0.9351792390184265
key: test_precision
value: [1. 1. 0.95454545 0.95454545 0.86956522 0.88
0.83333333 0.85714286 0.78571429 0.82608696]
mean value: 0.8960933559194428
key: train_precision
value: [0.88571429 0.92307692 0.91747573 0.91826923 0.94634146 0.93564356
0.96039604 0.91826923 0.9408867 0.93203883]
mean value: 0.9278112000318885
key: test_recall
value: [0.81818182 0.95454545 0.95454545 0.95454545 0.90909091 0.95652174
0.86956522 0.7826087 0.95652174 0.82608696]
mean value: 0.8982213438735178
key: train_recall
value: [0.91625616 0.94581281 0.93103448 0.9408867 0.95566502 0.93564356
0.96039604 0.94554455 0.94554455 0.95049505]
mean value: 0.9427278934790031
key: test_roc_auc
value: [0.90909091 0.97727273 0.9555336 0.9555336 0.88932806 0.91007905
0.84387352 0.82312253 0.84189723 0.82213439]
mean value: 0.8927865612648221
key: train_roc_auc
value: [0.89872214 0.93330244 0.92343803 0.92836414 0.95060479 0.93580208
0.96049359 0.93090036 0.94321563 0.94076477]
mean value: 0.9345607959810759
key: test_jcc
value: [0.81818182 0.95454545 0.91304348 0.91304348 0.8 0.84615385
0.74074074 0.69230769 0.75862069 0.7037037 ]
mean value: 0.8140340901810167
key: train_jcc
value: [0.81938326 0.87671233 0.85909091 0.86818182 0.90654206 0.87906977
0.92380952 0.87214612 0.89252336 0.88888889]
mean value: 0.8786348035374227
MCC on Blind test: 0.79
Accuracy on Blind test: 0.89