LSHTM_analysis/scripts/ml/log_katg_config.txt
2022-06-20 21:55:47 +01:00

/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data.py:550: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
from pandas import MultiIndex, Int64Index
1.22.4
1.4.1
aaindex_df contains non-numerical data
Total no. of non-numerical columns: 2
Selecting numerical data only
PASS: successfully selected numerical columns only for aaindex_df
Now checking for NA in the remaining aaindex_cols
Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127
Revised df ncols: 123
Checking NA in revised df...
PASS: cols with NA successfully dropped from aaindex_df
Proceeding with combining aa_df with other features_df
PASS: ncols match
Expected ncols: 123
Got: 123
Total no. of columns in clean aa_df: 123
Proceeding to merge, expected nrows in merged_df: 817
PASS: my_features_df and aa_df successfully combined
nrows: 817
ncols: 269
count of NULL values before imputation
or_mychisq 244
log10_or_mychisq 244
dtype: int64
count of NULL values AFTER imputation
mutationinformation 0
or_rawI 0
logorI 0
dtype: int64
PASS: OR values imputed, data ready for ML
No. of numerical features: 45
No. of categorical features: 7
index: 0
ind: 1
Mask count check: True
index: 1
ind: 2
Mask count check: True
Original Data
Counter({1: 309, 0: 158}) Data dim: (467, 52)
-------------------------------------------------------------
Successfully split data: UQ [no aa_index but active site included] training
actual values: training set
imputed values: blind test set
Train data size: (467, 52)
Test data size: (350, 52)
y_train numbers: Counter({1: 309, 0: 158})
y_train ratio: 0.511326860841424
y_test_numbers: Counter({0: 315, 1: 35})
y_test ratio: 9.0
-------------------------------------------------------------
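The two ratios printed above are consistent with dividing the count of class 0 by the count of class 1 (158/309 for training, 315/35 for the blind test set). The following is a minimal sketch of that arithmetic, not the code in ml_data.py; the helper name class_ratio is hypothetical and the label vectors are rebuilt from the logged counts only.

```python
from collections import Counter


def class_ratio(labels):
    """Return count(class 0) / count(class 1) for binary labels (assumed definition)."""
    counts = Counter(labels)
    return counts[0] / counts[1]


# Label vectors rebuilt from the class counts printed in the log.
y_train = [1] * 309 + [0] * 158
y_test = [0] * 315 + [1] * 35

train_ratio = class_ratio(y_train)
test_ratio = class_ratio(y_test)

print("y_train numbers:", Counter(y_train), "ratio:", train_ratio)
print("y_test numbers:", Counter(y_test), "ratio:", test_ratio)
```

Read this way, the training set has roughly two class-1 samples per class-0 sample, while the blind test set is skewed in the opposite direction (nine class-0 samples per class-1 sample).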
Simple Random OverSampling
Counter({1: 309, 0: 309})
(618, 52)
Simple Random UnderSampling
Counter({0: 158, 1: 158})
(316, 52)
Simple Combined Over and UnderSampling
Counter({0: 309, 1: 309})
(618, 52)
SMOTE_NC OverSampling
Counter({1: 309, 0: 309})
(618, 52)
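The class counts above (309/309 after oversampling, 158/158 after undersampling) are what simple random resampling produces from a 309/158 split. The sketch below reproduces only those two cases in plain Python so it runs without extra packages; it is an illustration under that assumption, not the resampler the script calls, and it does not cover the combined or SMOTE_NC variants. The function names and the single dummy feature column are hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=42):
    """Duplicate randomly chosen minority rows until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [row for row, lab in zip(X, y) if lab == label]
        extra = [rng.choice(rows) for _ in range(target - n)]
        X_out.extend(extra)
        y_out.extend([label] * len(extra))
    return X_out, y_out


def random_undersample(X, y, seed=42):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        rows = [row for row, lab in zip(X, y) if lab == label]
        X_out.extend(rng.sample(rows, target))
        y_out.extend([label] * target)
    return X_out, y_out


# Dummy data with the training class counts from the log (309 vs 158).
y = [1] * 309 + [0] * 158
X = [[float(i)] for i in range(len(y))]

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)

print("OverSampling:", Counter(y_over), len(X_over))
print("UnderSampling:", Counter(y_under), len(X_under))
```

Resampling of this kind is normally applied to the training data only, so the blind test set keeps its original class balance.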
#####################################################################
Running ML analysis: UQ [without AA index but with active site annotations]
Gene name: katG
Drug name: isoniazid
Output directory: /home/tanu/git/Data/isoniazid/output/ml/uq_v1/
Sanity checks:
Total input features: 52
Training data size: (467, 52)
Test data size: (350, 52)
Target feature numbers (training data): Counter({1: 309, 0: 158})
Target features ratio (training data): 0.511326860841424
Target feature numbers (test data): Counter({0: 315, 1: 35})
Target features ratio (test data): 9.0
#####################################################################
================================================================
Structural features (n): 36
These are:
Common stability features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================
Evolutionary features (n): 3
These are:
['consurf_score', 'snap2_score', 'provean_score']
================================================================
Genomic features (n): 6
These are:
['maf', 'logorI']
['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================
Categorical features (n): 7
These are:
['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================
Pass: No. of features match
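The feature-count check reported above can be reproduced from the group sizes in this log: 36 structural, 3 evolutionary, 6 genomic and 7 categorical features should add up to the 52 input features, with the first three groups giving the 45 numerical features. This is a small sketch of that bookkeeping only; the dictionary and variable names are hypothetical and the real check in the script may compare column names rather than counts.

```python
# Group sizes as printed in the log.
feature_groups = {
    "structural": 36,
    "evolutionary": 3,
    "genomic": 6,
    "categorical": 7,
}
total_input_features = 52  # "Total input features" in the sanity checks

# Every group except the categorical one is numerical.
n_numerical = sum(n for name, n in feature_groups.items() if name != "categorical")
n_total = sum(feature_groups.values())

if n_total == total_input_features:
    print("Pass: No. of features match")
else:
    print(f"FAIL: expected {total_input_features} features, got {n_total}")
```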
#####################################################################
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.02167606 0.02372026 0.03166604 0.02357769 0.02548194 0.02195692
0.02136278 0.02161574 0.02221417 0.02264333]
mean value: 0.023591494560241698
key: score_time
value: [0.0109992 0.01075363 0.01093793 0.01066351 0.01062679 0.01058674
0.01058102 0.01062608 0.0105927 0.01066446]
mean value: 0.010703206062316895
key: test_mcc
value: [0.90662544 0.66402366 0.60908698 0.90662544 0.86070252 0.66337469
0.67402153 0.80215054 0.66040066 0.85943956]
mean value: 0.7606451028769974
key: train_mcc
value: [0.83338837 0.82273265 0.789683 0.77877628 0.76217448 0.80630977
0.79579908 0.77434754 0.7963019 0.80086095]
mean value: 0.7960374023577294
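For reference, the Matthews correlation coefficient (MCC) reported per fold is defined from the confusion matrix as (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), and each "mean value" is the plain average over the 10 folds. The sketch below shows both steps in standard-library Python; it is an illustration, not the scorer used by the script. The fold values are copied from the log, where they are rounded to 8 decimals, so the recomputed means agree with the logged ones only to about that precision.

```python
import math
from statistics import mean


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Per-fold values copied from the log (rounded as printed).
test_mcc = [0.90662544, 0.66402366, 0.60908698, 0.90662544, 0.86070252,
            0.66337469, 0.67402153, 0.80215054, 0.66040066, 0.85943956]
train_mcc = [0.83338837, 0.82273265, 0.789683, 0.77877628, 0.76217448,
             0.80630977, 0.79579908, 0.77434754, 0.7963019, 0.80086095]

mean_test = mean(test_mcc)
mean_train = mean(train_mcc)

print("mean test_mcc:", mean_test)
print("mean train_mcc:", mean_train)
print("train - test gap:", mean_train - mean_test)
```

The gap between the training and cross-validated test means is small here (about 0.035), although the per-fold test values range from roughly 0.61 to 0.91.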
key: test_accuracy
value:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[0.95744681 0.85106383 0.82978723 0.95744681 0.93617021 0.85106383
0.85106383 0.91304348 0.84782609 0.93478261]
mean value: 0.8929694727104533
key: train_accuracy
value: [0.92619048 0.92142857 0.90714286 0.90238095 0.8952381 0.91428571
0.90952381 0.90023753 0.90973872 0.91211401]
mean value: 0.9098280737473137
key: test_fscore
value: [0.96875 0.88888889 0.87878788 0.96875 0.95384615 0.89552239
0.8852459 0.93548387 0.8852459 0.95238095]
mean value: 0.9212901936210006
key: train_fscore
value: [0.94532628 0.94240838 0.93169877 0.92869565 0.92334495 0.93728223
0.93425606 0.92682927 0.93379791 0.93542757]
mean value: 0.9339067066812484
key: test_precision
value: [0.93939394 0.875 0.82857143 0.93939394 0.91176471 0.83333333
0.9 0.93548387 0.9 0.90909091]
mean value: 0.8972032126633644
key: train_precision
value: [0.92733564 0.91525424 0.90784983 0.8989899 0.89527027 0.90878378
0.9 0.89864865 0.90540541 0.91156463]
mean value: 0.9069102339726427
key: test_recall
value: [1. 0.90322581 0.93548387 1. 1. 0.96774194
0.87096774 0.93548387 0.87096774 1. ]
mean value: 0.9483870967741935
key: train_recall
value: [0.96402878 0.97122302 0.95683453 0.96043165 0.95323741 0.9676259
0.97122302 0.95683453 0.96402878 0.96057348]
mean value: 0.962604110260179
key: test_roc_auc
value: [0.9375 0.8266129 0.78024194 0.9375 0.90625 0.79637097
0.84173387 0.90107527 0.83548387 0.90625 ]
mean value: 0.8669018817204301
key: train_roc_auc
value: [0.90807073 0.89758334 0.88334684 0.87458202 0.86746378 0.88874253
0.87997771 0.87352216 0.88411229 0.88873744]
mean value: 0.8846138841461438
key: test_jcc
value: [0.93939394 0.8 0.78378378 0.93939394 0.91176471 0.81081081
0.79411765 0.87878788 0.79411765 0.90909091]
mean value: 0.8561261261261262
key: train_jcc
value: [0.89632107 0.89108911 0.87213115 0.86688312 0.85760518 0.88196721
0.87662338 0.86363636 0.87581699 0.87868852]
mean value: 0.8760762092991343
MCC on Blind test: 0.23
Accuracy on Blind test: 0.45
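Note: each "mean value:" line in this log appears to be the arithmetic mean of the ten per-fold scores printed on the preceding "value:" line. The sketch below is a minimal check of that reading using only the Python standard library. The helper name parse_fold_scores and the LOG_SNIPPET constant are hypothetical and are not part of ml_data.py; the snippet text is copied from the test_recall block above.

```python
import re
from statistics import mean

# Copied from the test_recall block of the Logistic Regression run above.
LOG_SNIPPET = """\
key: test_recall
value: [1. 0.90322581 0.93548387 1. 1. 0.96774194
 0.87096774 0.93548387 0.87096774 1. ]
mean value: 0.9483870967741935
"""


def parse_fold_scores(text):
    """Return {key: [per-fold floats]} for every 'key:' / 'value: [...]' pair.

    Hypothetical helper: it assumes the array is printed between square
    brackets and may wrap over several lines, as in this log.
    """
    pattern = re.compile(r"key:\s*(\S+)\s+value:\s*\[(.*?)\]", re.DOTALL)
    return {
        key: [float(token) for token in body.split()]
        for key, body in pattern.findall(text)
    }


scores = parse_fold_scores(LOG_SNIPPET)
recall = scores["test_recall"]
recall_mean = mean(recall)

print(f"folds: {len(recall)}, mean recall: {recall_mean:.6f}")
```

The recomputed mean agrees with the logged value only up to the eight decimal places printed per fold, so compare with a tolerance rather than for exact equality.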
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.74151611 1.08886981 0.69769788 0.70535755 0.87346554 0.72519445
0.73284912 0.83741045 0.65530038 0.68675303]
mean value: 0.7744414329528808
key: score_time
value: [0.01378059 0.01389503 0.01416969 0.01405454 0.01443934 0.0140748
0.01437092 0.01120043 0.0144248 0.01425123]
mean value: 0.013866138458251954
key: test_mcc
value: [1. 0.8566725 1. 0.95299692 0.90662544 0.76032282
0.90524194 0.9085301 0.85513419 0.85513419]
mean value: 0.90006580934109
key: train_mcc
value: [0.93593571 0.96269263 0.94130059 0.93593571 0.95736701 0.95734993
0.94131391 0.9469026 0.95756757 0.95740101]
mean value: 0.9493766673456756
key: test_accuracy
value: [1. 0.93617021 1. 0.9787234 0.95744681 0.89361702
0.95744681 0.95652174 0.93478261 0.93478261]
mean value: 0.9549491211840888
key: train_accuracy
value: [0.97142857 0.98333333 0.97380952 0.97142857 0.98095238 0.98095238
0.97380952 0.97624703 0.98099762 0.98099762]
mean value: 0.9773956565999321
key: test_fscore
value: [1. 0.95238095 1. 0.98412698 0.96875 0.92307692
0.96774194 0.96666667 0.95081967 0.95081967]
mean value: 0.9664382805997692
key: train_fscore
value: [0.97857143 0.98747764 0.980322 0.97857143 0.98571429 0.98566308
0.98039216 0.98214286 0.98571429 0.98571429]
mean value: 0.983028345294684
key: test_precision
value: [1. 0.9375 1. 0.96875 0.93939394 0.88235294
0.96774194 1. 0.96666667 0.93548387]
mean value: 0.959788935368869
key: train_precision
value: [0.97163121 0.98220641 0.97508897 0.97163121 0.9787234 0.98214286
0.97173145 0.9751773 0.9787234 0.98220641]
mean value: 0.9769262610088233
key: test_recall
value: [1. 0.96774194 1. 1. 1. 0.96774194
0.96774194 0.93548387 0.93548387 0.96666667]
mean value: 0.9740860215053764
key: train_recall
value: [0.98561151 0.99280576 0.98561151 0.98561151 0.99280576 0.98920863
0.98920863 0.98920863 0.99280576 0.98924731]
mean value: 0.9892125009669683
key: test_roc_auc
value: [1. 0.92137097 1. 0.96875 0.9375 0.85887097
0.95262097 0.96774194 0.9344086 0.92083333]
mean value: 0.9462096774193549
key: train_roc_auc
value: [0.96463674 0.97879724 0.96815787 0.96463674 0.97527612 0.97699868
0.9664353 0.97012879 0.97542386 0.97701802]
mean value: 0.9717509367831001
key: test_jcc
value: [1. 0.90909091 1. 0.96875 0.93939394 0.85714286
0.9375 0.93548387 0.90625 0.90625 ]
mean value: 0.9359861576595447
key: train_jcc
value: [0.95804196 0.97526502 0.96140351 0.95804196 0.97183099 0.97173145
0.96153846 0.96491228 0.97183099 0.97183099]
mean value: 0.9666427591273636
MCC on Blind test: 0.14
Accuracy on Blind test: 0.32
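Note: the per-fold keys above (test_mcc, test_accuracy, test_fscore, test_precision, test_recall, test_jcc) can all be derived from one confusion matrix per fold. The sketch below is a plain-Python illustration, not the scoring code used by the pipeline. The counts TP=31, FP=2, FN=0, TN=14 are not logged anywhere; they are an inference that happens to reproduce the fifth test fold of the Logistic RegressionCV run (47 samples). The function name fold_metrics is hypothetical.

```python
from math import sqrt


def fold_metrics(tp, fp, fn, tn):
    """Compute the logged metric families from confusion-matrix counts.

    Assumes each class has at least one true sample and at least one
    positive prediction exists, so the divisions below are defined.
    """
    recall = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)       # true negative rate
    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),
        "recall": recall,
        "fscore": 2 * tp / (2 * tp + fp + fn),
        "jcc": tp / (tp + fp + fn),    # Jaccard index of the positive class
        "balanced_accuracy": (recall + specificity) / 2,
        "mcc": (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0,
    }


# Inferred counts (an assumption, see the note above).
m = fold_metrics(tp=31, fp=2, fn=0, tn=14)

for name, value in m.items():
    print(f"{name}: {value:.8f}")
```

For these counts the balanced accuracy equals the test_roc_auc logged for the same fold (0.9375). That would be consistent with ROC AUC being scored on hard class labels rather than on predicted probabilities, but the log alone does not confirm this.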
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01048064 0.00996375 0.00781631 0.00742173 0.00739574 0.00739932
0.00733399 0.00878787 0.0088315 0.00836849]
mean value: 0.008379936218261719
key: score_time
value: [0.01066589 0.00898504 0.0085032 0.0080471 0.00794482 0.0084784
0.00797868 0.00964499 0.00877905 0.00852466]
mean value: 0.008755183219909668
key: test_mcc
value: [0.8566725 0.50614703 0.62096774 0.76032282 0.81048387 0.71572581
0.59764284 0.75776742 0.60430108 0.36514837]
mean value: 0.6595179479313003
key: train_mcc
value: [0.70671585 0.70811111 0.71695894 0.68716403 0.71727396 0.73126698
0.71138479 0.71852622 0.74194944 0.54109586]
mean value: 0.6980447184919443
key: test_accuracy
value: [0.93617021 0.76595745 0.82978723 0.89361702 0.91489362 0.87234043
0.80851064 0.89130435 0.82608696 0.67391304]
mean value: 0.8412580943570768
key: train_accuracy
value: [0.87142857 0.86666667 0.86904762 0.85714286 0.87142857 0.87857143
0.86904762 0.87173397 0.88361045 0.74821853]
mean value: 0.8586896278701505
key: test_fscore
value: [0.95238095 0.81355932 0.87096774 0.92307692 0.93548387 0.90322581
0.84745763 0.91803279 0.87096774 0.71698113]
mean value: 0.8752133904861458
key: train_fscore
value: [0.90721649 0.8974359 0.89833641 0.89010989 0.90145985 0.90744102
0.89981785 0.90145985 0.91139241 0.77916667]
mean value: 0.8893836343169823
key: test_precision
value: [0.9375 0.85714286 0.87096774 0.88235294 0.93548387 0.90322581
0.89285714 0.93333333 0.87096774 0.82608696]
mean value: 0.8909918392321865
key: train_precision
value: [0.86842105 0.9141791 0.92395437 0.90671642 0.91481481 0.91575092
0.91143911 0.91481481 0.91636364 0.93034826]
mean value: 0.9116802502485006
key: test_recall
value: [0.96774194 0.77419355 0.87096774 0.96774194 0.93548387 0.90322581
0.80645161 0.90322581 0.87096774 0.63333333]
mean value: 0.8633333333333333
key: train_recall
value: [0.94964029 0.88129496 0.87410072 0.87410072 0.88848921 0.89928058
0.88848921 0.88848921 0.90647482 0.6702509 ]
mean value: 0.8720610608287563
key: test_roc_auc
value: [0.92137097 0.76209677 0.81048387 0.85887097 0.90524194 0.8578629
0.80947581 0.88494624 0.80215054 0.69166667]
mean value: 0.8304166666666667
key: train_roc_auc
value: [0.83397507 0.85966157 0.86662782 0.84902219 0.86325869 0.86865437
0.85973756 0.86382502 0.87281783 0.78582967]
mean value: 0.8523409805276452
key: test_jcc
value: [0.90909091 0.68571429 0.77142857 0.85714286 0.87878788 0.82352941
0.73529412 0.84848485 0.77142857 0.55882353]
mean value: 0.7839724980901451
key: train_jcc
value: [0.83018868 0.81395349 0.81543624 0.8019802 0.82059801 0.83056478
0.81788079 0.82059801 0.8372093 0.63822526]
mean value: 0.8026634757590374
MCC on Blind test: 0.22
Accuracy on Blind test: 0.56
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.00818086 0.0079577 0.00792146 0.00760269 0.00763583 0.00755167
0.00756431 0.00766754 0.00792885 0.00768661]
mean value: 0.00776975154876709
key: score_time
value: [0.00826931 0.00858402 0.0080297 0.00802231 0.00803876 0.00794721
0.00808096 0.00816536 0.00821137 0.0079844 ]
mean value: 0.008133339881896972
key: test_mcc
value: [0.76746995 0.61207663 0.31752781 0.71206211 0.76032282 0.6139232
0.66402366 0.59332241 0.38733878 0.70954337]
mean value: 0.6137610732708011
key: train_mcc
value: [0.62791789 0.64521328 0.66619129 0.63945586 0.63982246 0.63982246
0.6506538 0.65794031 0.65846852 0.63442864]
mean value: 0.6459914516114823
key: test_accuracy
value: [0.89361702 0.82978723 0.70212766 0.87234043 0.89361702 0.82978723
0.85106383 0.82608696 0.73913043 0.86956522]
mean value: 0.8307123034227567
key: train_accuracy
value: [0.83809524 0.8452381 0.85238095 0.84285714 0.84285714 0.84285714
0.84761905 0.85035629 0.85035629 0.84085511]
mean value: 0.8453472457866757
key: test_fscore
value: [0.91803279 0.875 0.78125 0.90909091 0.92307692 0.88235294
0.88888889 0.875 0.8125 0.90625 ]
mean value: 0.8771442449118437
key: train_fscore
value: [0.88316151 0.88773748 0.89007092 0.8862069 0.88581315 0.88581315
0.88965517 0.89156627 0.89081456 0.88468158]
mean value: 0.8875520685563664
key: test_precision
value: [0.93333333 0.84848485 0.75757576 0.85714286 0.88235294 0.81081081
0.875 0.84848485 0.78787879 0.85294118]
mean value: 0.8454005361358302
key: train_precision
value: [0.84539474 0.8538206 0.87762238 0.85099338 0.85333333 0.85333333
0.85430464 0.85478548 0.85953177 0.85099338]
mean value: 0.8554113020989377
key: test_recall
value: [0.90322581 0.90322581 0.80645161 0.96774194 0.96774194 0.96774194
0.90322581 0.90322581 0.83870968 0.96666667]
mean value: 0.9127956989247312
key: train_recall
value: [0.92446043 0.92446043 0.9028777 0.92446043 0.92086331 0.92086331
0.92805755 0.93165468 0.92446043 0.92114695]
mean value: 0.9223305226786314
key: test_roc_auc
value: [0.8891129 0.7953629 0.65322581 0.82762097 0.85887097 0.76512097
0.8266129 0.78494624 0.68602151 0.82708333]
mean value: 0.7913978494623656
key: train_roc_auc
value: [0.79673726 0.80730064 0.82819941 0.80377951 0.80550208 0.80550208
0.8090992 0.81198118 0.81537707 0.80212277]
mean value: 0.8085601200017799
key: test_jcc
value: [0.84848485 0.77777778 0.64102564 0.83333333 0.85714286 0.78947368
0.8 0.77777778 0.68421053 0.82857143]
mean value: 0.783779787463998
key: train_jcc
value: [0.79076923 0.79813665 0.80191693 0.79566563 0.79503106 0.79503106
0.80124224 0.80434783 0.803125 0.79320988]
mean value: 0.7978475494770487
MCC on Blind test: 0.24
Accuracy on Blind test: 0.47
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00735164 0.00838447 0.00824022 0.00810122 0.00807238 0.0080514
0.00810838 0.00790358 0.00764203 0.00775075]
mean value: 0.00796060562133789
key: score_time
value: [0.09431863 0.01160264 0.01148391 0.01503587 0.01451468 0.0130167
0.01415229 0.01103735 0.01092792 0.01100278]
mean value: 0.02070927619934082
key: test_mcc
value: [0.76746995 0.76034808 0.4031367 0.65994312 0.71025956 0.61207663
0.56769924 0.58251534 0.49033059 0.48102958]
mean value: 0.6034808785180602
key: train_mcc
value: [0.69858559 0.69632669 0.75172804 0.69676775 0.73520628 0.71297421
0.70164234 0.70915156 0.73690278 0.72050578]
mean value: 0.7159791011797761
key: test_accuracy
value: [0.89361702 0.89361702 0.74468085 0.85106383 0.87234043 0.82978723
0.80851064 0.80434783 0.7826087 0.76086957]
mean value: 0.8241443108233117
key: train_accuracy
value: [0.86666667 0.86666667 0.89047619 0.86666667 0.88333333 0.87380952
0.86904762 0.87173397 0.88361045 0.87648456]
mean value: 0.8748495645288994
key: test_fscore
value: [0.91803279 0.92063492 0.81818182 0.89230769 0.90625 0.875
0.85714286 0.84745763 0.84375 0.81355932]
mean value: 0.8692317024305076
key: train_fscore
value: [0.90070922 0.9020979 0.91901408 0.90175439 0.91388401 0.90718039
0.90401396 0.90526316 0.91358025 0.90812721]
mean value: 0.9075624559641324
key: test_precision
value: [0.93333333 0.90625 0.77142857 0.85294118 0.87878788 0.84848485
0.84375 0.89285714 0.81818182 0.82758621]
mean value: 0.8573600976440733
key: train_precision
value: [0.88811189 0.87755102 0.9 0.88013699 0.89347079 0.88395904
0.8779661 0.88356164 0.89619377 0.89547038]
mean value: 0.887642163000012
key: test_recall
value: [0.90322581 0.93548387 0.87096774 0.93548387 0.93548387 0.90322581
0.87096774 0.80645161 0.87096774 0.8 ]
mean value: 0.8832258064516129
key: train_recall
value: [0.91366906 0.92805755 0.93884892 0.92446043 0.9352518 0.93165468
0.93165468 0.92805755 0.93165468 0.92114695]
mean value: 0.9284456305923003
key: test_roc_auc
value: [0.8891129 0.87399194 0.68548387 0.81149194 0.84274194 0.7953629
0.77923387 0.80322581 0.73548387 0.74375 ]
mean value: 0.7959879032258065
key: train_roc_auc
value: [0.84415848 0.83726821 0.86731178 0.83899078 0.85847097 0.84610903
0.83906677 0.84514766 0.86093223 0.85493967]
mean value: 0.8492395591157109
key: test_jcc
value: [0.84848485 0.85294118 0.69230769 0.80555556 0.82857143 0.77777778
0.75 0.73529412 0.72972973 0.68571429]
mean value: 0.7706376612258965
key: train_jcc
value: [0.81935484 0.82165605 0.85016287 0.82108626 0.84142395 0.83012821
0.82484076 0.82692308 0.84090909 0.83171521]
mean value: 0.8308200313963069
MCC on Blind test: 0.2
Accuracy on Blind test: 0.45
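Note on reading the per-fold values above: the sketch below recomputes the K-Nearest Neighbors fold-1 test metrics from a single confusion matrix. The counts (TP=28, FP=2, FN=3, TN=14) are not printed anywhere in this log; they are an assumption, inferred from the logged ratios (precision 28/30, recall 28/31, accuracy 42/47). The logged test_roc_auc is consistent with balanced accuracy, which is what ROC AUC reduces to when it is computed from hard class predictions rather than scores.

```python
from math import sqrt

# Confusion counts for K-Nearest Neighbors, CV fold 1.
# ASSUMPTION: inferred from the logged ratios, not printed in the log.
tp, fp, fn, tn = 28, 2, 3, 14

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
specificity = tn / (tn + fp)
fscore = 2 * tp / (2 * tp + fp + fn)
jcc = tp / (tp + fp + fn)  # Jaccard index of the positive class
mcc = (tp * tn - fp * fn) / sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)
# With hard 0/1 predictions the ROC curve has one interior point,
# so its area equals balanced accuracy.
roc_auc = (recall + specificity) / 2

metrics = {
    "test_mcc": mcc,
    "test_accuracy": accuracy,
    "test_fscore": fscore,
    "test_precision": precision,
    "test_recall": recall,
    "test_roc_auc": roc_auc,
    "test_jcc": jcc,
}
for name, value in metrics.items():
    print(f"{name}: {value:.8f}")
```

Each printed value can be compared against the first entry of the matching "value:" array in the K-Nearest Neighbors block.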
Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01428485 0.0115571 0.0116024 0.01192141 0.0118351 0.01425028
0.01206684 0.01400876 0.01199913 0.01239347]
mean value: 0.012591934204101563
key: score_time
value: [0.00854349 0.00847244 0.00854373 0.00855494 0.00946355 0.00848031
0.00859547 0.00851321 0.0085175 0.00893998]
mean value: 0.00866246223449707
key: test_mcc
value: [0.8566725 0.71206211 0.50611184 0.76032282 0.66337469 0.6139232
0.65994312 0.64852426 0.38733878 0.72168784]
mean value: 0.6529961162737778
key: train_mcc
value: [0.69022744 0.66164278 0.68466145 0.65612626 0.66739922 0.67302425
0.67350891 0.66972224 0.68052658 0.67334868]
mean value: 0.6730187805126882
key: test_accuracy
value: [0.93617021 0.87234043 0.78723404 0.89361702 0.85106383 0.82978723
0.85106383 0.84782609 0.73913043 0.86956522]
mean value: 0.8477798334875115
key: train_accuracy
value: [0.86428571 0.85238095 0.86190476 0.85 0.8547619 0.85714286
0.85714286 0.85510689 0.85985748 0.85748219]
mean value: 0.8570065603438525
key: test_fscore
value: [0.95238095 0.90909091 0.84848485 0.92307692 0.89552239 0.88235294
0.89230769 0.88888889 0.8125 0.90909091]
mean value: 0.8913696452557295
key: train_fscore
value: [0.90289608 0.89419795 0.90136054 0.89303905 0.89608177 0.89761092
0.89830508 0.89678511 0.89948893 0.89795918]
mean value: 0.8977724625814629
key: test_precision
value: [0.9375 0.85714286 0.8 0.88235294 0.83333333 0.81081081
0.85294118 0.875 0.78787879 0.83333333]
mean value: 0.8470293240146182
key: train_precision
value: [0.85760518 0.85064935 0.85483871 0.84565916 0.85113269 0.8538961
0.84935897 0.84664537 0.85436893 0.85436893]
mean value: 0.8518523398136467
key: test_recall
value: [0.96774194 0.96774194 0.90322581 0.96774194 0.96774194 0.96774194
0.93548387 0.90322581 0.83870968 1. ]
mean value: 0.9419354838709677
key: train_recall
value: [0.95323741 0.94244604 0.95323741 0.94604317 0.94604317 0.94604317
0.95323741 0.95323741 0.94964029 0.94623656]
mean value: 0.9489402026765684
key: test_roc_auc
value: [0.92137097 0.82762097 0.7328629 0.85887097 0.79637097 0.76512097
0.81149194 0.81827957 0.68602151 0.8125 ]
mean value: 0.8030510752688172
key: train_roc_auc
value: [0.82168913 0.80925119 0.818168 0.8040075 0.81104975 0.81457088
0.81112575 0.80878654 0.81747749 0.81466758]
mean value: 0.813079379384182
key: test_jcc
value: [0.90909091 0.83333333 0.73684211 0.85714286 0.81081081 0.78947368
0.80555556 0.8 0.68421053 0.83333333]
mean value: 0.8059793115056273
key: train_jcc
value: [0.82298137 0.80864198 0.82043344 0.80674847 0.8117284 0.81424149
0.81538462 0.81288344 0.81733746 0.81481481]
mean value: 0.8145195452770848
MCC on Blind test: 0.25
Accuracy on Blind test: 0.45
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.31461072 1.40823627 1.28151274 1.4231658 1.334095 1.30517697
1.41587329 1.28833318 1.49593544 1.34908724]
mean value: 1.3616026639938354
key: score_time
value: [0.01176286 0.01351857 0.0135088 0.01388788 0.01229548 0.01362157
0.01102948 0.01351404 0.01373792 0.01853848]
mean value: 0.013541507720947265
key: test_mcc
value: [1. 0.8084425 0.90662544 1. 0.95299692 0.76032282
0.90524194 0.90107527 0.74930844 0.80833333]
mean value: 0.8792346661083966
key: train_mcc
value: [0.9680267 0.95736701 0.94674008 0.9680267 0.96269263 0.9680267
0.9628398 0.96296053 0.95222181 0.99470992]
mean value: 0.9643611879690016
key: test_accuracy
value: [1. 0.91489362 0.95744681 1. 0.9787234 0.89361702
0.95744681 0.95652174 0.89130435 0.91304348]
mean value: 0.9462997224791859
key: train_accuracy
value: [0.98571429 0.98095238 0.97619048 0.98571429 0.98333333 0.98571429
0.98333333 0.98337292 0.97862233 0.9976247 ]
mean value: 0.9840572333446442
key: test_fscore
value: [1. 0.9375 0.96875 1. 0.98412698 0.92307692
0.96774194 0.96774194 0.92063492 0.93333333]
mean value: 0.9602906032139903
key: train_fscore
value: [0.98924731 0.98571429 0.98220641 0.98924731 0.98747764 0.98924731
0.98738739 0.98752228 0.98389982 0.99820467]
mean value: 0.9880154423532531
key: test_precision
value: [1. 0.90909091 0.93939394 1. 0.96875 0.88235294
0.96774194 0.96774194 0.90625 0.93333333]
mean value: 0.9474654993962395
key: train_precision
value: [0.98571429 0.9787234 0.97183099 0.98571429 0.98220641 0.98571429
0.98916968 0.97879859 0.97864769 1. ]
mean value: 0.9836519601503051
key: test_recall
value: [1. 0.96774194 1. 1. 1. 0.96774194
0.96774194 0.96774194 0.93548387 0.93333333]
mean value: 0.9739784946236559
key: train_recall
value: [0.99280576 0.99280576 0.99280576 0.99280576 0.99280576 0.99280576
0.98561151 0.99640288 0.98920863 0.99641577]
mean value: 0.9924473324566154
key: test_roc_auc
value: [1. 0.89012097 0.9375 1. 0.96875 0.85887097
0.95262097 0.95053763 0.86774194 0.90416667]
mean value: 0.9330309139784947
key: train_roc_auc
value: [0.98231837 0.97527612 0.96823386 0.98231837 0.97879724 0.98231837
0.98224238 0.97722242 0.9736253 0.99820789]
mean value: 0.980056031046588
key: test_jcc
value: [1. 0.88235294 0.93939394 1. 0.96875 0.85714286
0.9375 0.9375 0.85294118 0.875 ]
mean value: 0.9250580914183856
key: train_jcc
value: [0.9787234 0.97183099 0.96503497 0.9787234 0.97526502 0.9787234
0.97508897 0.97535211 0.96830986 0.99641577]
mean value: 0.9763467891796095
MCC on Blind test: 0.13
Accuracy on Blind test: 0.31
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.01342225 0.01069498 0.00975204 0.01033854 0.00994968 0.01040697
0.01035452 0.01077914 0.01066208 0.01090336]
mean value: 0.010726356506347656
key: score_time
value: [0.01061678 0.00818062 0.00800824 0.00842381 0.00850368 0.00858855
0.00848293 0.00844717 0.00845146 0.00849843]
mean value: 0.008620166778564453
key: test_mcc
value: [0.95299692 0.8566725 0.91188882 1. 0.86091836 0.8566725
0.87213027 0.95250095 0.90107527 0.80833333]
mean value: 0.8973188916801316
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.9787234 0.93617021 0.95744681 1. 0.93617021 0.93617021
0.93617021 0.97826087 0.95652174 0.91304348]
mean value: 0.9528677150786309
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98412698 0.95238095 0.96666667 1. 0.95081967 0.95238095
0.94915254 0.98360656 0.96774194 0.93333333]
mean value: 0.9640209596253838
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96875 0.9375 1. 1. 0.96666667 0.9375
1. 1. 0.96774194 0.93333333]
mean value: 0.9711491935483871
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.96774194 0.93548387 1. 0.93548387 0.96774194
0.90322581 0.96774194 0.96774194 0.93333333]
mean value: 0.9578494623655914
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96875 0.92137097 0.96774194 1. 0.93649194 0.92137097
0.9516129 0.98387097 0.95053763 0.90416667]
mean value: 0.9505913978494623
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96875 0.90909091 0.93548387 1. 0.90625 0.90909091
0.90322581 0.96774194 0.9375 0.875 ]
mean value: 0.9312133431085043
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.06
Accuracy on Blind test: 0.2
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.10349464 0.09808111 0.10309243 0.10495615 0.10384583 0.10514021
0.10287976 0.10301304 0.10425162 0.10234761]
mean value: 0.10311024188995362
key: score_time
value: [0.01685739 0.01713133 0.01867747 0.01792812 0.01854682 0.01873803
0.01870346 0.01733375 0.01832008 0.01786637]
mean value: 0.018010282516479494
key: test_mcc
value: [0.90662544 0.8084425 0.81503725 0.90662544 0.86070252 0.76032282
0.81048387 0.85009261 0.8059304 0.90571105]
mean value: 0.8429973908395795
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.95744681 0.91489362 0.91489362 0.95744681 0.93617021 0.89361702
0.91489362 0.93478261 0.91304348 0.95652174]
mean value: 0.9293709528214616
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96875 0.9375 0.93939394 0.96875 0.95384615 0.92307692
0.93548387 0.95238095 0.93939394 0.96774194]
mean value: 0.9486317714543521
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.93939394 0.90909091 0.88571429 0.93939394 0.91176471 0.88235294
0.93548387 0.9375 0.88571429 0.9375 ]
mean value: 0.9163908877333925
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.96774194 1. 1. 1. 0.96774194
0.93548387 0.96774194 1. 1. ]
mean value: 0.9838709677419355
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.9375 0.89012097 0.875 0.9375 0.90625 0.85887097
0.90524194 0.9172043 0.86666667 0.9375 ]
mean value: 0.9031854838709678
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.93939394 0.88235294 0.88571429 0.93939394 0.91176471 0.85714286
0.87878788 0.90909091 0.88571429 0.9375 ]
mean value: 0.9026855742296919
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.2
Accuracy on Blind test: 0.36
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.00836992 0.00826001 0.00835204 0.00824237 0.00821042 0.00798821
0.00828552 0.00832677 0.00851941 0.00838804]
mean value: 0.008294272422790527
key: score_time
value: [0.00871825 0.00869298 0.00867295 0.0086019 0.00861168 0.00866127
0.0086937 0.00873017 0.0088346 0.0087533 ]
mean value: 0.008697080612182616
key: test_mcc
value: [0.86091836 0.71206211 0.65309894 0.81952077 0.8084425 0.65994312
0.50614703 0.60602162 0.44695591 0.72379255]
mean value: 0.6796902925193711
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.93617021 0.87234043 0.82978723 0.91489362 0.91489362 0.85106383
0.76595745 0.80434783 0.76086957 0.86956522]
mean value: 0.8519888991674376
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.95081967 0.90909091 0.86206897 0.93333333 0.9375 0.89230769
0.81355932 0.84210526 0.82539683 0.89655172]
mean value: 0.8862733707106872
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96666667 0.85714286 0.92592593 0.96551724 0.90909091 0.85294118
0.85714286 0.92307692 0.8125 0.92857143]
mean value: 0.8998575985467466
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.93548387 0.96774194 0.80645161 0.90322581 0.96774194 0.93548387
0.77419355 0.77419355 0.83870968 0.86666667]
mean value: 0.8769892473118279
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.93649194 0.82762097 0.84072581 0.9203629 0.89012097 0.81149194
0.76209677 0.82043011 0.71935484 0.87083333]
mean value: 0.8399529569892473
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.90625 0.83333333 0.75757576 0.875 0.88235294 0.80555556
0.68571429 0.72727273 0.7027027 0.8125 ]
mean value: 0.7988257303330832
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.09
Accuracy on Blind test: 0.38
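Note on the gap between cross-validation and blind-test scores: the sketch below tabulates the mean train MCC, mean CV test MCC, and blind-test MCC for the six models reported so far. The numbers are copied from this log (means rounded to four decimals); the dictionary layout and the names `results` and `rows` are illustrative only and are not part of the original script.

```python
# Mean MCC values copied from the log (rounded to 4 decimals),
# plus the "MCC on Blind test" line reported for each model.
results = {
    "K-Nearest Neighbors": {"train": 0.7160, "cv_test": 0.6035, "blind": 0.20},
    "SVM": {"train": 0.6730, "cv_test": 0.6530, "blind": 0.25},
    "MLP": {"train": 0.9644, "cv_test": 0.8792, "blind": 0.13},
    "Decision Tree": {"train": 1.0000, "cv_test": 0.8973, "blind": 0.06},
    "Extra Trees": {"train": 1.0000, "cv_test": 0.8430, "blind": 0.20},
    "Extra Tree": {"train": 1.0000, "cv_test": 0.6797, "blind": 0.09},
}

# One row per model: (name, train - cv_test gap, cv_test - blind gap).
rows = [
    (name, m["train"] - m["cv_test"], m["cv_test"] - m["blind"])
    for name, m in results.items()
]
# Largest drop from CV to blind test first.
rows.sort(key=lambda row: row[2], reverse=True)

print(f"{'model':<22}{'train-cv':>10}{'cv-blind':>10}")
for name, train_gap, blind_gap in rows:
    print(f"{name:<22}{train_gap:>10.4f}{blind_gap:>10.4f}")
```

A large cv-blind gap alongside a train MCC of 1.0 (the tree-based models here) is the usual signature of overfitting, though a distribution shift between the training data and the blind set would produce the same pattern.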
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.29370928 1.2518785 1.24979663 1.24180865 1.26994014 1.25986075
1.2572484 1.2555747 1.23349094 1.23494911]
mean value: 1.2548257112503052
key: score_time
value: [0.09408879 0.09164119 0.08997083 0.09628367 0.09728193 0.1462996
0.09323502 0.08956718 0.08982635 0.08968997]
mean value: 0.09778845310211182
key: test_mcc
value: [1. 0.8566725 1. 1. 0.90662544 0.81503725
1. 0.95250095 0.95087679 0.85513419]
mean value: 0.9336847119207848
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.93617021 1. 1. 0.95744681 0.91489362
1. 0.97826087 0.97826087 0.93478261]
mean value: 0.9699814986123959
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.95238095 1. 1. 0.96875 0.93939394
1. 0.98360656 0.98412698 0.95081967]
mean value: 0.9779078105410073
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.9375 1. 1. 0.93939394 0.88571429
1. 1. 0.96875 0.93548387]
mean value: 0.9666842096075967
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.96774194 1. 1. 1. 1.
1. 0.96774194 1. 0.96666667]
mean value: 0.9902150537634409
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.92137097 1. 1. 0.9375 0.875
1. 0.98387097 0.96666667 0.92083333]
mean value: 0.9605241935483871
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.90909091 1. 1. 0.93939394 0.88571429
1. 0.96774194 0.96875 0.90625 ]
mean value: 0.9576941069683005
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.07
Accuracy on Blind test: 0.18
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [1.75414562 0.86297989 0.94754958 0.91948628 0.92758203 1.00107074
0.93352938 0.92137861 0.88540673 0.90511346]
mean value: 1.0058242321014403
key: score_time
value: [0.23915219 0.2850039 0.25384307 0.23436403 0.24242306 0.2717557
0.25083756 0.22900653 0.23912811 0.27642059]
mean value: 0.2521934747695923
key: test_mcc
value: [1. 0.8084425 0.90662544 1. 0.90662544 0.81503725
1. 0.90107527 0.95087679 0.80651412]
mean value: 0.9095196821072326
key: train_mcc
value: [0.94694186 0.96278526 0.94694186 0.94694186 0.94694186 0.96278526
0.95221511 0.95793986 0.95769694 0.96282875]
mean value: 0.9544018630875426
key: test_accuracy
value: [1. 0.91489362 0.95744681 1. 0.95744681 0.91489362
1. 0.95652174 0.97826087 0.91304348]
mean value: 0.9592506938020352
key: train_accuracy
value: [0.97619048 0.98333333 0.97619048 0.97619048 0.97619048 0.98333333
0.97857143 0.98099762 0.98099762 0.98337292]
mean value: 0.9795368171021377
key: test_fscore
value: [1. 0.9375 0.96875 1. 0.96875 0.93939394
1. 0.96774194 0.98412698 0.93548387]
mean value: 0.9701746729972536
key: train_fscore
value: [0.9822695 0.98752228 0.9822695 0.9822695 0.9822695 0.98752228
0.98401421 0.9858156 0.98576512 0.98756661]
mean value: 0.9847284121907804
key: test_precision
value: [1. 0.90909091 0.93939394 1. 0.93939394 0.88571429
1. 0.96774194 0.96875 0.90625 ]
mean value: 0.9516335009076945
key: train_precision
value: [0.96853147 0.97879859 0.96853147 0.96853147 0.96853147 0.97879859
0.97192982 0.97202797 0.97535211 0.97887324]
mean value: 0.9729906195972802
key: test_recall
value: [1. 0.96774194 1. 1. 1. 1.
1. 0.96774194 1. 0.96666667]
mean value: 0.9902150537634409
key: train_recall
value: [0.99640288 0.99640288 0.99640288 0.99640288 0.99640288 0.99640288
0.99640288 1. 0.99640288 0.99641577]
mean value: 0.9967638792192053
key: test_roc_auc
value: [1. 0.89012097 0.9375 1. 0.9375 0.875
1. 0.95053763 0.96666667 0.88958333]
mean value: 0.9446908602150538
key: train_roc_auc
value: [0.9665113 0.97707468 0.9665113 0.9665113 0.9665113 0.97707468
0.97003242 0.97202797 0.97372591 0.97708112]
mean value: 0.9713061984493545
key: test_jcc
value: [1. 0.88235294 0.93939394 1. 0.93939394 0.88571429
1. 0.9375 0.96875 0.87878788]
mean value: 0.9431892984466514
key: train_jcc
value: [0.96515679 0.97535211 0.96515679 0.96515679 0.96515679 0.97535211
0.96853147 0.97202797 0.97192982 0.9754386 ]
mean value: 0.9699259264664533
MCC on Blind test: 0.08
Accuracy on Blind test: 0.19
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01817513 0.0076189 0.00755739 0.00759125 0.0075407 0.00757432
0.00772119 0.0076313 0.0076437 0.00751853]
mean value: 0.008657240867614746
key: score_time
value: [0.0108695 0.00802517 0.00807548 0.00796461 0.00793648 0.00793743
0.00874805 0.00799894 0.00802493 0.00804806]
mean value: 0.008362865447998047
key: test_mcc
value: [0.76746995 0.61207663 0.31752781 0.71206211 0.76032282 0.6139232
0.66402366 0.59332241 0.38733878 0.70954337]
mean value: 0.6137610732708011
key: train_mcc
value: [0.62791789 0.64521328 0.66619129 0.63945586 0.63982246 0.63982246
0.6506538 0.65794031 0.65846852 0.63442864]
mean value: 0.6459914516114823
key: test_accuracy
value: [0.89361702 0.82978723 0.70212766 0.87234043 0.89361702 0.82978723
0.85106383 0.82608696 0.73913043 0.86956522]
mean value: 0.8307123034227567
key: train_accuracy
value: [0.83809524 0.8452381 0.85238095 0.84285714 0.84285714 0.84285714
0.84761905 0.85035629 0.85035629 0.84085511]
mean value: 0.8453472457866757
key: test_fscore
value: [0.91803279 0.875 0.78125 0.90909091 0.92307692 0.88235294
0.88888889 0.875 0.8125 0.90625 ]
mean value: 0.8771442449118437
key: train_fscore
value: [0.88316151 0.88773748 0.89007092 0.8862069 0.88581315 0.88581315
0.88965517 0.89156627 0.89081456 0.88468158]
mean value: 0.8875520685563664
key: test_precision
value: [0.93333333 0.84848485 0.75757576 0.85714286 0.88235294 0.81081081
0.875 0.84848485 0.78787879 0.85294118]
mean value: 0.8454005361358302
key: train_precision
value: [0.84539474 0.8538206 0.87762238 0.85099338 0.85333333 0.85333333
0.85430464 0.85478548 0.85953177 0.85099338]
mean value: 0.8554113020989377
key: test_recall
value: [0.90322581 0.90322581 0.80645161 0.96774194 0.96774194 0.96774194
0.90322581 0.90322581 0.83870968 0.96666667]
mean value: 0.9127956989247312
key: train_recall
value: [0.92446043 0.92446043 0.9028777 0.92446043 0.92086331 0.92086331
0.92805755 0.93165468 0.92446043 0.92114695]
mean value: 0.9223305226786314
key: test_roc_auc
value: [0.8891129 0.7953629 0.65322581 0.82762097 0.85887097 0.76512097
0.8266129 0.78494624 0.68602151 0.82708333]
mean value: 0.7913978494623656
key: train_roc_auc
value: [0.79673726 0.80730064 0.82819941 0.80377951 0.80550208 0.80550208
0.8090992 0.81198118 0.81537707 0.80212277]
mean value: 0.8085601200017799
key: test_jcc
value: [0.84848485 0.77777778 0.64102564 0.83333333 0.85714286 0.78947368
0.8 0.77777778 0.68421053 0.82857143]
mean value: 0.783779787463998
key: train_jcc
value: [0.79076923 0.79813665 0.80191693 0.79566563 0.79503106 0.79503106
0.80124224 0.80434783 0.803125 0.79320988]
mean value: 0.7978475494770487
MCC on Blind test: 0.24
Accuracy on Blind test: 0.47
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.08977652 0.0446949 0.05097675 0.05265212 0.04966545 0.04864025
0.2227385 0.04262686 0.0463593 0.04706073]
mean value: 0.06951913833618165
key: score_time
value: [0.00969934 0.00960755 0.00962806 0.0097065 0.00962687 0.01001763
0.01041269 0.01037621 0.0100019 0.01042318]
mean value: 0.009949994087219239
key: test_mcc
value: [1. 0.8566725 1. 1. 0.90524194 0.86070252
1. 0.95250095 0.95087679 0.85513419]
mean value: 0.9381128880260178
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.93617021 1. 1. 0.95744681 0.93617021
1. 0.97826087 0.97826087 0.93478261]
mean value: 0.972109158186864
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.95238095 1. 1. 0.96774194 0.95384615
1. 0.98360656 0.98412698 0.95081967]
mean value: 0.9792522255346158
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.9375 1. 1. 0.96774194 0.91176471
1. 1. 0.96875 0.93548387]
mean value: 0.9721240512333966
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.96774194 1. 1. 0.96774194 1.
1. 0.96774194 1. 0.96666667]
mean value: 0.986989247311828
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.92137097 1. 1. 0.95262097 0.90625
1. 0.98387097 0.96666667 0.92083333]
mean value: 0.9651612903225807
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.90909091 1. 1. 0.9375 0.91176471
1. 0.96774194 0.96875 0.90625 ]
mean value: 0.9601097550457133
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.08
Accuracy on Blind test: 0.2
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.01633263 0.01602507 0.03087282 0.03793883 0.03829098 0.03755164
0.03849244 0.04578662 0.03847647 0.0386765 ]
mean value: 0.033844399452209475
key: score_time
value: [0.01047325 0.01068068 0.02036643 0.01072168 0.01989603 0.02082086
0.02522516 0.01081634 0.0206635 0.02184916]
mean value: 0.017151308059692384
key: test_mcc
value: [0.95436677 0.8566725 1. 1. 0.90662544 0.81503725
1. 0.9085301 0.90107527 0.75776742]
mean value: 0.9100074758399945
key: train_mcc
value: [0.94131391 0.95204958 0.93598399 0.94131391 0.94674008 0.95734993
0.93066133 0.9469026 0.9469923 0.95754545]
mean value: 0.9456853089391832
key: test_accuracy
value: [0.9787234 0.93617021 1. 1. 0.95744681 0.91489362
1. 0.95652174 0.95652174 0.89130435]
mean value: 0.9591581868640148
key: train_accuracy
value: [0.97380952 0.97857143 0.97142857 0.97380952 0.97619048 0.98095238
0.96904762 0.97624703 0.97624703 0.98099762]
mean value: 0.9757301210270332
key: test_fscore
value: [0.98360656 0.95238095 1. 1. 0.96875 0.93939394
1. 0.96666667 0.96774194 0.91803279]
mean value: 0.9696572838187725
key: train_fscore
value: [0.98039216 0.98395722 0.97864769 0.98039216 0.98220641 0.98566308
0.97690941 0.98214286 0.98220641 0.9858156 ]
mean value: 0.9818332987468832
key: test_precision
value: [1. 0.9375 1. 1. 0.93939394 0.88571429
1. 1. 0.96774194 0.90322581]
mean value: 0.9633575967043709
key: train_precision
value: [0.97173145 0.97526502 0.96830986 0.97173145 0.97183099 0.98214286
0.96491228 0.9751773 0.97183099 0.9754386 ]
mean value: 0.972837078548064
key: test_recall
value: [0.96774194 0.96774194 1. 1. 1. 1.
1. 0.93548387 0.96774194 0.93333333]
mean value: 0.9772043010752688
key: train_recall
value: [0.98920863 0.99280576 0.98920863 0.98920863 0.99280576 0.98920863
0.98920863 0.98920863 0.99280576 0.99641577]
mean value: 0.991008483535752
key: test_roc_auc
value: [0.98387097 0.92137097 1. 1. 0.9375 0.875
1. 0.96774194 0.95053763 0.87291667]
mean value: 0.9508938172043011
key: train_roc_auc
value: [0.9664353 0.97175499 0.96291418 0.9664353 0.96823386 0.97699868
0.95939305 0.97012879 0.96843085 0.97356 ]
mean value: 0.9684285006076279
key: test_jcc
value: [0.96774194 0.90909091 1. 1. 0.93939394 0.88571429
1. 0.93548387 0.9375 0.84848485]
mean value: 0.9423409789135595
key: train_jcc
value: [0.96153846 0.96842105 0.95818815 0.96153846 0.96503497 0.97173145
0.95486111 0.96491228 0.96503497 0.97202797]
mean value: 0.9643288871692625
MCC on Blind test: 0.13
Accuracy on Blind test: 0.3
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.0186336 0.00766277 0.00761127 0.00744534 0.00747466 0.00750709
0.00777817 0.00826359 0.00808811 0.00809813]
mean value: 0.00885627269744873
key: score_time
value: [0.00869727 0.00830197 0.00809574 0.00786757 0.00828147 0.00783634
0.0086298 0.00836444 0.00861764 0.00868964]
mean value: 0.008338189125061036
key: test_mcc
value: [0.8566725 0.65994312 0.45918373 0.76032282 0.66337469 0.6139232
0.52620968 0.64852426 0.50537634 0.76764947]
mean value: 0.6461179816200634
key: train_mcc
value: [0.62766379 0.63945586 0.68424763 0.64471064 0.6504316 0.67304969
0.67293578 0.65214979 0.67466169 0.65101792]
mean value: 0.6570324374013666
key: test_accuracy
value: [0.93617021 0.85106383 0.76595745 0.89361702 0.85106383 0.82978723
0.78723404 0.84782609 0.7826087 0.89130435]
mean value: 0.8436632747456059
key: train_accuracy
value: [0.83809524 0.84285714 0.86190476 0.8452381 0.84761905 0.85714286
0.85714286 0.847981 0.85748219 0.847981 ]
mean value: 0.8503444180522566
key: test_fscore
value: [0.95238095 0.89230769 0.83076923 0.92307692 0.89552239 0.88235294
0.83870968 0.88888889 0.83870968 0.92307692]
mean value: 0.8865795294575493
key: train_fscore
value: [0.88356164 0.8862069 0.9 0.88850772 0.89003436 0.89655172
0.89726027 0.89041096 0.89726027 0.89003436]
mean value: 0.8919828218593321
key: test_precision
value: [0.9375 0.85294118 0.79411765 0.88235294 0.83333333 0.81081081
0.83870968 0.875 0.83870968 0.85714286]
mean value: 0.8520618120831593
key: train_precision
value: [0.84313725 0.85099338 0.86423841 0.84918033 0.85197368 0.86092715
0.85620915 0.8496732 0.85620915 0.85478548]
mean value: 0.8537327189194519
key: test_recall
value: [0.96774194 0.93548387 0.87096774 0.96774194 0.96774194 0.96774194
0.83870968 0.90322581 0.83870968 1. ]
mean value: 0.9258064516129032
key: train_recall
value: [0.92805755 0.92446043 0.93884892 0.93165468 0.93165468 0.9352518
0.94244604 0.9352518 0.94244604 0.92831541]
mean value: 0.9338387354632423
key: test_roc_auc
value: [0.92137097 0.81149194 0.71673387 0.85887097 0.79637097 0.76512097
0.76310484 0.81827957 0.75268817 0.84375 ]
mean value: 0.8047782258064516
key: train_roc_auc
value: [0.79501469 0.80377951 0.82505826 0.80385551 0.80737663 0.81973858
0.81629344 0.80678674 0.81737687 0.80922813]
mean value: 0.8104508362630897
key: test_jcc
value: [0.90909091 0.80555556 0.71052632 0.85714286 0.81081081 0.78947368
0.72222222 0.8 0.72222222 0.85714286]
mean value: 0.7984187434187434
key: train_jcc
value: [0.79141104 0.79566563 0.81818182 0.79938272 0.80185759 0.8125
0.8136646 0.80246914 0.8136646 0.80185759]
mean value: 0.80506547104786
MCC on Blind test: 0.21
Accuracy on Blind test: 0.45
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.00986862 0.01313043 0.0124433 0.01303792 0.01351666 0.01400781
0.01293039 0.01448417 0.01343918 0.01248717]
mean value: 0.012934565544128418
key: score_time
value: [0.00865817 0.00993657 0.0099678 0.01048064 0.01074982 0.0105195
0.01045227 0.01052117 0.01057601 0.01054454]
mean value: 0.010240650177001953
key: test_mcc
value: [1. 0.8566725 1. 0.95436677 0.90662544 0.81503725
0.90524194 0.9085301 0.7725558 0.85513419]
mean value: 0.8974163989404769
key: train_mcc
value: [0.93593571 0.9627116 0.92552437 0.92120646 0.92557595 0.85221677
0.93598399 0.94195411 0.93206488 0.89469123]
mean value: 0.9227865066682192
key: test_accuracy
value: [1. 0.93617021 1. 0.9787234 0.95744681 0.91489362
0.95744681 0.95652174 0.89130435 0.93478261]
mean value: 0.9527289546716003
key: train_accuracy
value: [0.97142857 0.98333333 0.96666667 0.96428571 0.96666667 0.93333333
0.97142857 0.97387173 0.96912114 0.95249406]
mean value: 0.965262979300984
key: test_fscore
value: [1. 0.95238095 1. 0.98360656 0.96875 0.93939394
0.96774194 0.96666667 0.91525424 0.95081967]
mean value: 0.9644613960721762
key: train_fscore
value: [0.97857143 0.98743268 0.97482014 0.97277677 0.97526502 0.95172414
0.97864769 0.98053097 0.97640653 0.96527778]
mean value: 0.9741453144247227
key: test_precision
value: [1. 0.9375 1. 1. 0.93939394 0.88571429
0.96774194 1. 0.96428571 0.93548387]
mean value: 0.9630119745845552
key: train_precision
value: [0.97163121 0.98566308 0.97482014 0.98168498 0.95833333 0.91390728
0.96830986 0.96515679 0.98534799 0.93602694]
mean value: 0.9640881606737391
key: test_recall
value: [1. 0.96774194 1. 0.96774194 1. 1.
0.96774194 0.93548387 0.87096774 0.96666667]
mean value: 0.9676344086021506
key: train_recall
value: [0.98561151 0.98920863 0.97482014 0.96402878 0.99280576 0.99280576
0.98920863 0.99640288 0.9676259 0.99641577]
mean value: 0.984893375622083
key: test_roc_auc
value: [1. 0.92137097 1. 0.98387097 0.9375 0.875
0.95262097 0.96774194 0.90215054 0.92083333]
mean value: 0.946108870967742
key: train_roc_auc
value: [0.96463674 0.98051981 0.96276218 0.96440875 0.95414936 0.90485358
0.96291418 0.9632364 0.96982694 0.93130648]
mean value: 0.9558614420708662
key: test_jcc
value: [1. 0.90909091 1. 0.96774194 0.93939394 0.88571429
0.9375 0.93548387 0.84375 0.90625 ]
mean value: 0.9324924940650747
key: train_jcc
value: [0.95804196 0.9751773 0.95087719 0.94699647 0.95172414 0.90789474
0.95818815 0.96180556 0.95390071 0.93288591]
mean value: 0.9497492121318976
MCC on Blind test: 0.09
Accuracy on Blind test: 0.27
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.0119803 0.012357 0.01353669 0.01262236 0.01239181 0.01188588
0.01398087 0.01292968 0.01270461 0.01215243]
mean value: 0.01265416145324707
key: score_time
value: [0.01043653 0.01049995 0.01050258 0.0104773 0.01051712 0.01048827
0.0106318 0.01071954 0.01067996 0.01075029]
mean value: 0.010570335388183593
key: test_mcc
value: [1. 0.8084425 0.87213027 0.95299692 0.90662544 0.78063446
0.95299692 0.85009261 0.81245565 0.76471368]
mean value: 0.8701088462869901
key: train_mcc
value: [0.93057824 0.96269263 0.86379539 0.93066133 0.86786568 0.85610492
0.94674008 0.88991881 0.94166847 0.91286344]
mean value: 0.9102888984394315
key: test_accuracy
value: [1. 0.91489362 0.93617021 0.9787234 0.95744681 0.89361702
0.9787234 0.93478261 0.91304348 0.89130435]
mean value: 0.9398704902867715
key: train_accuracy
value: [0.96904762 0.98333333 0.93571429 0.96904762 0.94047619 0.93095238
0.97619048 0.95011876 0.97387173 0.95961995]
mean value: 0.9588372356068318
key: test_fscore
value: [1. 0.9375 0.94915254 0.98412698 0.96875 0.91525424
0.98412698 0.95238095 0.93333333 0.91525424]
mean value: 0.9539879270917406
key: train_fscore
value: [0.97682709 0.98747764 0.94990724 0.97690941 0.95667244 0.94579439
0.98220641 0.96347826 0.98025135 0.96892139]
mean value: 0.9688445621247324
key: test_precision
value: [1. 0.90909091 1. 0.96875 0.93939394 0.96428571
0.96875 0.9375 0.96551724 0.93103448]
mean value: 0.9584322286908494
key: train_precision
value: [0.96819788 0.98220641 0.98084291 0.96491228 0.92307692 0.9844358
0.97183099 0.93265993 0.97849462 0.98880597]
mean value: 0.9675463711254643
key: test_recall
value: [1. 0.96774194 0.90322581 1. 1. 0.87096774
1. 0.96774194 0.90322581 0.9 ]
mean value: 0.9512903225806452
key: train_recall
value: [0.98561151 0.99280576 0.92086331 0.98920863 0.99280576 0.91007194
0.99280576 0.99640288 0.98201439 0.94982079]
mean value: 0.971241071658802
key: test_roc_auc
value: [1. 0.89012097 0.9516129 0.96875 0.9375 0.90423387
0.96875 0.9172043 0.91827957 0.8875 ]
mean value: 0.9343951612903226
key: train_roc_auc
value: [0.96111561 0.97879724 0.94282602 0.95939305 0.91541696 0.94095146
0.96823386 0.92827137 0.97002817 0.96434701]
mean value: 0.9529380774427173
key: test_jcc
value: [1. 0.88235294 0.90322581 0.96875 0.93939394 0.84375
0.96875 0.90909091 0.875 0.84375 ]
mean value: 0.9134063596112932
key: train_jcc
value: [0.95470383 0.97526502 0.90459364 0.95486111 0.91694352 0.89716312
0.96503497 0.9295302 0.96126761 0.93971631]
mean value: 0.9399079327337388
MCC on Blind test: 0.07
Accuracy on Blind test: 0.21
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.1008575 0.08776975 0.08655286 0.0874176 0.08782935 0.08918238
0.09195852 0.09141636 0.09183121 0.08885765]
mean value: 0.09036731719970703
key: score_time
value: [0.01442814 0.0153048 0.01412559 0.01522112 0.01434422 0.0145371
0.01523519 0.01551008 0.0142715 0.01540041]
mean value: 0.014837813377380372
key: test_mcc
value: [0.90524194 0.8566725 0.95436677 1. 0.90662544 0.81503725
0.95436677 0.95250095 0.95087679 0.75806977]
mean value: 0.9053758183184529
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.95744681 0.93617021 0.9787234 1. 0.95744681 0.91489362
0.9787234 0.97826087 0.97826087 0.89130435]
mean value: 0.9571230342275671
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96774194 0.95238095 0.98360656 1. 0.96875 0.93939394
0.98360656 0.98360656 0.98412698 0.92063492]
mean value: 0.9683848404151815
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96774194 0.9375 1. 1. 0.93939394 0.88571429
1. 1. 0.96875 0.87878788]
mean value: 0.9577888039379975
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96774194 0.96774194 0.96774194 1. 1. 1.
0.96774194 0.96774194 1. 0.96666667]
mean value: 0.9805376344086022
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.95262097 0.92137097 0.98387097 1. 0.9375 0.875
0.98387097 0.98387097 0.96666667 0.85833333]
mean value: 0.9463104838709677
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.9375 0.90909091 0.96774194 1. 0.93939394 0.88571429
0.96774194 0.96774194 0.96875 0.85294118]
mean value: 0.9396616117121336
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.05
Accuracy on Blind test: 0.19
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.03681731 0.03343534 0.04631495 0.04816437 0.05419993 0.04136348
0.03020048 0.03114557 0.05245137 0.04301977]
mean value: 0.04171125888824463
key: score_time
value: [0.02169442 0.01837158 0.02844691 0.01603293 0.03581977 0.02635193
0.0178473 0.01740122 0.02229071 0.01603532]
mean value: 0.02202920913696289
key: test_mcc
value: [0.95299692 0.8566725 1. 1. 0.8566725 0.81503725
0.91188882 0.95250095 0.95087679 0.85927505]
mean value: 0.9155920774240871
key: train_mcc
value: [0.97879832 1. 0.99468526 0.98945277 0.98408467 0.99468526
0.98945277 0.99472781 0.98940987 0.98946562]
mean value: 0.9904762341887853
key: test_accuracy
value: [0.9787234 0.93617021 1. 1. 0.93617021 0.91489362
0.95744681 0.97826087 0.97826087 0.93478261]
mean value: 0.9614708603145236
key: train_accuracy
value: [0.99047619 1. 0.99761905 0.9952381 0.99285714 0.99761905
0.9952381 0.9976247 0.99524941 0.99524941]
mean value: 0.995717113448705
key: test_fscore
value: [0.98412698 0.95238095 1. 1. 0.95238095 0.93939394
0.96666667 0.98360656 0.98412698 0.94915254]
mean value: 0.9711835578826409
key: train_fscore
value: [0.99285714 1. 0.99820467 0.99638989 0.99463327 0.99820467
0.99638989 0.9981982 0.99640288 0.99640288]
mean value: 0.9967683489274677
key: test_precision
value: [0.96875 0.9375 1. 1. 0.9375 0.88571429
1. 1. 0.96875 0.96551724]
mean value: 0.9663731527093596
key: train_precision
value: [0.9858156 1. 0.99641577 1. 0.98932384 0.99641577
1. 1. 0.99640288 1. ]
mean value: 0.9964373865169729
key: test_recall
value: [1. 0.96774194 1. 1. 0.96774194 1.
0.93548387 0.96774194 1. 0.93333333]
mean value: 0.9772043010752688
key: train_recall
value: [1. 1. 1. 0.99280576 1. 1.
0.99280576 0.99640288 0.99640288 0.99283154]
mean value: 0.9971248807405688
key: test_roc_auc
value: [0.96875 0.92137097 1. 1. 0.92137097 0.875
0.96774194 0.98387097 0.96666667 0.93541667]
mean value: 0.954018817204301
key: train_roc_auc
value: [0.98591549 1. 0.99647887 0.99640288 0.98943662 0.99647887
0.99640288 0.99820144 0.99470494 0.99641577]
mean value: 0.995043775936127
key: test_jcc
value: [0.96875 0.90909091 1. 1. 0.90909091 0.88571429
0.93548387 0.96774194 0.96875 0.90322581]
mean value: 0.9447847716799329
key: train_jcc
value: [0.9858156 1. 0.99641577 0.99280576 0.98932384 0.99641577
0.99280576 0.99640288 0.99283154 0.99283154]
mean value: 0.9935648458398372
MCC on Blind test: 0.07
Accuracy on Blind test: 0.19
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.06595278 0.07601976 0.07935166 0.12056971 0.07869649 0.06834197
0.13297486 0.15896058 0.14390469 0.13192463]
mean value: 0.10566971302032471
key: score_time
value: [0.01232171 0.01868176 0.01198316 0.01882172 0.01206756 0.01199055
0.01884794 0.02548599 0.02592111 0.0252378 ]
mean value: 0.018135929107666017
key: test_mcc
value: [0.90662544 0.60908698 0.4512753 0.65994312 0.71206211 0.6139232
0.66402366 0.59332241 0.43161973 0.76764947]
mean value: 0.6409531430663058
key: train_mcc
value: [0.80273059 0.7991351 0.79087061 0.79295441 0.78611575 0.79743374
0.78683895 0.80017613 0.80374289 0.79643548]
mean value: 0.7956433649163105
key: test_accuracy
value: [0.95744681 0.82978723 0.76595745 0.85106383 0.87234043 0.82978723
0.85106383 0.82608696 0.76086957 0.89130435]
mean value: 0.8435707678075856
key: train_accuracy
value: [0.91190476 0.90952381 0.90714286 0.90714286 0.9047619 0.90952381
0.9047619 0.90973872 0.91211401 0.90973872]
mean value: 0.9086353353693021
key: test_fscore
value: [0.96875 0.87878788 0.8358209 0.89230769 0.90909091 0.88235294
0.88888889 0.875 0.83076923 0.92307692]
mean value: 0.8884845359620381
key: train_fscore
value: [0.93653516 0.93537415 0.93287435 0.93356048 0.93150685 0.93493151
0.93174061 0.93537415 0.93653516 0.9347079 ]
mean value: 0.9343140331061971
key: test_precision
value: [0.93939394 0.82857143 0.77777778 0.85294118 0.85714286 0.81081081
0.875 0.84848485 0.79411765 0.85714286]
mean value: 0.8441383342853931
key: train_precision
value: [0.89508197 0.88709677 0.89438944 0.88673139 0.88888889 0.89215686
0.88636364 0.88709677 0.89508197 0.89768977]
mean value: 0.8910577470317502
key: test_recall
value: [1. 0.93548387 0.90322581 0.93548387 0.96774194 0.96774194
0.90322581 0.90322581 0.87096774 1. ]
mean value: 0.9387096774193548
key: train_recall
value: [0.98201439 0.98920863 0.97482014 0.98561151 0.97841727 0.98201439
0.98201439 0.98920863 0.98201439 0.97491039]
mean value: 0.9820234135272428
key: test_roc_auc
value: [0.9375 0.78024194 0.7016129 0.81149194 0.82762097 0.76512097
0.8266129 0.78494624 0.70215054 0.84375 ]
mean value: 0.7981048387096774
key: train_roc_auc
value: [0.87833114 0.87136488 0.87473402 0.86956632 0.86949032 0.87481001
0.86776776 0.87222669 0.87911908 0.87830027]
mean value: 0.8735710488300057
key: test_jcc
value: [0.93939394 0.78378378 0.71794872 0.80555556 0.83333333 0.78947368
0.8 0.77777778 0.71052632 0.85714286]
mean value: 0.8014935964935965
key: train_jcc
value: [0.88064516 0.87859425 0.87419355 0.87539936 0.87179487 0.8778135
0.87220447 0.87859425 0.88064516 0.87741935]
mean value: 0.8767303934692845
MCC on Blind test: 0.21
Accuracy on Blind test: 0.42
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.2230103 0.21394682 0.20187664 0.20994234 0.20898438 0.21629405
0.21086693 0.20910215 0.21160555 0.20683503]
mean value: 0.21124641895294188
key: score_time
value: [0.00933719 0.00840378 0.00872827 0.00917697 0.00930619 0.00924182
0.00842547 0.00914001 0.00950432 0.00904679]
mean value: 0.009031081199645996
key: test_mcc
value: [1. 0.8566725 1. 1. 0.95299692 0.81503725
1. 0.95250095 0.95087679 0.80833333]
mean value: 0.9336417737001077
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.93617021 1. 1. 0.9787234 0.91489362
1. 0.97826087 0.97826087 0.91304348]
mean value: 0.9699352451433858
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.95238095 1. 1. 0.98412698 0.93939394
1. 0.98360656 0.98412698 0.93333333]
mean value: 0.9776968750739242
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.9375 1. 1. 0.96875 0.88571429
1. 1. 0.96875 0.93333333]
mean value: 0.9694047619047619
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.96774194 1. 1. 1. 1.
1. 0.96774194 1. 0.93333333]
mean value: 0.9868817204301076
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.92137097 1. 1. 0.96875 0.875
1. 0.98387097 0.96666667 0.90416667]
mean value: 0.9619825268817205
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.90909091 1. 1. 0.96875 0.88571429
1. 0.96774194 0.96875 0.875 ]
mean value: 0.9575047130289066
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.08
Accuracy on Blind test: 0.19
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.0117166 0.01313019 0.01318526 0.01325989 0.01305079 0.01312709
0.01312232 0.01324248 0.01329851 0.0137887 ]
mean value: 0.01309218406677246
key: score_time
value: [0.0111506 0.01089978 0.01084971 0.0108676 0.01087546 0.01084447
0.01105189 0.01162434 0.01162648 0.01165462]
mean value: 0.011144495010375977
key: test_mcc
value: [0.46502704 0.68913865 0.66402366 0.71206211 0.6139232 0.67402153
0.62096774 0.74844698 0.44695591 0.53674504]
mean value: 0.6171311872005444
key: train_mcc
value: [0.6778431 0.7128472 0.85474068 0.79307454 0.73273261 0.88954988
0.79770673 0.82923345 0.77993671 0.88249782]
mean value: 0.7950162701330918
key: test_accuracy
value: [0.70212766 0.85106383 0.85106383 0.87234043 0.82978723 0.85106383
0.82978723 0.89130435 0.76086957 0.7826087 ]
mean value: 0.8222016651248844
key: train_accuracy
value: [0.82142857 0.8452381 0.93333333 0.9047619 0.88095238 0.95
0.90714286 0.9239905 0.90261283 0.94536817]
mean value: 0.9014828639294198
key: test_fscore
value: [0.73076923 0.88135593 0.88888889 0.90909091 0.88235294 0.8852459
0.87096774 0.92307692 0.82539683 0.82758621]
mean value: 0.8624731501074018
key: train_fscore
value: [0.84662577 0.86973948 0.94871795 0.92647059 0.91582492 0.96188748
0.92844037 0.94425087 0.92794376 0.95779817]
mean value: 0.9227699340095629
key: test_precision
value: [0.9047619 0.92857143 0.875 0.85714286 0.81081081 0.9
0.87096774 0.88235294 0.8125 0.85714286]
mean value: 0.8699250541541813
key: train_precision
value: [0.98104265 0.98190045 0.96641791 0.94736842 0.86075949 0.97069597
0.94756554 0.91554054 0.90721649 0.98120301]
mean value: 0.9459710488360232
key: test_recall
value: [0.61290323 0.83870968 0.90322581 0.96774194 0.96774194 0.87096774
0.87096774 0.96774194 0.83870968 0.8 ]
mean value: 0.8638709677419355
key: train_recall
value: [0.74460432 0.78057554 0.93165468 0.90647482 0.97841727 0.95323741
0.91007194 0.97482014 0.94964029 0.93548387]
mean value: 0.906498027384544
key: test_roc_auc
value: [0.74395161 0.85685484 0.8266129 0.82762097 0.76512097 0.84173387
0.81048387 0.85053763 0.71935484 0.775 ]
mean value: 0.8017271505376344
key: train_roc_auc
value: [0.85821765 0.87620326 0.9341372 0.90394164 0.83427906 0.94844969
0.9057402 0.89999748 0.88041455 0.9501363 ]
mean value: 0.8991517025527074
key: test_jcc
value: [0.57575758 0.78787879 0.8 0.83333333 0.78947368 0.79411765
0.77142857 0.85714286 0.7027027 0.70588235]
mean value: 0.7617717512454355
key: train_jcc
value: [0.73404255 0.76950355 0.90243902 0.8630137 0.8447205 0.92657343
0.86643836 0.89438944 0.86557377 0.91901408]
mean value: 0.8585708395886121
MCC on Blind test: 0.13
Accuracy on Blind test: 0.63
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02074528 0.02051353 0.01858282 0.02953506 0.02967763 0.0314672
0.02946687 0.02941847 0.02958179 0.02945447]
mean value: 0.026844310760498046
key: score_time
value: [0.02140307 0.01061296 0.01083541 0.02066278 0.0109098 0.01821399
0.02039957 0.01891303 0.02117467 0.01968718]
mean value: 0.017281246185302735
key: test_mcc
value: [0.95299692 0.8084425 0.8566725 0.95299692 0.90662544 0.76032282
0.90662544 0.80215054 0.75776742 0.85513419]
mean value: 0.8559734697377736
key: train_mcc
value: [0.92003671 0.92030205 0.87684521 0.89326029 0.93085643 0.90414739
0.88770942 0.9151442 0.88322214 0.90932054]
mean value: 0.9040844381960059
key: test_accuracy
value: [0.9787234 0.91489362 0.93617021 0.9787234 0.95744681 0.89361702
0.95744681 0.91304348 0.89130435 0.93478261]
mean value: 0.9356151711378353
key: train_accuracy
value: [0.96428571 0.96428571 0.9452381 0.95238095 0.96904762 0.95714286
0.95 0.96199525 0.94774347 0.95961995]
mean value: 0.9571739622214681
key: test_fscore
value: [0.98412698 0.9375 0.95238095 0.98412698 0.96875 0.92307692
0.96875 0.93548387 0.91803279 0.95081967]
mean value: 0.9523048173695978
key: train_fscore
value: [0.97345133 0.97354497 0.95943563 0.96478873 0.97699115 0.96830986
0.96296296 0.97173145 0.96140351 0.97001764]
mean value: 0.9682637226255115
key: test_precision
value: [0.96875 0.90909091 0.9375 0.96875 0.93939394 0.88235294
0.93939394 0.93548387 0.93333333 0.93548387]
mean value: 0.9349532804324076
key: train_precision
value: [0.95818815 0.9550173 0.94117647 0.94482759 0.96167247 0.94827586
0.94463668 0.95486111 0.93835616 0.95486111]
mean value: 0.9501872911886335
key: test_recall
value: [1. 0.96774194 0.96774194 1. 1. 0.96774194
1. 0.93548387 0.90322581 0.96666667]
mean value: 0.9708602150537634
key: train_recall
value: [0.98920863 0.99280576 0.97841727 0.98561151 0.99280576 0.98920863
0.98201439 0.98920863 0.98561151 0.98566308]
mean value: 0.9870555168768211
key: test_roc_auc
value: [0.96875 0.89012097 0.92137097 0.96875 0.9375 0.85887097
0.9375 0.90107527 0.88494624 0.92083333]
mean value: 0.9189717741935484
key: train_roc_auc
value: [0.9523508 0.95062823 0.92934948 0.93646773 0.95767048 0.94178742
0.93466917 0.94914977 0.92986869 0.94705689]
mean value: 0.9428998652048836
key: test_jcc
value: [0.96875 0.88235294 0.90909091 0.96875 0.93939394 0.85714286
0.93939394 0.87878788 0.84848485 0.90625 ]
mean value: 0.9098397313470843
key: train_jcc
value: [0.94827586 0.94845361 0.9220339 0.93197279 0.9550173 0.93856655
0.92857143 0.94501718 0.92567568 0.94178082]
mean value: 0.9385365119971703
MCC on Blind test: 0.18
Accuracy on Blind test: 0.4
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_config.py:122: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_config.py:125: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.20578527 0.09308887 0.1543951 0.2597549 0.22493172 0.19128704
0.15882158 0.10618186 0.18721414 0.18994951]
mean value: 0.17714099884033202
key: score_time
value: [0.01115108 0.01121378 0.0221827 0.02100086 0.02120638 0.02156854
0.01102662 0.0211966 0.01679158 0.01401901]
mean value: 0.01713571548461914
key: test_mcc
value: [1. 0.8566725 1. 0.95299692 0.90662544 0.81503725
0.95299692 0.9085301 0.90107527 0.85513419]
mean value: 0.914906858369952
key: train_mcc
value: [0.92522791 0.94131391 0.91988445 0.92534566 0.93598399 0.94131391
0.93066133 0.94171645 0.93099139 0.94680199]
mean value: 0.9339241011569926
key: test_accuracy
value: [1. 0.93617021 1. 0.9787234 0.95744681 0.91489362
0.9787234 0.95652174 0.95652174 0.93478261]
mean value: 0.9613783533765032
key: train_accuracy
value: [0.96666667 0.97380952 0.96428571 0.96666667 0.97142857 0.97380952
0.96904762 0.97387173 0.96912114 0.97624703]
mean value: 0.970495419070241
key: test_fscore
value: [1. 0.95238095 1. 0.98412698 0.96875 0.93939394
0.98412698 0.96666667 0.96774194 0.95081967]
mean value: 0.9714007134310545
key: train_fscore
value: [0.97508897 0.98039216 0.97335702 0.9751773 0.97864769 0.98039216
0.97690941 0.98046181 0.97690941 0.9822695 ]
mean value: 0.9779605432457805
key: test_precision
value: [1. 0.9375 1. 0.96875 0.93939394 0.88571429
0.96875 1. 0.96774194 0.93548387]
mean value: 0.9603334031559838
key: train_precision
value: [0.96478873 0.97173145 0.96140351 0.96153846 0.96830986 0.97173145
0.96491228 0.96842105 0.96491228 0.97192982]
mean value: 0.9669678897982681
key: test_recall
value: [1. 0.96774194 1. 1. 1. 1.
1. 0.93548387 0.96774194 0.96666667]
mean value: 0.983763440860215
key: train_recall
value: [0.98561151 0.98920863 0.98561151 0.98920863 0.98920863 0.98920863
0.98920863 0.99280576 0.98920863 0.99283154]
mean value: 0.9892112116758206
key: test_roc_auc
value: [1. 0.92137097 1. 0.96875 0.9375 0.875
0.96875 0.96774194 0.95053763 0.92083333]
mean value: 0.9510483870967742
key: train_roc_auc
value: [0.95759449 0.9664353 0.95407336 0.95587192 0.96291418 0.9664353
0.95939305 0.96493435 0.95963928 0.96824676]
mean value: 0.9615537984903284
key: test_jcc
value: [1. 0.90909091 1. 0.96875 0.93939394 0.88571429
0.96875 0.93548387 0.9375 0.90625 ]
mean value: 0.9450933005166876
key: train_jcc
value: [0.95138889 0.96153846 0.94809689 0.95155709 0.95818815 0.96153846
0.95486111 0.96167247 0.95486111 0.96515679]
mean value: 0.9568859435029576
MCC on Blind test: 0.14
Accuracy on Blind test: 0.33
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.02589321 0.0243125 0.02480197 0.02414203 0.02598643 0.0267787
0.02300882 0.02308178 0.02694511 0.02658701]
mean value: 0.025153756141662598
key: score_time
value: [0.01105022 0.01108479 0.02707553 0.01083922 0.01093078 0.01091671
0.01093459 0.01086307 0.01094246 0.01091409]
mean value: 0.012555146217346191
key: test_mcc
value: [1. 0.7130241 0.77784447 0.83914639 0.87096774 0.87096774
0.74193548 0.84266484 0.67314268 0.8688172 ]
mean value: 0.8198510652102912
key: train_mcc
value: [0.87415162 0.85611511 0.87052613 0.84894283 0.84894283 0.84892086
0.85256763 0.84537297 0.86364692 0.85997009]
mean value: 0.8569156981998511
key: test_accuracy
value: [1. 0.85483871 0.88709677 0.91935484 0.93548387 0.93548387
0.87096774 0.91935484 0.83606557 0.93442623]
mean value: 0.9093072448439978
key: train_accuracy
value: [0.93705036 0.92805755 0.9352518 0.92446043 0.92446043 0.92446043
0.92625899 0.92266187 0.93177738 0.92998205]
mean value: 0.9284421295997314
key: test_fscore
value: [1. 0.86153846 0.89230769 0.92063492 0.93548387 0.93548387
0.87096774 0.91525424 0.84375 0.93333333]
mean value: 0.9108754128973511
key: train_fscore
value: [0.93670886 0.92805755 0.93548387 0.92473118 0.92473118 0.92446043
0.92665474 0.92307692 0.93214286 0.92998205]
mean value: 0.9286029650436789
key: test_precision
value: [1. 0.82352941 0.85294118 0.90625 0.93548387 0.93548387
0.87096774 0.96428571 0.81818182 0.93333333]
mean value: 0.9040456937907128
key: train_precision
value: [0.94181818 0.92805755 0.93214286 0.92142857 0.92142857 0.92446043
0.92170819 0.91814947 0.92553191 0.93165468]
mean value: 0.9266380409827853
key: test_recall
value: [1. 0.90322581 0.93548387 0.93548387 0.93548387 0.93548387
0.87096774 0.87096774 0.87096774 0.93333333]
mean value: 0.9191397849462365
key: train_recall
value: [0.93165468 0.92805755 0.93884892 0.92805755 0.92805755 0.92446043
0.93165468 0.92805755 0.93884892 0.92831541]
mean value: 0.9306013253912999
key: test_roc_auc
value: [1. 0.85483871 0.88709677 0.91935484 0.93548387 0.93548387
0.87096774 0.91935484 0.83548387 0.9344086 ]
mean value: 0.909247311827957
key: train_roc_auc
value: [0.93705036 0.92805755 0.9352518 0.92446043 0.92446043 0.92446043
0.92625899 0.92266187 0.93179005 0.92998504]
mean value: 0.9284436966555788
key: test_jcc
value: [1. 0.75675676 0.80555556 0.85294118 0.87878788 0.87878788
0.77142857 0.84375 0.72972973 0.875 ]
mean value: 0.839273754751696
key: train_jcc
value: [0.88095238 0.86577181 0.87878788 0.86 0.86 0.85953177
0.86333333 0.85714286 0.8729097 0.86912752]
mean value: 0.8667557250647416
MCC on Blind test: 0.21
Accuracy on Blind test: 0.5
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)),
 ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)),
 ('Gaussian NB', GaussianNB()),
 ('Naive Bayes', BernoulliNB()),
 ('K-Nearest Neighbors', KNeighborsClassifier()),
 ('SVM', SVC(random_state=42)),
 ('MLP', MLPClassifier(max_iter=500, random_state=42)),
 ('Decision Tree', DecisionTreeClassifier(random_state=42)),
 ('Extra Trees', ExtraTreesClassifier(random_state=42)),
 ('Extra Tree', ExtraTreeClassifier(random_state=42)),
 ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)),
 ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)),
 ('Naive Bayes', BernoulliNB()),
 ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)),
 ('LDA', LinearDiscriminantAnalysis()),
 ('Multinomial', MultinomialNB()),
 ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)),
 ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)),
 ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)),
 ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)),
 ('Gaussian Process', GaussianProcessClassifier(random_state=42)),
 ('Gradient Boosting', GradientBoostingClassifier(random_state=42)),
 ('QDA', QuadraticDiscriminantAnalysis()),
 ('Ridge Classifier', RidgeClassifier(random_state=42)),
 ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
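The pipeline repr above, together with the `fit_time`, `score_time`, `test_*` and `train_*` keys that follow it, is consistent with a 10-fold `cross_validate` call over a `ColumnTransformer` pipeline. The sketch below shows how such a run could be set up. It is not the script's actual code: the toy data frame, the three column names picked from the repr, the scorer dictionary, and `cv=10` are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import make_scorer, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Toy stand-in data (assumption): two numerical and one categorical feature.
rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame({
    "ligand_distance": rng.uniform(2, 30, n),
    "rsa": rng.uniform(0, 1, n),
    "ss_class": rng.choice(["helix", "sheet", "loop"], n),
})
y = (X["ligand_distance"] + rng.normal(0, 3, n) < 15).astype(int)

num_cols = ["ligand_distance", "rsa"]
cat_cols = ["ss_class"]

# Same layout as the logged repr: scale numerical columns, one-hot encode categorical ones.
prep = ColumnTransformer(
    remainder="passthrough",
    transformers=[
        ("num", MinMaxScaler(), num_cols),
        ("cat", OneHotEncoder(), cat_cols),
    ],
)
pipe = Pipeline(steps=[("prep", prep), ("model", LogisticRegressionCV(random_state=42))])

# Scorer names chosen so the result keys match the log (test_mcc, train_mcc, ...).
# Computing roc_auc from hard label predictions is an assumption.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": make_scorer(roc_auc_score),
}

scores = cross_validate(pipe, X, y, cv=10, scoring=scoring, return_train_score=True)

for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```

Keeping the scaler and encoder inside the pipeline means they are refit on each training fold, so no statistics from a held-out fold leak into preprocessing.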
key: fit_time
value: [0.84429264 0.72940993 0.72171068 0.85939646 0.69464445 0.72773337
0.77860117 0.70124364 0.78092885 0.7428112 ]
mean value: 0.7580772399902344
key: score_time
value: [0.01205468 0.01223755 0.01254439 0.01247644 0.02100563 0.01274776
0.01243854 0.01249003 0.01463079 0.01232004]
mean value: 0.013494586944580078
key: test_mcc
value: [0.96824584 0.93548387 0.96824584 0.90748521 0.90369611 0.93548387
1. 0.87278605 0.90215054 0.8688172 ]
mean value: 0.9262394532240339
key: train_mcc
value: [0.94966486 0.96412858 0.94604929 0.96763216 0.96405373 0.96405373
0.94604929 0.97482645 0.96774069 0.96783888]
mean value: 0.9612037646576601
key: test_accuracy
value: [0.98387097 0.96774194 0.98387097 0.9516129 0.9516129 0.96774194
1. 0.93548387 0.95081967 0.93442623]
mean value: 0.9627181385510312
key: train_accuracy
value: [0.97482014 0.98201439 0.97302158 0.98381295 0.98201439 0.98201439
0.97302158 0.98741007 0.98384201 0.98384201]
mean value: 0.9805813517946863
key: test_fscore
value: [0.98412698 0.96774194 0.98360656 0.95384615 0.95081967 0.96774194
1. 0.93333333 0.95081967 0.93333333]
mean value: 0.9625369577246891
key: train_fscore
value: [0.97491039 0.98214286 0.97307002 0.98384201 0.98207885 0.98207885
0.97307002 0.98743268 0.98389982 0.98401421]
mean value: 0.9806539709925397
key: test_precision
value: [0.96875 0.96774194 1. 0.91176471 0.96666667 0.96774194
1. 0.96551724 0.96666667 0.93333333]
mean value: 0.9648182484896072
key: train_precision
value: [0.97142857 0.9751773 0.97132616 0.98207885 0.97857143 0.97857143
0.97132616 0.98566308 0.97864769 0.97535211]
mean value: 0.9768142798277739
key: test_recall
value: [1. 0.96774194 0.96774194 1. 0.93548387 0.96774194
1. 0.90322581 0.93548387 0.93333333]
mean value: 0.9610752688172043
key: train_recall
value: [0.97841727 0.98920863 0.97482014 0.98561151 0.98561151 0.98561151
0.97482014 0.98920863 0.98920863 0.99283154]
mean value: 0.9845349526830148
key: test_roc_auc
value: [0.98387097 0.96774194 0.98387097 0.9516129 0.9516129 0.96774194
1. 0.93548387 0.95107527 0.9344086 ]
mean value: 0.962741935483871
key: train_roc_auc
value: [0.97482014 0.98201439 0.97302158 0.98381295 0.98201439 0.98201439
0.97302158 0.98741007 0.98385163 0.98382584]
mean value: 0.9805806967329362
key: test_jcc
value: [0.96875 0.9375 0.96774194 0.91176471 0.90625 0.9375
1. 0.875 0.90625 0.875 ]
mean value: 0.9285756641366224
key: train_jcc
value: [0.95104895 0.96491228 0.94755245 0.96819788 0.96478873 0.96478873
0.94755245 0.9751773 0.96830986 0.96853147]
mean value: 0.9620860104153928
MCC on Blind test: 0.14
Accuracy on Blind test: 0.35
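Note on reading the per-fold metrics above: the first-fold test values for LogisticRegressionCV (test_accuracy 0.98387097, test_precision 0.96875, test_recall 1., test_fscore 0.98412698, test_jcc 0.96875, test_mcc 0.96824584) are consistent with a 62-sample fold holding TP=31, FP=1, FN=0, TN=30. Those counts are inferred from the printed scores and are an assumption; the log never prints a confusion matrix. The function name binary_metrics is hypothetical and not part of the original scripts. A minimal standard-library sketch of how the scores follow from the counts:

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Derive the log's binary scores from confusion-matrix counts.

    Hypothetical helper for illustration; the original pipeline presumably
    obtains these values from scikit-learn scorers instead.
    """
    total = tp + fp + fn + tn
    # MCC denominator: product of the four marginal totals, square-rooted.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "fscore": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0,
        # Jaccard index of the positive class (reported as "jcc" in the log).
        "jcc": tp / (tp + fp + fn) if (tp + fp + fn) else 0.0,
        "mcc": (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0,
    }


# Counts inferred (assumed) from the first test fold reported above.
fold0 = binary_metrics(tp=31, fp=1, fn=0, tn=30)
for key, value in fold0.items():
    print(f"{key}: {value:.8f}")
```

With these counts a single false positive costs about 0.016 in accuracy but about 0.032 in MCC and JCC, which may explain why the MCC and JCC columns sit below accuracy in every block of this log.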
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01086211 0.01025057 0.00859928 0.00814486 0.00849771 0.00856209
0.0083437 0.00834632 0.00836563 0.0083468 ]
mean value: 0.00883190631866455
key: score_time
value: [0.01082826 0.00907207 0.00906467 0.00892687 0.00863767 0.00889754
0.00867772 0.00834537 0.00863099 0.00864482]
mean value: 0.008972597122192384
key: test_mcc
value: [0.83914639 0.64820372 0.71004695 0.81325006 0.80645161 0.74348441
0.61418277 0.87278605 0.60645161 0.70505961]
mean value: 0.7359063194782269
key: train_mcc
value: [0.75529076 0.7627676 0.76266888 0.74820144 0.73741484 0.74837576
0.74460913 0.73025835 0.76301539 0.75249226]
mean value: 0.7505094421634964
key: test_accuracy
value: [0.91935484 0.82258065 0.85483871 0.90322581 0.90322581 0.87096774
0.80645161 0.93548387 0.80327869 0.85245902]
mean value: 0.8671866737176097
key: train_accuracy
value: [0.87410072 0.88129496 0.88129496 0.87410072 0.86870504 0.87410072
0.87230216 0.86510791 0.88150808 0.87612208]
mean value: 0.8748637355824497
key: test_fscore
value: [0.92063492 0.83076923 0.85245902 0.90909091 0.90322581 0.875
0.8 0.93333333 0.80645161 0.84745763]
mean value: 0.8678422456695318
key: train_fscore
value: [0.88215488 0.88 0.88214286 0.87410072 0.86894075 0.87272727
0.87253142 0.86437613 0.88129496 0.87477314]
mean value: 0.8753042137774966
key: test_precision
value: [0.90625 0.79411765 0.86666667 0.85714286 0.90322581 0.84848485
0.82758621 0.96551724 0.80645161 0.86206897]
mean value: 0.8637511852501137
key: train_precision
value: [0.82911392 0.88970588 0.87588652 0.87410072 0.86738351 0.88235294
0.87096774 0.86909091 0.88129496 0.88602941]
mean value: 0.8725926531191879
key: test_recall
value: [0.93548387 0.87096774 0.83870968 0.96774194 0.90322581 0.90322581
0.77419355 0.90322581 0.80645161 0.83333333]
mean value: 0.8736559139784946
key: train_recall
value: [0.94244604 0.8705036 0.88848921 0.87410072 0.8705036 0.86330935
0.87410072 0.85971223 0.88129496 0.86379928]
mean value: 0.8788259714808798
key: test_roc_auc
value: [0.91935484 0.82258065 0.85483871 0.90322581 0.90322581 0.87096774
0.80645161 0.93548387 0.80322581 0.85215054]
mean value: 0.8671505376344086
key: train_roc_auc
value: [0.87410072 0.88129496 0.88129496 0.87410072 0.86870504 0.87410072
0.87230216 0.86510791 0.8815077 0.87614425]
mean value: 0.8748659137206364
key: test_jcc
value: [0.85294118 0.71052632 0.74285714 0.83333333 0.82352941 0.77777778
0.66666667 0.875 0.67567568 0.73529412]
mean value: 0.7693601617982423
key: train_jcc
value: [0.78915663 0.78571429 0.78913738 0.77635783 0.76825397 0.77419355
0.77388535 0.7611465 0.78778135 0.77741935]
mean value: 0.778304618898389
MCC on Blind test: 0.2
Accuracy on Blind test: 0.54
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.00874949 0.00879979 0.0084486 0.00865364 0.00868559 0.00862646
0.00848246 0.00858378 0.00882339 0.00859857]
mean value: 0.008645176887512207
key: score_time
value: [0.00911641 0.00886369 0.00856209 0.00891066 0.00887418 0.00860906
0.00873876 0.00870085 0.00862622 0.008708 ]
mean value: 0.008770990371704101
key: test_mcc
value: [0.64820372 0.68313005 0.48488114 0.74348441 0.80813523 0.74348441
0.64820372 0.74193548 0.63978495 0.67204301]
mean value: 0.6813286129520032
key: train_mcc
value: [0.69129181 0.69623388 0.69785979 0.69872831 0.69209976 0.70569372
0.7019886 0.70220704 0.69929441 0.69881448]
mean value: 0.698421180066379
key: test_accuracy
value: [0.82258065 0.83870968 0.74193548 0.87096774 0.90322581 0.87096774
0.82258065 0.87096774 0.81967213 0.83606557]
mean value: 0.8397673188789001
key: train_accuracy
value: [0.84532374 0.8471223 0.84892086 0.84892086 0.84532374 0.85251799
0.85071942 0.85071942 0.8491921 0.8491921 ]
mean value: 0.8487952546400941
key: test_fscore
value: [0.81355932 0.84848485 0.75 0.875 0.9 0.875
0.83076923 0.87096774 0.81967213 0.83333333]
mean value: 0.8416786607704336
key: train_fscore
value: [0.84859155 0.85268631 0.84837545 0.85263158 0.85017422 0.8556338
0.85361552 0.85413005 0.85263158 0.85211268]
mean value: 0.8520582734853629
key: test_precision
value: [0.85714286 0.8 0.72727273 0.84848485 0.93103448 0.84848485
0.79411765 0.87096774 0.83333333 0.83333333]
mean value: 0.8344171819804876
key: train_precision
value: [0.83103448 0.82274247 0.85144928 0.83219178 0.82432432 0.83793103
0.83737024 0.83505155 0.83219178 0.83737024]
mean value: 0.8341657184309065
key: test_recall
value: [0.77419355 0.90322581 0.77419355 0.90322581 0.87096774 0.90322581
0.87096774 0.87096774 0.80645161 0.83333333]
mean value: 0.8510752688172043
key: train_recall
value: [0.86690647 0.88489209 0.84532374 0.87410072 0.87769784 0.87410072
0.8705036 0.87410072 0.87410072 0.86738351]
mean value: 0.8709110131249839
key: test_roc_auc
value: [0.82258065 0.83870968 0.74193548 0.87096774 0.90322581 0.87096774
0.82258065 0.87096774 0.81989247 0.83602151]
mean value: 0.8397849462365592
key: train_roc_auc
value: [0.84532374 0.8471223 0.84892086 0.84892086 0.84532374 0.85251799
0.85071942 0.85071942 0.84923674 0.84915938]
mean value: 0.8487964467135968
key: test_jcc
value: [0.68571429 0.73684211 0.6 0.77777778 0.81818182 0.77777778
0.71052632 0.77142857 0.69444444 0.71428571]
mean value: 0.7286978810663021
key: train_jcc
value: [0.73700306 0.74320242 0.73667712 0.74311927 0.73939394 0.74769231
0.74461538 0.74539877 0.74311927 0.74233129]
mean value: 0.7422552816171282
MCC on Blind test: 0.19
Accuracy on Blind test: 0.49
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00824833 0.00824618 0.00818968 0.00823784 0.00803328 0.00808263
0.0080018 0.00826025 0.00804806 0.00802422]
mean value: 0.008137226104736328
key: score_time
value: [0.02001548 0.0169642 0.01295042 0.01175404 0.01527023 0.01145744
0.01146245 0.01176381 0.01168776 0.01165533]
mean value: 0.01349811553955078
key: test_mcc
value: [0.75623534 0.67741935 0.64820372 0.83914639 0.80813523 0.74193548
0.61418277 0.68313005 0.67204301 0.67721392]
mean value: 0.7117645281572317
key: train_mcc
value: [0.75664991 0.80977699 0.79501032 0.78789723 0.7814304 0.77770329
0.79138739 0.77342633 0.78180276 0.78587941]
mean value: 0.7840964017204444
key: test_accuracy
value: [0.87096774 0.83870968 0.82258065 0.91935484 0.90322581 0.87096774
0.80645161 0.83870968 0.83606557 0.83606557]
mean value: 0.8543098889476468
key: train_accuracy
value: [0.87769784 0.90467626 0.89748201 0.89388489 0.89028777 0.88848921
0.89568345 0.88669065 0.89048474 0.89228007]
mean value: 0.8917656897821061
key: test_fscore
value: [0.85714286 0.83870968 0.81355932 0.92063492 0.90625 0.87096774
0.8125 0.82758621 0.83870968 0.82142857]
mean value: 0.8507488974910993
key: train_fscore
value: [0.87407407 0.90310786 0.89692586 0.89292196 0.88766114 0.88602941
0.89605735 0.88607595 0.88766114 0.88929889]
mean value: 0.8899813639558726
key: test_precision
value: [0.96 0.83870968 0.85714286 0.90625 0.87878788 0.87096774
0.78787879 0.88888889 0.83870968 0.88461538]
mean value: 0.8711950894087991
key: train_precision
value: [0.90076336 0.91821561 0.90181818 0.9010989 0.90943396 0.90601504
0.89285714 0.89090909 0.90943396 0.91634981]
mean value: 0.9046895060853061
key: test_recall
value: [0.77419355 0.83870968 0.77419355 0.93548387 0.93548387 0.87096774
0.83870968 0.77419355 0.83870968 0.76666667]
mean value: 0.8347311827956989
key: train_recall
value: [0.84892086 0.88848921 0.89208633 0.88489209 0.86690647 0.86690647
0.89928058 0.88129496 0.86690647 0.86379928]
mean value: 0.8759482736391532
key: test_roc_auc
value: [0.87096774 0.83870968 0.82258065 0.91935484 0.90322581 0.87096774
0.80645161 0.83870968 0.83602151 0.83494624]
mean value: 0.8541935483870968
key: train_roc_auc
value: [0.87769784 0.90467626 0.89748201 0.89388489 0.89028777 0.88848921
0.89568345 0.88669065 0.89044248 0.8923313 ]
mean value: 0.8917665867306155
key: test_jcc
value: [0.75 0.72222222 0.68571429 0.85294118 0.82857143 0.77142857
0.68421053 0.70588235 0.72222222 0.6969697 ]
mean value: 0.7420162482855981
key: train_jcc
value: [0.77631579 0.82333333 0.81311475 0.80655738 0.79801325 0.79537954
0.81168831 0.79545455 0.79801325 0.80066445]
mean value: 0.8018534590944678
MCC on Blind test: 0.17
Accuracy on Blind test: 0.56
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01743269 0.01715159 0.01687074 0.01602173 0.01673126 0.01642895
0.01583314 0.01826096 0.01568484 0.0176158 ]
mean value: 0.016803169250488283
key: score_time
value: [0.01035261 0.00926685 0.01012945 0.00933409 0.00936246 0.01025271
0.00945616 0.01029134 0.00932956 0.0092721 ]
mean value: 0.00970473289489746
key: test_mcc
value: [0.93548387 0.69047575 0.62471615 0.77784447 0.77784447 0.75623534
0.58338335 0.74348441 0.61090565 0.81062315]
mean value: 0.7310996615906107
key: train_mcc
value: [0.82186847 0.79485081 0.75204143 0.78877892 0.78485761 0.7611094
0.79209132 0.77560672 0.78260516 0.81085297]
mean value: 0.7864662785132636
key: test_accuracy
value: [0.96774194 0.83870968 0.80645161 0.88709677 0.88709677 0.87096774
0.79032258 0.87096774 0.80327869 0.90163934]
mean value: 0.8624272871496562
key: train_accuracy
value: [0.91007194 0.89568345 0.87230216 0.89208633 0.89028777 0.87769784
0.89388489 0.88489209 0.88868941 0.9048474 ]
mean value: 0.8910443279128941
key: test_fscore
value: [0.96774194 0.85294118 0.82352941 0.89230769 0.89230769 0.88235294
0.8 0.875 0.81818182 0.90625 ]
mean value: 0.8710612667692839
key: train_fscore
value: [0.91289199 0.90034364 0.88067227 0.89761092 0.8957265 0.88474576
0.8991453 0.89152542 0.89455782 0.90750436]
mean value: 0.8964723986527141
key: test_precision
value: [0.96774194 0.78378378 0.75675676 0.85294118 0.85294118 0.81081081
0.76470588 0.84848485 0.77142857 0.85294118]
mean value: 0.8262536118513348
key: train_precision
value: [0.88513514 0.86184211 0.82649842 0.8538961 0.8534202 0.83653846
0.85667752 0.84294872 0.8483871 0.88435374]
mean value: 0.8549697504635009
key: test_recall
value: [0.96774194 0.93548387 0.90322581 0.93548387 0.93548387 0.96774194
0.83870968 0.90322581 0.87096774 0.96666667]
mean value: 0.9224731182795699
key: train_recall
value: [0.94244604 0.94244604 0.94244604 0.94604317 0.94244604 0.93884892
0.94604317 0.94604317 0.94604317 0.93189964]
mean value: 0.9424705396972745
key: test_roc_auc
value: [0.96774194 0.83870968 0.80645161 0.88709677 0.88709677 0.87096774
0.79032258 0.87096774 0.80215054 0.90268817]
mean value: 0.8624193548387098
key: train_roc_auc
value: [0.91007194 0.89568345 0.87230216 0.89208633 0.89028777 0.87769784
0.89388489 0.88489209 0.88879219 0.90479874]
mean value: 0.8910497408524792
key: test_jcc
value: [0.9375 0.74358974 0.7 0.80555556 0.80555556 0.78947368
0.66666667 0.77777778 0.69230769 0.82857143]
mean value: 0.7746998104234947
key: train_jcc
value: [0.83974359 0.81875 0.78678679 0.81424149 0.81114551 0.79331307
0.81677019 0.80428135 0.80923077 0.83067093]
mean value: 0.812493367099271
MCC on Blind test: 0.25
Accuracy on Blind test: 0.48
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
(last 2 lines repeated 10 more times)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.58698964 1.47472477 1.58772182 1.53978014 1.45300221 1.59102941
1.59130311 1.50179529 1.59061027 1.5674026 ]
mean value: 1.548435926437378
key: score_time
value: [0.01429367 0.01347637 0.01343799 0.0135057 0.01355076 0.01363134
0.01368833 0.01345825 0.01342821 0.01383781]
mean value: 0.013630843162536621
key: test_mcc
value: [0.96824584 0.84266484 0.87278605 0.93743687 0.93743687 0.90369611
0.90369611 0.90748521 0.83638369 0.8688172 ]
mean value: 0.8978648796280239
key: train_mcc
value: [0.99283145 0.98921503 0.98921503 0.98561151 0.98921503 0.98921503
0.98561151 0.99640932 0.99284416 0.99641577]
mean value: 0.9906583855147647
key: test_accuracy
value: [0.98387097 0.91935484 0.93548387 0.96774194 0.96774194 0.9516129
0.9516129 0.9516129 0.91803279 0.93442623]
mean value: 0.9481491274457959
key: train_accuracy
value: [0.99640288 0.99460432 0.99460432 0.99280576 0.99460432 0.99460432
0.99280576 0.99820144 0.99640934 0.99820467]
mean value: 0.9953247097115845
key: test_fscore
value: [0.98412698 0.92307692 0.9375 0.96875 0.96875 0.95081967
0.95238095 0.94915254 0.92063492 0.93333333]
mean value: 0.9488525328057142
key: train_fscore
value: [0.99638989 0.99459459 0.99459459 0.99280576 0.99459459 0.99459459
0.99280576 0.9981982 0.99638989 0.99820467]
mean value: 0.9953172538625
key: test_precision
value: [0.96875 0.88235294 0.90909091 0.93939394 0.93939394 0.96666667
0.9375 1. 0.90625 0.93333333]
mean value: 0.9382731729055258
key: train_precision
value: [1. 0.99638989 0.99638989 0.99280576 0.99638989 0.99638989
0.99280576 1. 1. 1. ]
mean value: 0.997117107757837
key: test_recall
value: [1. 0.96774194 0.96774194 1. 1. 0.93548387
0.96774194 0.90322581 0.93548387 0.93333333]
mean value: 0.9610752688172043
key: train_recall
value: [0.99280576 0.99280576 0.99280576 0.99280576 0.99280576 0.99280576
0.99280576 0.99640288 0.99280576 0.99641577]
mean value: 0.9935264691472628
key: test_roc_auc
value: [0.98387097 0.91935484 0.93548387 0.96774194 0.96774194 0.9516129
0.9516129 0.9516129 0.91774194 0.9344086 ]
mean value: 0.9481182795698926
key: train_roc_auc
value: [0.99640288 0.99460432 0.99460432 0.99280576 0.99460432 0.99460432
0.99280576 0.99820144 0.99640288 0.99820789]
mean value: 0.9953243856527682
key: test_jcc
value: [0.96875 0.85714286 0.88235294 0.93939394 0.93939394 0.90625
0.90909091 0.90322581 0.85294118 0.875 ]
mean value: 0.9033541569120317
key: train_jcc
value: [0.99280576 0.98924731 0.98924731 0.98571429 0.98924731 0.98924731
0.98571429 0.99640288 0.99280576 0.99641577]
mean value: 0.9906847977838927
MCC on Blind test: 0.18
Accuracy on Blind test: 0.36
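
Note: each "mean value" line appears to be the arithmetic mean of the ten per-fold values printed above it. The following minimal sketch checks this for the MLP test_mcc values copied from the block above. The variable names are illustrative and are not taken from ml_data.py.

```python
from statistics import fmean

# Per-fold test_mcc values for the MLP model, copied from the log above.
mlp_test_mcc = [
    0.96824584, 0.84266484, 0.87278605, 0.93743687, 0.93743687,
    0.90369611, 0.90369611, 0.90748521, 0.83638369, 0.8688172,
]

# Arithmetic mean over the 10 cross-validation folds.
mlp_mean_mcc = fmean(mlp_test_mcc)

# "mean value" as printed in the log for test_mcc.
logged_mean = 0.8978648796280239

print(f"recomputed mean: {mlp_mean_mcc:.8f}")
print(f"logged mean:     {logged_mean:.8f}")
```

The fold values are printed to eight decimals, so the recomputed mean can only be expected to agree with the logged one to about that precision.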
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.01465797 0.01315546 0.01135731 0.01076937 0.01108599 0.01082325
0.01092386 0.01045871 0.01094151 0.01048708]
mean value: 0.011466050148010254
key: score_time
value: [0.01047611 0.00827646 0.00819159 0.00810289 0.00804043 0.00810003
0.00786495 0.00790858 0.00797296 0.00794983]
mean value: 0.008288383483886719
key: test_mcc
value: [0.96824584 0.90369611 1. 0.90748521 0.90369611 0.87831007
0.84266484 0.96824584 0.8688172 0.87055472]
mean value: 0.9111715945488771
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98387097 0.9516129 1. 0.9516129 0.9516129 0.93548387
0.91935484 0.98387097 0.93442623 0.93442623]
mean value: 0.9546271813855103
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98412698 0.95081967 1. 0.95384615 0.95081967 0.93103448
0.91525424 0.98360656 0.93548387 0.93103448]
mean value: 0.9536026113385601
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96875 0.96666667 1. 0.91176471 0.96666667 1.
0.96428571 1. 0.93548387 0.96428571]
mean value: 0.9677903338754856
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.93548387 1. 1. 0.93548387 0.87096774
0.87096774 0.96774194 0.93548387 0.9 ]
mean value: 0.9416129032258065
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98387097 0.9516129 1. 0.9516129 0.9516129 0.93548387
0.91935484 0.98387097 0.9344086 0.93387097]
mean value: 0.9545698924731183
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96875 0.90625 1. 0.91176471 0.90625 0.87096774
0.84375 0.96774194 0.87878788 0.87096774]
mean value: 0.912523000402507
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.12
Accuracy on Blind test: 0.58
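
Note: the Decision Tree block reports perfect training scores, high cross-validated test scores, and a much lower blind-test MCC. A minimal sketch for making those two gaps explicit is given here, using the mean values copied from the block above. The variable names are illustrative, and the log itself does not state the cause of the gap (for example overfitting, or a difference between the cross-validation data and the blind test data), so that would need checking separately.

```python
# Mean scores for the Decision Tree model, copied from the log above.
train_mcc = 1.0                     # mean train_mcc across folds
cv_test_mcc = 0.9111715945488771    # mean test_mcc across folds
blind_mcc = 0.12                    # "MCC on Blind test"

# Gap between the training folds and the held-out folds.
cv_gap = train_mcc - cv_test_mcc
# Gap between cross-validation and the blind test set.
blind_gap = cv_test_mcc - blind_mcc

print(f"train - CV test: {cv_gap:.3f}")
print(f"CV test - blind: {blind_gap:.3f}")
```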
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.10264206 0.10256195 0.10940504 0.10670924 0.10698891 0.10749125
0.10634494 0.1044426 0.1049583 0.10701418]
mean value: 0.10585584640502929
key: score_time
value: [0.01734233 0.01776242 0.01870441 0.01858568 0.01841116 0.01849437
0.01723242 0.01806641 0.01844049 0.01706672]
mean value: 0.018010640144348146
key: test_mcc
value: [0.93743687 0.81325006 0.87096774 0.87278605 0.93743687 0.90369611
0.80645161 0.93743687 0.8688172 0.90215054]
mean value: 0.8850429919540547
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96774194 0.90322581 0.93548387 0.93548387 0.96774194 0.9516129
0.90322581 0.96774194 0.93442623 0.95081967]
mean value: 0.9417503966155474
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96875 0.90909091 0.93548387 0.9375 0.96875 0.95238095
0.90322581 0.96666667 0.93548387 0.95081967]
mean value: 0.9428151748656772
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.93939394 0.85714286 0.93548387 0.90909091 0.93939394 0.9375
0.90322581 1. 0.93548387 0.93548387]
mean value: 0.9292199064376484
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.96774194 0.93548387 0.96774194 1. 0.96774194
0.90322581 0.93548387 0.93548387 0.96666667]
mean value: 0.9579569892473119
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96774194 0.90322581 0.93548387 0.93548387 0.96774194 0.9516129
0.90322581 0.96774194 0.9344086 0.95107527]
mean value: 0.9417741935483872
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.93939394 0.83333333 0.87878788 0.88235294 0.93939394 0.90909091
0.82352941 0.93548387 0.87878788 0.90625 ]
mean value: 0.8926404102696797
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.22
Accuracy on Blind test: 0.4
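
Note: for a single set of binary predictions, the Jaccard index J and the F1 score F are linked by J = F / (2 - F), so the test_jcc and test_fscore rows carry the same information. A minimal sketch, assuming test_fscore is the binary F1 and test_jcc the binary Jaccard score (the log does not say how they are computed), checks this on the first three Extra Trees folds copied from the block above. The function name is illustrative.

```python
def jaccard_from_f1(f1: float) -> float:
    """Jaccard index implied by an F1 score for the same binary predictions."""
    return f1 / (2.0 - f1)

# (test_fscore, test_jcc) for the first three Extra Trees folds, from the log above.
pairs = [
    (0.96875, 0.93939394),
    (0.90909091, 0.83333333),
    (0.93548387, 0.87878788),
]

# Largest absolute difference between the implied and the logged Jaccard value.
max_diff = max(abs(jaccard_from_f1(f) - j) for f, j in pairs)
print(f"largest difference: {max_diff:.2e}")
```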
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.00817394 0.0080502 0.0084374 0.00861168 0.00831032 0.00796342
0.00770545 0.00855494 0.00841975 0.00792027]
mean value: 0.008214735984802246
key: score_time
value: [0.00823379 0.00851464 0.00845742 0.00857925 0.00831676 0.00783062
0.00798845 0.00863981 0.00799108 0.00859761]
mean value: 0.008314943313598633
key: test_mcc
value: [0.71004695 0.5809475 0.67883359 0.59603956 0.64549722 0.77784447
0.65372045 0.67883359 0.77072165 0.77096774]
mean value: 0.68634527326083
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.85483871 0.79032258 0.83870968 0.79032258 0.82258065 0.88709677
0.82258065 0.83870968 0.8852459 0.8852459 ]
mean value: 0.841565309360127
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.85245902 0.79365079 0.83333333 0.76363636 0.81967213 0.88135593
0.80701754 0.83333333 0.88888889 0.8852459 ]
mean value: 0.835859323808608
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.86666667 0.78125 0.86206897 0.875 0.83333333 0.92857143
0.88461538 0.86206897 0.875 0.87096774]
mean value: 0.8639542486156779
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.83870968 0.80645161 0.80645161 0.67741935 0.80645161 0.83870968
0.74193548 0.80645161 0.90322581 0.9 ]
mean value: 0.8125806451612902
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.85483871 0.79032258 0.83870968 0.79032258 0.82258065 0.88709677
0.82258065 0.83870968 0.88494624 0.88548387]
mean value: 0.8415591397849462
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.74285714 0.65789474 0.71428571 0.61764706 0.69444444 0.78787879
0.67647059 0.71428571 0.8 0.79411765]
mean value: 0.7199881834711557
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.07
Accuracy on Blind test: 0.43
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.35455632 1.36369705 1.44976234 1.43192887 1.36655641 1.41406369
1.44773722 1.37284899 1.38293886 1.38959265]
mean value: 1.3973682403564454
key: score_time
value: [0.09139943 0.09957314 0.09985614 0.09845757 0.09911156 0.0994699
0.09767675 0.09422445 0.09873199 0.09957123]
mean value: 0.09780721664428711
key: test_mcc
value: [0.96824584 0.93548387 0.96824584 0.90748521 0.93743687 0.96824584
1. 0.96824584 0.96770777 0.8688172 ]
mean value: 0.9489914273704848
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98387097 0.96774194 0.98387097 0.9516129 0.96774194 0.98387097
1. 0.98387097 0.98360656 0.93442623]
mean value: 0.9740613432046537
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98412698 0.96774194 0.98412698 0.95384615 0.96875 0.98360656
1. 0.98360656 0.98412698 0.93333333]
mean value: 0.9743265489798408
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96875 0.96774194 0.96875 0.91176471 0.93939394 1.
1. 1. 0.96875 0.93333333]
mean value: 0.9658483914093496
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.96774194 1. 1. 1. 0.96774194
1. 0.96774194 1. 0.93333333]
mean value: 0.9836559139784946
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98387097 0.96774194 0.98387097 0.9516129 0.96774194 0.98387097
1. 0.98387097 0.98333333 0.9344086 ]
mean value: 0.9740322580645162
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96875 0.9375 0.96875 0.91176471 0.93939394 0.96774194
1. 0.96774194 0.96875 0.875 ]
mean value: 0.9505392516244034
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.16
Accuracy on Blind test: 0.35
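
Note: the ninth-fold Random Forest scores above (accuracy 0.98360656, precision 0.96875, recall 1.0, F-score 0.98412698, Jaccard 0.96875, ROC AUC 0.98333333, MCC 0.96770777) are all consistent with one confusion matrix. The counts below are inferred from those scores and are not printed anywhere in the log, so treat them as an assumption. The sketch also assumes that test_roc_auc was computed from hard class predictions, in which case it equals the mean of recall and specificity.

```python
import math

# Assumed confusion counts for one 61-sample fold (inferred, not logged).
tp, fn, tn, fp = 31, 0, 29, 1

accuracy = (tp + tn) / (tp + fn + tn + fp)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
fscore = 2 * tp / (2 * tp + fp + fn)
jcc = tp / (tp + fp + fn)

# ROC AUC for hard 0/1 predictions reduces to balanced accuracy.
specificity = tn / (tn + fp)
roc_auc = (recall + specificity) / 2

# Matthews correlation coefficient from the four counts.
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

for name, value in [
    ("accuracy", accuracy), ("precision", precision), ("recall", recall),
    ("fscore", fscore), ("jcc", jcc), ("roc_auc", roc_auc), ("mcc", mcc),
]:
    print(f"{name:10s} {value:.8f}")
```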
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.87023854 0.93618393 0.92621827 0.95837045 1.00464082 0.93836141
0.98241544 0.91587925 0.8986578 0.98821497]
mean value: 0.9419180870056152
key: score_time
value: [0.23300123 0.2598815 0.26490426 0.22142696 0.22671819 0.23441744
0.25722957 0.27357078 0.23566699 0.21245193]
mean value: 0.2419268846511841
key: test_mcc
value: [0.96824584 0.87278605 0.93743687 0.90748521 0.93743687 0.96824584
0.96824584 0.96824584 0.93635873 0.83655914]
mean value: 0.9301046213212982
key: train_mcc
value: [0.96778244 0.97132357 0.96778244 0.97487691 0.96768225 0.96768225
0.96778244 0.96778244 0.97137553 0.9784809 ]
mean value: 0.9702551166949516
key: test_accuracy
value: [0.98387097 0.93548387 0.96774194 0.9516129 0.96774194 0.98387097
0.98387097 0.98387097 0.96721311 0.91803279]
mean value: 0.9643310417768377
key: train_accuracy
value: [0.98381295 0.98561151 0.98381295 0.98741007 0.98381295 0.98381295
0.98381295 0.98381295 0.98563734 0.98922801]
mean value: 0.9850764630665306
key: test_fscore
value: [0.98412698 0.9375 0.96875 0.95384615 0.96875 0.98360656
0.98412698 0.98360656 0.96875 0.91803279]
mean value: 0.9651096023739466
key: train_fscore
value: [0.98395722 0.98571429 0.98395722 0.98747764 0.98389982 0.98389982
0.98395722 0.98395722 0.98571429 0.98928571]
mean value: 0.985182044357831
key: test_precision
value: [0.96875 0.90909091 0.93939394 0.91176471 0.93939394 1.
0.96875 1. 0.93939394 0.90322581]
mean value: 0.9479763239606693
key: train_precision
value: [0.97526502 0.9787234 0.97526502 0.98220641 0.97864769 0.97864769
0.97526502 0.97526502 0.9787234 0.98576512]
mean value: 0.9783773783096608
key: test_recall
value: [1. 0.96774194 1. 1. 1. 0.96774194
1. 0.96774194 1. 0.93333333]
mean value: 0.9836559139784946
key: train_recall
value: [0.99280576 0.99280576 0.99280576 0.99280576 0.98920863 0.98920863
0.99280576 0.99280576 0.99280576 0.99283154]
mean value: 0.9920889095175472
key: test_roc_auc
value: [0.98387097 0.93548387 0.96774194 0.9516129 0.96774194 0.98387097
0.98387097 0.98387097 0.96666667 0.91827957]
mean value: 0.9643010752688173
key: train_roc_auc
value: [0.98381295 0.98561151 0.98381295 0.98741007 0.98381295 0.98381295
0.98381295 0.98381295 0.98565019 0.98922153]
mean value: 0.9850770996106342
key: test_jcc
value: [0.96875 0.88235294 0.93939394 0.91176471 0.93939394 0.96774194
0.96875 0.96774194 0.93939394 0.84848485]
mean value: 0.9333768184693232
key: train_jcc
value: [0.96842105 0.97183099 0.96842105 0.97526502 0.96830986 0.96830986
0.96842105 0.96842105 0.97183099 0.97879859]
mean value: 0.9708029504907444
MCC on Blind test: 0.15
Accuracy on Blind test: 0.4
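
Note: the "key / value / mean value" blocks in this log follow a regular layout, so they can be read back into a dictionary for comparison across models. A minimal sketch using only the standard library is given here; the excerpt is copied from the Random Forest2 block above, and the pattern and function names are illustrative rather than part of the original scripts.

```python
import re

# Short excerpt in the same "key / value / mean value" layout as the log.
excerpt = """\
key: test_mcc
value: [0.96824584 0.87278605 0.93743687 0.90748521 0.93743687 0.96824584
0.96824584 0.96824584 0.93635873 0.83655914]
mean value: 0.9301046213212982
key: train_mcc
value: [0.96778244 0.97132357 0.96778244 0.97487691 0.96768225 0.96768225
0.96778244 0.96778244 0.97137553 0.9784809 ]
mean value: 0.9702551166949516
"""

# Key name, bracketed fold values (which may wrap over lines), then the mean.
BLOCK = re.compile(r"key: (\w+)\s+value: \[([^\]]*)\]\s+mean value: (\S+)")

def parse_scores(text):
    """Return {key: (fold_values, mean)} for every block found in text."""
    scores = {}
    for key, values, mean in BLOCK.findall(text):
        scores[key] = ([float(v) for v in values.split()], float(mean))
    return scores

scores = parse_scores(excerpt)
for key, (folds, mean) in scores.items():
    print(key, len(folds), mean)
```

The pattern relies on the fold values being enclosed in a single pair of square brackets, which holds for the blocks shown here but has not been checked against the whole log.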
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01861191 0.00832939 0.00832176 0.00824809 0.00866604 0.00814605
0.00879598 0.00842834 0.00795603 0.00822353]
mean value: 0.009372711181640625
key: score_time
value: [0.00951219 0.00824142 0.00890613 0.00865459 0.00877047 0.00828552
0.00873399 0.00808811 0.00830865 0.00846767]
mean value: 0.00859687328338623
key: test_mcc
value: [0.64820372 0.68313005 0.48488114 0.74348441 0.80813523 0.74348441
0.64820372 0.74193548 0.63978495 0.67204301]
mean value: 0.6813286129520032
key: train_mcc
value: [0.69129181 0.69623388 0.69785979 0.69872831 0.69209976 0.70569372
0.7019886 0.70220704 0.69929441 0.69881448]
mean value: 0.698421180066379
key: test_accuracy
value: [0.82258065 0.83870968 0.74193548 0.87096774 0.90322581 0.87096774
0.82258065 0.87096774 0.81967213 0.83606557]
mean value: 0.8397673188789001
key: train_accuracy
value: [0.84532374 0.8471223 0.84892086 0.84892086 0.84532374 0.85251799
0.85071942 0.85071942 0.8491921 0.8491921 ]
mean value: 0.8487952546400941
key: test_fscore
value: [0.81355932 0.84848485 0.75 0.875 0.9 0.875
0.83076923 0.87096774 0.81967213 0.83333333]
mean value: 0.8416786607704336
key: train_fscore
value: [0.84859155 0.85268631 0.84837545 0.85263158 0.85017422 0.8556338
0.85361552 0.85413005 0.85263158 0.85211268]
mean value: 0.8520582734853629
key: test_precision
value: [0.85714286 0.8 0.72727273 0.84848485 0.93103448 0.84848485
0.79411765 0.87096774 0.83333333 0.83333333]
mean value: 0.8344171819804876
key: train_precision
value: [0.83103448 0.82274247 0.85144928 0.83219178 0.82432432 0.83793103
0.83737024 0.83505155 0.83219178 0.83737024]
mean value: 0.8341657184309065
key: test_recall
value: [0.77419355 0.90322581 0.77419355 0.90322581 0.87096774 0.90322581
0.87096774 0.87096774 0.80645161 0.83333333]
mean value: 0.8510752688172043
key: train_recall
value: [0.86690647 0.88489209 0.84532374 0.87410072 0.87769784 0.87410072
0.8705036 0.87410072 0.87410072 0.86738351]
mean value: 0.8709110131249839
key: test_roc_auc
value: [0.82258065 0.83870968 0.74193548 0.87096774 0.90322581 0.87096774
0.82258065 0.87096774 0.81989247 0.83602151]
mean value: 0.8397849462365592
key: train_roc_auc
value: [0.84532374 0.8471223 0.84892086 0.84892086 0.84532374 0.85251799
0.85071942 0.85071942 0.84923674 0.84915938]
mean value: 0.8487964467135968
key: test_jcc
value: [0.68571429 0.73684211 0.6 0.77777778 0.81818182 0.77777778
0.71052632 0.77142857 0.69444444 0.71428571]
mean value: 0.7286978810663021
key: train_jcc
value: [0.73700306 0.74320242 0.73667712 0.74311927 0.73939394 0.74769231
0.74461538 0.74539877 0.74311927 0.74233129]
mean value: 0.7422552816171282
MCC on Blind test: 0.19
Accuracy on Blind test: 0.49
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.08428884 0.04923701 0.12757492 0.1029861 0.05474067 0.05481815
0.06141877 0.06270385 0.06345892 0.05934381]
mean value: 0.07205710411071778
key: score_time
value: [0.01002645 0.00963044 0.01171899 0.01000237 0.00956392 0.00953889
0.00953102 0.00952125 0.00951862 0.00952578]
mean value: 0.009857773780822754
key: test_mcc
value: [0.96824584 0.90369611 0.93743687 0.90748521 0.90369611 0.93743687
1. 0.96824584 0.96770777 0.8688172 ]
mean value: 0.9362767824424617
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98387097 0.9516129 0.96774194 0.9516129 0.9516129 0.96774194
1. 0.98387097 0.98360656 0.93442623]
mean value: 0.9676097303014278
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98412698 0.95238095 0.96875 0.95384615 0.95238095 0.96666667
1. 0.98360656 0.98412698 0.93333333]
mean value: 0.9679218584239075
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96875 0.9375 0.93939394 0.91176471 0.9375 1.
1. 1. 0.96875 0.93333333]
mean value: 0.9596991978609626
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.96774194 1. 1. 0.96774194 0.93548387
1. 0.96774194 1. 0.93333333]
mean value: 0.9772043010752688
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98387097 0.9516129 0.96774194 0.9516129 0.9516129 0.96774194
1. 0.98387097 0.98333333 0.9344086 ]
mean value: 0.9675806451612904
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96875 0.90909091 0.93939394 0.91176471 0.90909091 0.93548387
1. 0.96774194 0.96875 0.875 ]
mean value: 0.9385066269909723
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.1
Accuracy on Blind test: 0.61
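The XGBoost block above reports a train MCC of 1.0 on every fold, a mean cross-validated MCC of about 0.94, and a blind-test MCC of 0.1. A minimal sketch of how those three numbers might be compared is given below. The 0.2 threshold and the names `generalisation_gaps` and `GAP_THRESHOLD` are assumptions made for illustration only.

```python
# Mean scores copied from the XGBoost block above.
scores = {
    "train_mcc": 1.0,
    "cv_test_mcc": 0.9362767824424617,
    "blind_mcc": 0.1,
}

# Hypothetical cut-off for calling a gap "large"; not taken from the log.
GAP_THRESHOLD = 0.2

def generalisation_gaps(s):
    """Return the train-to-CV gap and the CV-to-blind gap in MCC."""
    return {
        "train_minus_cv": s["train_mcc"] - s["cv_test_mcc"],
        "cv_minus_blind": s["cv_test_mcc"] - s["blind_mcc"],
    }

gaps = generalisation_gaps(scores)
flagged = [name for name, gap in gaps.items() if gap > GAP_THRESHOLD]
for name, gap in gaps.items():
    print(f"{name}: {gap:.2f}")
print("gaps above threshold:", flagged)
```

Under these numbers only the CV-to-blind gap exceeds the assumed threshold, which suggests the cross-validated score alone may overstate how well the model transfers to the blind set.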
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.01458907 0.04201937 0.02599144 0.01775765 0.04186487 0.0279336
0.01842332 0.04155302 0.04179025 0.01767874]
mean value: 0.028960132598876955
key: score_time
value: [0.01030087 0.02038527 0.01068902 0.01067185 0.01916838 0.01074195
0.01076746 0.01074457 0.02005053 0.010741 ]
mean value: 0.013426089286804199
key: test_mcc
value: [0.96824584 0.87278605 1. 0.90748521 0.96824584 0.96824584
1. 0.93743687 0.87082935 0.83655914]
mean value: 0.9329834129888399
key: train_mcc
value: [0.95329292 0.9497386 0.95329292 0.96048758 0.94966486 0.94966486
0.93900081 0.95339163 0.95693712 0.96065614]
mean value: 0.9526127442796535
key: test_accuracy
value: [0.98387097 0.93548387 1. 0.9516129 0.98387097 0.98387097
1. 0.96774194 0.93442623 0.91803279]
mean value: 0.9658910629296669
key: train_accuracy
value: [0.97661871 0.97482014 0.97661871 0.98021583 0.97482014 0.97482014
0.96942446 0.97661871 0.97845601 0.98025135]
mean value: 0.9762664195394134
key: test_fscore
value: [0.98360656 0.9375 1. 0.95384615 0.98412698 0.98360656
1. 0.96666667 0.93333333 0.91803279]
mean value: 0.9660719039612482
key: train_fscore
value: [0.97674419 0.975 0.97674419 0.980322 0.97491039 0.97491039
0.96969697 0.97682709 0.97849462 0.98046181]
mean value: 0.9764111663751257
key: test_precision
value: [1. 0.90909091 1. 0.91176471 0.96875 1.
1. 1. 0.96551724 0.90322581]
mean value: 0.9658348662804185
key: train_precision
value: [0.97153025 0.96808511 0.97153025 0.97508897 0.97142857 0.97142857
0.96113074 0.96819788 0.975 0.97183099]
mean value: 0.9705251323255912
key: test_recall
value: [0.96774194 0.96774194 1. 1. 1. 0.96774194
1. 0.93548387 0.90322581 0.93333333]
mean value: 0.9675268817204301
key: train_recall
value: [0.98201439 0.98201439 0.98201439 0.98561151 0.97841727 0.97841727
0.97841727 0.98561151 0.98201439 0.98924731]
mean value: 0.9823779685928676
key: test_roc_auc
value: [0.98387097 0.93548387 1. 0.9516129 0.98387097 0.98387097
1. 0.96774194 0.93494624 0.91827957]
mean value: 0.9659677419354838
key: train_roc_auc
value: [0.97661871 0.97482014 0.97661871 0.98021583 0.97482014 0.97482014
0.96942446 0.97661871 0.97846239 0.98023517]
mean value: 0.976265439261494
key: test_jcc
value: [0.96774194 0.88235294 1. 0.91176471 0.96875 0.96774194
1. 0.93548387 0.875 0.84848485]
mean value: 0.9357320237479156
key: train_jcc
value: [0.95454545 0.95121951 0.95454545 0.96140351 0.95104895 0.95104895
0.94117647 0.95470383 0.95789474 0.96167247]
mean value: 0.9539259346206412
MCC on Blind test: 0.17
Accuracy on Blind test: 0.38
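The test_fscore and test_jcc rows in the LDA block above carry the same information: for a binary task the Jaccard index J and the F1 score F are linked by J = F / (2 - F). The sketch below checks that identity against the logged fold values. The helper name `jaccard_from_f1` is an assumption introduced here.

```python
# Fold values copied from the LDA block above.
test_fscore = [0.98360656, 0.9375, 1.0, 0.95384615, 0.98412698,
               0.98360656, 1.0, 0.96666667, 0.93333333, 0.91803279]
test_jcc = [0.96774194, 0.88235294, 1.0, 0.91176471, 0.96875,
            0.96774194, 1.0, 0.93548387, 0.875, 0.84848485]

def jaccard_from_f1(f1):
    """Convert a binary F1 score to the Jaccard index: J = F / (2 - F)."""
    return f1 / (2.0 - f1)

derived_jcc = [jaccard_from_f1(f) for f in test_fscore]
max_abs_diff = max(abs(d - j) for d, j in zip(derived_jcc, test_jcc))
print(f"largest difference between derived and logged jcc: {max_abs_diff:.2e}")
```

Because the two rows are deterministic functions of each other, ranking models by one gives the same order as ranking by the other on any single fold.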
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.02236176 0.00778937 0.00771594 0.00752807 0.0074892 0.00744605
0.00749993 0.007586 0.00749612 0.00748873]
mean value: 0.009040117263793945
key: score_time
value: [0.01843238 0.00818586 0.00802255 0.00780058 0.00774455 0.00785375
0.00774026 0.00784397 0.00779438 0.00780678]
mean value: 0.008922505378723144
key: test_mcc
value: [0.77459667 0.65372045 0.55301004 0.74819006 0.74819006 0.7190925
0.58338335 0.77459667 0.57576971 0.81062315]
mean value: 0.6941172654572817
key: train_mcc
value: [0.70194087 0.71536572 0.73033254 0.70140848 0.70140848 0.70194087
0.72031981 0.70528679 0.72419371 0.70094494]
mean value: 0.7103142230314713
key: test_accuracy
value: [0.88709677 0.82258065 0.77419355 0.87096774 0.87096774 0.85483871
0.79032258 0.88709677 0.78688525 0.90163934]
mean value: 0.8446589106292967
key: train_accuracy
value: [0.84892086 0.85611511 0.86330935 0.84892086 0.84892086 0.84892086
0.85791367 0.85071942 0.85996409 0.8491921 ]
mean value: 0.8532897201090115
key: test_fscore
value: [0.8852459 0.8358209 0.78787879 0.87878788 0.87878788 0.86567164
0.8 0.88888889 0.8 0.90625 ]
mean value: 0.8527331873296211
key: train_fscore
value: [0.85665529 0.86254296 0.86986301 0.85616438 0.85616438 0.85665529
0.86541738 0.85811966 0.8668942 0.8556701 ]
mean value: 0.8604146652008448
key: test_precision
value: [0.9 0.77777778 0.74285714 0.82857143 0.82857143 0.80555556
0.76470588 0.875 0.76470588 0.85294118]
mean value: 0.8140686274509804
key: train_precision
value: [0.81493506 0.82565789 0.83006536 0.81699346 0.81699346 0.81493506
0.82200647 0.81758958 0.82467532 0.82178218]
mean value: 0.8205633864120958
key: test_recall
value: [0.87096774 0.90322581 0.83870968 0.93548387 0.93548387 0.93548387
0.83870968 0.90322581 0.83870968 0.96666667]
mean value: 0.8966666666666666
key: train_recall
value: [0.9028777 0.9028777 0.91366906 0.89928058 0.89928058 0.9028777
0.91366906 0.9028777 0.91366906 0.89247312]
mean value: 0.9043552254970217
key: test_roc_auc
value: [0.88709677 0.82258065 0.77419355 0.87096774 0.87096774 0.85483871
0.79032258 0.88709677 0.78602151 0.90268817]
mean value: 0.8446774193548388
key: train_roc_auc
value: [0.84892086 0.85611511 0.86330935 0.84892086 0.84892086 0.84892086
0.85791367 0.85071942 0.86006034 0.84911426]
mean value: 0.853291560300147
key: test_jcc
value: [0.79411765 0.71794872 0.65 0.78378378 0.78378378 0.76315789
0.66666667 0.8 0.66666667 0.82857143]
mean value: 0.7454696589216713
key: train_jcc
value: [0.74925373 0.75830816 0.76969697 0.74850299 0.74850299 0.74925373
0.76276276 0.75149701 0.76506024 0.74774775]
mean value: 0.7550586334969577
MCC on Blind test: 0.2
Accuracy on Blind test: 0.48
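The per-fold scores above can be reproduced from a single confusion matrix. The sketch below uses counts inferred for fold 10 of the MultinomialNB run (TP=29, FN=1, FP=5, TN=26). These counts are an assumption reconstructed from the logged recall and precision and are not printed anywhere in the log. The logged test_roc_auc matches (recall + specificity) / 2, which is what ROC AUC reduces to when it is computed from hard 0/1 predictions rather than scores. The log does not confirm how the scorer was configured.

```python
from math import sqrt

# Assumed confusion-matrix counts for fold 10 (inferred, not logged):
# 30 positives and 31 negatives in a 61-sample fold.
TP, FN, FP, TN = 29, 1, 5, 26

def fold_metrics(tp, fn, fp, tn):
    """Compute the metrics reported in this log from binary confusion counts."""
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "test_accuracy": (tp + tn) / (tp + fn + fp + tn),
        "test_precision": tp / (tp + fp),
        "test_recall": recall,
        "test_fscore": 2 * tp / (2 * tp + fp + fn),
        "test_jcc": tp / (tp + fp + fn),
        "test_mcc": (tp * tn - fp * fn) / mcc_denominator,
        # ROC AUC of hard predictions equals balanced accuracy.
        "test_roc_auc": (recall + specificity) / 2,
    }

# Tenth fold value of each metric, copied from the block above.
logged_fold10 = {
    "test_accuracy": 0.90163934,
    "test_precision": 0.85294118,
    "test_recall": 0.96666667,
    "test_fscore": 0.90625,
    "test_jcc": 0.82857143,
    "test_mcc": 0.81062315,
    "test_roc_auc": 0.90268817,
}

computed = fold_metrics(TP, FN, FP, TN)
for name, logged_value in logged_fold10.items():
    print(f"{name}: computed={computed[name]:.8f} logged={logged_value:.8f}")
```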
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01076055 0.01259518 0.01484299 0.0152657 0.01288152 0.01513839
0.01497507 0.01241827 0.01552725 0.01354051]
mean value: 0.013794541358947754
key: score_time
value: [0.00853276 0.01013088 0.01017213 0.01044273 0.01037955 0.01046228
0.01040554 0.01038742 0.01037264 0.01043701]
mean value: 0.010172295570373534
key: test_mcc
value: [0.93743687 0.81325006 0.84983659 0.87831007 0.93548387 0.96824584
0.93743687 0.90748521 0.87082935 0.70997538]
mean value: 0.8808290098706804
key: train_mcc
value: [0.89396219 0.81804143 0.8410572 0.96058703 0.93914669 0.95329292
0.9354697 0.94266562 0.95337563 0.78144333]
mean value: 0.9019041746413544
key: test_accuracy
value: [0.96774194 0.90322581 0.91935484 0.93548387 0.96774194 0.98387097
0.96774194 0.9516129 0.93442623 0.83606557]
mean value: 0.9367265996827076
key: train_accuracy
value: [0.94604317 0.9028777 0.91546763 0.98021583 0.96942446 0.97661871
0.9676259 0.97122302 0.97666068 0.88150808]
mean value: 0.9487665164098523
key: test_fscore
value: [0.96666667 0.89655172 0.92537313 0.93939394 0.96774194 0.98360656
0.96666667 0.94915254 0.93333333 0.8 ]
mean value: 0.9328486499760696
key: train_fscore
value: [0.94423792 0.89370079 0.92153589 0.98039216 0.96903461 0.97674419
0.96727273 0.97153025 0.97649186 0.86746988]
mean value: 0.9468410268529506
key: test_precision
value: [1. 0.96296296 0.86111111 0.88571429 0.96774194 1.
1. 1. 0.96551724 1. ]
mean value: 0.9643047536651541
key: train_precision
value: [0.97692308 0.98695652 0.85981308 0.97173145 0.98154982 0.97153025
0.97794118 0.96126761 0.98181818 0.98630137]
mean value: 0.9655832529931669
key: test_recall
value: [0.93548387 0.83870968 1. 1. 0.96774194 0.96774194
0.93548387 0.90322581 0.90322581 0.66666667]
mean value: 0.9118279569892473
key: train_recall
value: [0.91366906 0.81654676 0.99280576 0.98920863 0.95683453 0.98201439
0.95683453 0.98201439 0.97122302 0.77419355]
mean value: 0.9335344627523787
key: test_roc_auc
value: [0.96774194 0.90322581 0.91935484 0.93548387 0.96774194 0.98387097
0.96774194 0.9516129 0.93494624 0.83333333]
mean value: 0.936505376344086
key: train_roc_auc
value: [0.94604317 0.9028777 0.91546763 0.98021583 0.96942446 0.97661871
0.9676259 0.97122302 0.97665094 0.88170109]
mean value: 0.9487848430932674
key: test_jcc
value: [0.93548387 0.8125 0.86111111 0.88571429 0.9375 0.96774194
0.93548387 0.90322581 0.875 0.66666667]
mean value: 0.8780427547363031
key: train_jcc
value: [0.8943662 0.80782918 0.85448916 0.96153846 0.93992933 0.95454545
0.93661972 0.94463668 0.9540636 0.76595745]
mean value: 0.9013975235029617
MCC on Blind test: 0.13
Accuracy on Blind test: 0.31
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.0138278 0.01442528 0.01401639 0.01388168 0.01454473 0.0136025
0.0134716 0.01331687 0.01288104 0.01246238]
mean value: 0.013643026351928711
key: score_time
value: [0.01061082 0.01157665 0.01069307 0.01073432 0.01059294 0.01056838
0.01065302 0.01038933 0.01044464 0.01039839]
mean value: 0.010666155815124511
key: test_mcc
value: [0.93743687 0.87278605 0.93743687 0.90748521 0.90369611 0.87831007
1. 0.78446454 0.72318666 0.50305191]
mean value: 0.8447854282682599
key: train_mcc
value: [0.90882979 0.95705746 0.95025527 0.94305636 0.92239227 0.89154571
0.94604929 0.77463214 0.83507476 0.45405525]
mean value: 0.858294830889454
key: test_accuracy
value: [0.96774194 0.93548387 0.96774194 0.9516129 0.9516129 0.93548387
1. 0.88709677 0.85245902 0.70491803]
mean value: 0.9154151242728715
key: train_accuracy
value: [0.95323741 0.97841727 0.97482014 0.97122302 0.96043165 0.9442446
0.97302158 0.87589928 0.91202873 0.67504488]
mean value: 0.9218368572646372
key: test_fscore
value: [0.96666667 0.9375 0.96875 0.95384615 0.95081967 0.93103448
1. 0.89552239 0.86956522 0.57142857]
mean value: 0.9045133152282165
key: train_fscore
value: [0.95149254 0.97864769 0.97526502 0.97069597 0.95925926 0.94183865
0.97307002 0.88924559 0.91846922 0.52493438]
mean value: 0.908291832592524
key: test_precision
value: [1. 0.90909091 0.93939394 0.91176471 0.96666667 1.
1. 0.83333333 0.78947368 1. ]
mean value: 0.9349723238577727
key: train_precision
value: [0.98837209 0.96830986 0.95833333 0.98880597 0.98854962 0.98431373
0.97132616 0.80289855 0.85448916 0.98039216]
mean value: 0.9485790636020202
key: test_recall
value: [0.93548387 0.96774194 1. 1. 0.93548387 0.87096774
1. 0.96774194 0.96774194 0.4 ]
mean value: 0.9045161290322581
key: train_recall
value: [0.91726619 0.98920863 0.99280576 0.95323741 0.93165468 0.9028777
0.97482014 0.99640288 0.99280576 0.35842294]
mean value: 0.9009502075758747
key: test_roc_auc
value: [0.96774194 0.93548387 0.96774194 0.9516129 0.9516129 0.93548387
1. 0.88709677 0.85053763 0.7 ]
mean value: 0.914731182795699
key: train_roc_auc
value: [0.95323741 0.97841727 0.97482014 0.97122302 0.96043165 0.9442446
0.97302158 0.87589928 0.91217349 0.67561435]
mean value: 0.9219082798277507
key: test_jcc
value: [0.93548387 0.88235294 0.93939394 0.91176471 0.90625 0.87096774
1. 0.81081081 0.76923077 0.4 ]
mean value: 0.8426254779397568
key: train_jcc
value: [0.90747331 0.95818815 0.95172414 0.9430605 0.92170819 0.89007092
0.94755245 0.80057803 0.84923077 0.35587189]
mean value: 0.8525458343695811
MCC on Blind test: 0.14
Accuracy on Blind test: 0.32
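The mean value lines hide how much the SGDClassifier scores vary between folds: test_mcc ranges from 1.0 down to about 0.50 in the block above. A minimal sketch of summarising that spread alongside the mean is shown below. Reporting the sample standard deviation and the weakest fold is a choice made here for illustration, not something the original script does.

```python
from statistics import fmean, stdev

# Fold values copied from the SGDClassifier test_mcc row above.
test_mcc = [0.93743687, 0.87278605, 0.93743687, 0.90748521, 0.90369611,
            0.87831007, 1.0, 0.78446454, 0.72318666, 0.50305191]

mean_mcc = fmean(test_mcc)
spread = stdev(test_mcc)                      # sample standard deviation
worst_fold = test_mcc.index(min(test_mcc))    # 0-based fold index

print(f"mean={mean_mcc:.4f} sd={spread:.4f} "
      f"min={min(test_mcc):.4f} (fold {worst_fold + 1}) max={max(test_mcc):.4f}")
```

A single weak fold such as the last one here pulls the mean down noticeably, so the per-fold row is worth reading before comparing models on the mean alone.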
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.11406326 0.10405397 0.10169864 0.10252857 0.09923482 0.10144997
0.09957933 0.10238481 0.10498977 0.10294104]
mean value: 0.10329241752624511
key: score_time
value: [0.01416016 0.01535344 0.01559019 0.01440263 0.01463914 0.01422262
0.01545978 0.01572537 0.01503325 0.0141983 ]
mean value: 0.014878487586975098
key: test_mcc
value: [0.96824584 0.93548387 0.96824584 0.90748521 0.90748521 0.93743687
1. 0.90369611 1. 0.8688172 ]
mean value: 0.9396896154994742
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98387097 0.96774194 0.98387097 0.9516129 0.9516129 0.96774194
1. 0.9516129 1. 0.93442623]
mean value: 0.9692490745637229
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98412698 0.96774194 0.98360656 0.95384615 0.95384615 0.96666667
1. 0.95081967 1. 0.93333333]
mean value: 0.9693987456811359
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96875 0.96774194 1. 0.91176471 0.91176471 1.
1. 0.96666667 1. 0.93333333]
mean value: 0.9660021347248577
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.96774194 0.96774194 1. 1. 0.93548387
1. 0.93548387 1. 0.93333333]
mean value: 0.9739784946236559
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98387097 0.96774194 0.98387097 0.9516129 0.9516129 0.96774194
1. 0.9516129 1. 0.9344086 ]
mean value: 0.969247311827957
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96875 0.9375 0.96774194 0.91176471 0.91176471 0.93548387
1. 0.90625 1. 0.875 ]
mean value: 0.9414255218216319
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.09
Accuracy on Blind test: 0.31
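Note on the fold metrics above: the per-fold test_mcc, test_fscore and test_jcc values can be reproduced from a fold's confusion counts. The sketch below is only an illustration. It assumes that `jcc` denotes the Jaccard index of the positive class, and that the first AdaBoost fold had 31 true positives, 1 false positive, 0 false negatives and 30 true negatives. These counts are not printed in the log; they are inferred from the logged accuracy (0.98387097), precision (0.96875) and recall (1.0). The function names are hypothetical and are not taken from the pipeline script.

```python
import math


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from binary confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Conventionally reported as 0 when any marginal total is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def f_score(tp, fp, fn):
    """F1 score of the positive class."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def jaccard(tp, fp, fn):
    """Jaccard index of the positive class."""
    denom = tp + fp + fn
    return tp / denom if denom else 0.0


# Assumed counts for the first AdaBoost fold (inferred, not logged).
tp, fp, fn, tn = 31, 1, 0, 30
print("mcc    ", round(mcc(tp, fp, fn, tn), 8))   # log: 0.96824584
print("fscore ", round(f_score(tp, fp, fn), 8))   # log: 0.98412698
print("jcc    ", round(jaccard(tp, fp, fn), 8))   # log: 0.96875
```

If the inferred counts are right, all three values agree with the first entries of test_mcc, test_fscore and test_jcc for this model.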
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.03851056 0.03913617 0.03798318 0.04739237 0.0397296 0.04984927
0.04227161 0.05067539 0.05400753 0.04927731]
mean value: 0.04488329887390137
key: score_time
value: [0.02179551 0.02289391 0.02226377 0.01712132 0.03155065 0.0246129
0.03463507 0.02148271 0.02362227 0.01659489]
mean value: 0.02365729808807373
key: test_mcc
value: [1. 0.90369611 1. 0.93743687 0.87096774 0.90748521
0.83914639 0.96824584 0.93635873 0.90204573]
mean value: 0.9265382629263172
key: train_mcc
value: [0.99640932 0.99640932 0.99280576 0.99640932 0.98563702 0.99280576
0.99640932 0.99640932 0.98923442 0.99284434]
mean value: 0.9935373910332435
key: test_accuracy
value: [1. 0.9516129 1. 0.96774194 0.93548387 0.9516129
0.91935484 0.98387097 0.96721311 0.95081967]
mean value: 0.9627710206240084
key: train_accuracy
value: [0.99820144 0.99820144 0.99640288 0.99820144 0.99280576 0.99640288
0.99820144 0.99820144 0.994614 0.99640934]
mean value: 0.9967642044353745
key: test_fscore
value: [1. 0.95081967 1. 0.96875 0.93548387 0.94915254
0.91803279 0.98360656 0.96875 0.94915254]
mean value: 0.9623747972106947
key: train_fscore
value: [0.9981982 0.9981982 0.99640288 0.9981982 0.99277978 0.99640288
0.99820467 0.9981982 0.994614 0.99640288]
mean value: 0.9967599880734039
key: test_precision
value: [1. 0.96666667 1. 0.93939394 0.93548387 1.
0.93333333 1. 0.93939394 0.96551724]
mean value: 0.9679788991134931
key: train_precision
value: [1. 1. 0.99640288 1. 0.99637681 0.99640288
0.99641577 1. 0.99283154 1. ]
mean value: 0.9978429878817844
key: test_recall
value: [1. 0.93548387 1. 1. 0.93548387 0.90322581
0.90322581 0.96774194 1. 0.93333333]
mean value: 0.9578494623655914
key: train_recall
value: [0.99640288 0.99640288 0.99640288 0.99640288 0.98920863 0.99640288
1. 0.99640288 0.99640288 0.99283154]
mean value: 0.9956860318197055
key: test_roc_auc
value: [1. 0.9516129 1. 0.96774194 0.93548387 0.9516129
0.91935484 0.98387097 0.96666667 0.95053763]
mean value: 0.9626881720430108
key: train_roc_auc
value: [0.99820144 0.99820144 0.99640288 0.99820144 0.99280576 0.99640288
0.99820144 0.99820144 0.99461721 0.99641577]
mean value: 0.996765168510353
key: test_jcc
value: [1. 0.90625 1. 0.93939394 0.87878788 0.90322581
0.84848485 0.96774194 0.93939394 0.90322581]
mean value: 0.9286504154447703
key: train_jcc
value: [0.99640288 0.99640288 0.99283154 0.99640288 0.98566308 0.99283154
0.99641577 0.99640288 0.98928571 0.99283154]
mean value: 0.9935470701779591
MCC on Blind test: 0.07
Accuracy on Blind test: 0.58
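Note on the OOB warning for this model: BaggingClassifier was run with its default number of estimators (10 in scikit-learn), and the warning is consistent with simple bootstrap arithmetic. A training row gets an out-of-bag score only if at least one estimator's bootstrap sample leaves it out. A row that is in-bag for every estimator has an all-zero prediction row, so the normalisation `predictions / predictions.sum(axis=1)` divides zero by zero, which would explain the accompanying RuntimeWarning. The sketch below estimates how many such rows to expect. It assumes about 556 training rows per fold, a figure inferred from the train accuracy granularity (0.99820144 is 555/556) rather than stated in the log, and the function name is hypothetical.

```python
def p_never_oob(n_samples, n_estimators):
    """Probability that one row is in-bag for every bootstrap estimator."""
    # Chance that a bootstrap sample of size n_samples contains a given row.
    p_in_bag = 1.0 - (1.0 - 1.0 / n_samples) ** n_samples
    return p_in_bag ** n_estimators


n_train = 556  # assumption: inferred from the logged train accuracies
for n_estimators in (10, 100):
    p = p_never_oob(n_train, n_estimators)
    print(f"n_estimators={n_estimators:4d}  "
          f"P(no OOB score)={p:.3e}  expected rows={n_train * p:.3f}")
```

With 10 estimators roughly 1% of rows are expected to lack an OOB score, so a handful of rows per fold would trigger the warning; with 100 estimators the expected count is effectively zero.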
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.12759042 0.22521901 0.21887374 0.2211132 0.17997479 0.20122313
0.19672465 0.20488429 0.276335 0.25733685]
mean value: 0.21092751026153564
key: score_time
value: [0.01269174 0.02497721 0.02092695 0.02029276 0.01257658 0.0126636
0.01265192 0.02021074 0.02772164 0.02012014]
mean value: 0.0184833288192749
key: test_mcc
value: [0.90748521 0.61807005 0.7130241 0.80813523 0.77784447 0.77459667
0.61807005 0.80645161 0.57576971 0.70780713]
mean value: 0.7307254226729265
key: train_mcc
value: [0.87086426 0.86386843 0.84312418 0.83904739 0.85318614 0.85376169
0.85720277 0.84009387 0.86412027 0.86022912]
mean value: 0.8545498119930119
key: test_accuracy
value: [0.9516129 0.80645161 0.85483871 0.90322581 0.88709677 0.88709677
0.80645161 0.90322581 0.78688525 0.85245902]
mean value: 0.8639344262295082
key: train_accuracy
value: [0.9352518 0.93165468 0.92086331 0.91906475 0.92625899 0.92625899
0.92805755 0.91906475 0.93177738 0.92998205]
mean value: 0.9268234245637601
key: test_fscore
value: [0.94915254 0.81818182 0.86153846 0.90625 0.89230769 0.88888889
0.81818182 0.90322581 0.8 0.84210526]
mean value: 0.8679832291081068
key: train_fscore
value: [0.93617021 0.93286219 0.92307692 0.92091388 0.92768959 0.92819615
0.92982456 0.92173913 0.93286219 0.93097345]
mean value: 0.9284308286107671
key: test_precision
value: [1. 0.77142857 0.82352941 0.87878788 0.85294118 0.875
0.77142857 0.90322581 0.76470588 0.88888889]
mean value: 0.8529936187573759
key: train_precision
value: [0.92307692 0.91666667 0.89795918 0.90034364 0.9100346 0.90443686
0.90753425 0.89225589 0.91666667 0.91958042]
mean value: 0.9088555103251448
key: test_recall
value: [0.90322581 0.87096774 0.90322581 0.93548387 0.93548387 0.90322581
0.87096774 0.90322581 0.83870968 0.8 ]
mean value: 0.8864516129032258
key: train_recall
value: [0.94964029 0.94964029 0.94964029 0.94244604 0.94604317 0.95323741
0.95323741 0.95323741 0.94964029 0.94265233]
mean value: 0.9489414919677162
key: test_roc_auc
value: [0.9516129 0.80645161 0.85483871 0.90322581 0.88709677 0.88709677
0.80645161 0.90322581 0.78602151 0.8516129 ]
mean value: 0.8637634408602151
key: train_roc_auc
value: [0.9352518 0.93165468 0.92086331 0.91906475 0.92625899 0.92625899
0.92805755 0.91906475 0.93180939 0.92995926]
mean value: 0.9268243469740336
key: test_jcc
value: [0.90322581 0.69230769 0.75675676 0.82857143 0.80555556 0.8
0.69230769 0.82352941 0.66666667 0.72727273]
mean value: 0.7696193737654838
key: train_jcc
value: [0.88 0.87417219 0.85714286 0.8534202 0.86513158 0.86601307
0.86885246 0.85483871 0.87417219 0.87086093]
mean value: 0.8664604170132448
MCC on Blind test: 0.22
Accuracy on Blind test: 0.49
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.26798725 0.26752782 0.26489639 0.26649332 0.25984526 0.26162148
0.2612102 0.26817322 0.26268578 0.26635146]
mean value: 0.26467921733856203
key: score_time
value: [0.00845337 0.00842595 0.00839472 0.0083878 0.00851393 0.00833416
0.00913382 0.00835061 0.00875974 0.00896358]
mean value: 0.008571767807006836
key: test_mcc
value: [1. 0.90369611 1. 0.93743687 0.93743687 0.90748521
0.96824584 0.96824584 0.96770777 0.8688172 ]
mean value: 0.9459071710309553
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.9516129 1. 0.96774194 0.96774194 0.9516129
0.98387097 0.98387097 0.98360656 0.93442623]
mean value: 0.9724484399788472
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.95081967 1. 0.96875 0.96875 0.94915254
0.98412698 0.98360656 0.98412698 0.93333333]
mean value: 0.972266607346838
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.96666667 1. 0.93939394 0.93939394 1.
0.96875 1. 0.96875 0.93333333]
mean value: 0.9716287878787879
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.93548387 1. 1. 1. 0.90322581
1. 0.96774194 1. 0.93333333]
mean value: 0.9739784946236559
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.9516129 1. 0.96774194 0.96774194 0.9516129
0.98387097 0.98387097 0.98333333 0.9344086 ]
mean value: 0.9724193548387097
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.90625 1. 0.93939394 0.93939394 0.90322581
0.96875 0.96774194 0.96875 0.875 ]
mean value: 0.9468505620723363
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.14
Accuracy on Blind test: 0.63
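Note on test_roc_auc: in the fold results above, test_roc_auc tracks test_accuracy almost exactly. That is what one would expect if the AUC were computed from hard class predictions rather than from scores or probabilities, because ROC AUC then reduces to the mean of sensitivity and specificity. The log does not state how the pipeline scores AUC, so this is an inference. The sketch checks it for the last Gradient Boosting fold, assuming 28 true positives, 2 false positives, 2 false negatives and 29 true negatives; these counts are inferred from the logged accuracy (0.93442623), precision and recall (both 0.93333333). The function names are hypothetical.

```python
def auc_from_hard_labels(tp, fp, fn, tn):
    """ROC AUC of a single-threshold classifier: mean of TPR and TNR."""
    tpr = tp / (tp + fn)  # sensitivity
    tnr = tn / (tn + fp)  # specificity
    return (tpr + tnr) / 2.0


def accuracy(tp, fp, fn, tn):
    """Fraction of correctly classified rows."""
    return (tp + tn) / (tp + fp + fn + tn)


# Assumed counts for the last Gradient Boosting fold (inferred, not logged).
tp, fp, fn, tn = 28, 2, 2, 29
print("roc_auc ", round(auc_from_hard_labels(tp, fp, fn, tn), 8))  # log: 0.9344086
print("accuracy", round(accuracy(tp, fp, fn, tn), 8))              # log: 0.93442623
```

The two numbers differ only because this fold has 30 positives and 31 negatives; in perfectly balanced folds they coincide, as they do in the first eight folds of each model.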
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.01149559 0.01360273 0.01408195 0.01396298 0.0143764 0.01411986
0.01366615 0.01368761 0.01428699 0.01439714]
mean value: 0.013767743110656738
key: score_time
value: [0.01090598 0.01095629 0.01090288 0.01166439 0.01109648 0.01160717
0.01097107 0.01158404 0.01094365 0.01160645]
mean value: 0.011223840713500976
key: test_mcc
value: [0.3799803 0.51119863 0.54006172 0.74161985 0.56853524 0.56493268
0.50083542 0.43852901 0.72318666 0.76533557]
mean value: 0.5734215093600435
key: train_mcc
value: [0.4932785 0.76196204 0.69278522 0.72409686 0.56120987 0.54686874
0.76885315 0.49611447 0.76738608 0.73356387]
mean value: 0.6546118797369623
key: test_accuracy
value: [0.64516129 0.74193548 0.72580645 0.85483871 0.75806452 0.74193548
0.74193548 0.66129032 0.85245902 0.86885246]
mean value: 0.759227921734532
key: train_accuracy
value: [0.69784173 0.87230216 0.82553957 0.8471223 0.75359712 0.73021583
0.87410072 0.69964029 0.87791741 0.85098743]
mean value: 0.8029264559626984
key: test_fscore
value: [0.73170732 0.77777778 0.78481013 0.87323944 0.69387755 0.79487179
0.77142857 0.74698795 0.86956522 0.88235294]
mean value: 0.7926618685748723
key: train_fscore
value: [0.76731302 0.88455285 0.85099846 0.86614173 0.68649886 0.78753541
0.88709677 0.76837725 0.88741722 0.87010955]
mean value: 0.8256041120420929
key: test_precision
value: [0.58823529 0.68292683 0.64583333 0.775 0.94444444 0.65957447
0.69230769 0.59615385 0.78947368 0.78947368]
mean value: 0.7163423276131415
key: train_precision
value: [0.62387387 0.80712166 0.74262735 0.77030812 0.94339623 0.64953271
0.80409357 0.62528217 0.82208589 0.77222222]
mean value: 0.756054378747134
key: test_recall
value: [0.96774194 0.90322581 1. 1. 0.5483871 1.
0.87096774 1. 0.96774194 1. ]
mean value: 0.9258064516129032
key: train_recall
value: [0.99640288 0.97841727 0.99640288 0.98920863 0.53956835 1.
0.98920863 0.99640288 0.96402878 0.99641577]
mean value: 0.9446056058379103
key: test_roc_auc
value: [0.64516129 0.74193548 0.72580645 0.85483871 0.75806452 0.74193548
0.74193548 0.66129032 0.85053763 0.87096774]
mean value: 0.759247311827957
key: train_roc_auc
value: [0.69784173 0.87230216 0.82553957 0.8471223 0.75359712 0.73021583
0.87410072 0.69964029 0.87807174 0.85072587]
mean value: 0.8029157319305846
key: test_jcc
value: [0.57692308 0.63636364 0.64583333 0.775 0.53125 0.65957447
0.62790698 0.59615385 0.76923077 0.78947368]
mean value: 0.660770979104448
key: train_jcc
value: [0.62247191 0.79300292 0.74064171 0.76388889 0.52264808 0.64953271
0.79710145 0.62387387 0.79761905 0.7700831 ]
mean value: 0.7080863692848516
MCC on Blind test: 0.18
Accuracy on Blind test: 0.6
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02095389 0.03017879 0.03077292 0.03034782 0.03044152 0.03048611
0.03019238 0.03031898 0.0302968 0.03035188]
mean value: 0.02943410873413086
key: score_time
value: [0.0190351 0.02024627 0.02113628 0.01070428 0.01898575 0.01937699
0.02058935 0.01084757 0.0107224 0.01985407]
mean value: 0.017149806022644043
key: test_mcc
value: [0.96824584 0.81325006 0.83914639 0.87831007 0.96824584 0.93548387
0.90369611 0.93743687 0.80516731 0.8688172 ]
mean value: 0.8917799559713326
key: train_mcc
value: [0.93900081 0.93890359 0.91007783 0.9352518 0.92088714 0.92808157
0.91007783 0.92805755 0.92820949 0.93182991]
mean value: 0.9270377524969889
key: test_accuracy
value: [0.98387097 0.90322581 0.91935484 0.93548387 0.98387097 0.96774194
0.9516129 0.96774194 0.90163934 0.93442623]
mean value: 0.9448968799576943
key: train_accuracy
value: [0.96942446 0.96942446 0.95503597 0.9676259 0.96043165 0.96402878
0.95503597 0.96402878 0.96409336 0.96588869]
mean value: 0.9635018017901656
key: test_fscore
value: [0.98412698 0.90909091 0.92063492 0.93939394 0.98412698 0.96774194
0.95081967 0.96666667 0.9 0.93333333]
mean value: 0.9455935344988755
key: train_fscore
value: [0.96969697 0.96958855 0.95495495 0.9676259 0.96057348 0.96415771
0.95495495 0.96402878 0.96389892 0.96613191]
mean value: 0.9635612113921358
key: test_precision
value: [0.96875 0.85714286 0.90625 0.88571429 0.96875 0.96774194
0.96666667 1. 0.93103448 0.93333333]
mean value: 0.9385383561099634
key: train_precision
value: [0.96113074 0.96441281 0.9566787 0.9676259 0.95714286 0.96071429
0.9566787 0.96402878 0.9673913 0.96099291]
mean value: 0.9616796985424773
key: test_recall
value: [1. 0.96774194 0.93548387 1. 1. 0.96774194
0.93548387 0.93548387 0.87096774 0.93333333]
mean value: 0.9546236559139785
key: train_recall
value: [0.97841727 0.97482014 0.95323741 0.9676259 0.96402878 0.9676259
0.95323741 0.96402878 0.96043165 0.97132616]
mean value: 0.9654779402284623
key: test_roc_auc
value: [0.98387097 0.90322581 0.91935484 0.93548387 0.98387097 0.96774194
0.9516129 0.96774194 0.90215054 0.9344086 ]
mean value: 0.9449462365591399
key: train_roc_auc
value: [0.96942446 0.96942446 0.95503597 0.9676259 0.96043165 0.96402878
0.95503597 0.96402878 0.9640868 0.96587891]
mean value: 0.9635001676078492
key: test_jcc
value: [0.96875 0.83333333 0.85294118 0.88571429 0.96875 0.9375
0.90625 0.93548387 0.81818182 0.875 ]
mean value: 0.8981904484667768
key: train_jcc
value: [0.94117647 0.94097222 0.9137931 0.93728223 0.92413793 0.93079585
0.9137931 0.93055556 0.93031359 0.93448276]
mean value: 0.9297302811483933
MCC on Blind test: 0.23
Accuracy on Blind test: 0.47
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_config.py:143: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_config.py:146: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
List of models:
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.20507646 0.19756031 0.19703841 0.19701099 0.20378065 0.19817948
0.19686937 0.19685602 0.19804025 0.20452499]
mean value: 0.1994936943054199
key: score_time
value: [0.01948881 0.02093601 0.01906753 0.02151203 0.02097845 0.01082182
0.02040362 0.01091933 0.02004528 0.01085353]
mean value: 0.017502641677856444
key: test_mcc
value: [0.96824584 0.84266484 0.90369611 0.90748521 0.96824584 0.96824584
0.96824584 0.93743687 0.87082935 0.83655914]
mean value: 0.9171654872995563
key: train_mcc
value: [0.94254361 0.94619622 0.94609826 0.94966486 0.94609826 0.94966486
0.93890359 0.95339163 0.95691189 0.9534734 ]
mean value: 0.948294657254694
key: test_accuracy
value: [0.98387097 0.91935484 0.9516129 0.9516129 0.98387097 0.98387097
0.98387097 0.96774194 0.93442623 0.91803279]
mean value: 0.9578265468006346
key: train_accuracy
value: [0.97122302 0.97302158 0.97302158 0.97482014 0.97302158 0.97482014
0.96942446 0.97661871 0.97845601 0.97666068]
mean value: 0.9741087919610452
key: test_fscore
value: [0.98412698 0.92307692 0.95238095 0.95384615 0.98412698 0.98360656
0.98412698 0.96666667 0.93333333 0.91803279]
mean value: 0.9583324325947277
key: train_fscore
value: [0.97142857 0.97326203 0.97316637 0.97491039 0.97316637 0.97491039
0.96958855 0.97682709 0.97841727 0.97690941]
mean value: 0.9742586454574466
key: test_precision
value: [0.96875 0.88235294 0.9375 0.91176471 0.96875 1.
0.96875 1. 0.96551724 0.90322581]
mean value: 0.9506610694889747
key: train_precision
value: [0.96453901 0.96466431 0.96797153 0.97142857 0.96797153 0.97142857
0.96441281 0.96819788 0.97841727 0.96830986]
mean value: 0.9687341337990163
key: test_recall
value: [1. 0.96774194 0.96774194 1. 1. 0.96774194
1. 0.93548387 0.90322581 0.93333333]
mean value: 0.9675268817204301
key: train_recall
value: [0.97841727 0.98201439 0.97841727 0.97841727 0.97841727 0.97841727
0.97482014 0.98561151 0.97841727 0.98566308]
mean value: 0.9798612722725046
key: test_roc_auc
value: [0.98387097 0.91935484 0.9516129 0.9516129 0.98387097 0.98387097
0.98387097 0.96774194 0.93494624 0.91827957]
mean value: 0.9579032258064516
key: train_roc_auc
value: [0.97122302 0.97302158 0.97302158 0.97482014 0.97302158 0.97482014
0.96942446 0.97661871 0.97845594 0.97664449]
mean value: 0.9741071658801991
key: test_jcc
value: [0.96875 0.85714286 0.90909091 0.91176471 0.96875 0.96774194
0.96875 0.93548387 0.875 0.84848485]
mean value: 0.921095912705258
key: train_jcc
value: [0.94444444 0.94791667 0.94773519 0.95104895 0.94773519 0.95104895
0.94097222 0.95470383 0.95774648 0.95486111]
mean value: 0.9498213041443461
MCC on Blind test: 0.2
Accuracy on Blind test: 0.44
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.04743552 0.02349114 0.02685213 0.02750683 0.0252192 0.0279355
0.03556037 0.04037642 0.03911209 0.03533268]
mean value: 0.032882189750671385
key: score_time
value: [0.01078486 0.01099777 0.01306605 0.01077437 0.01067662 0.01066399
0.01071048 0.01073122 0.01087689 0.01084948]
mean value: 0.011013174057006836
key: test_mcc
value: [0.96824584 0.7130241 0.83914639 0.90748521 0.79471941 0.93548387
0.71004695 0.80813523 0.77096774 0.87082935]
mean value: 0.8318084093587729
key: train_mcc
value: [0.87424213 0.85278837 0.83904739 0.84537297 0.85265591 0.84192273
0.83904739 0.85646981 0.84627216 0.84586123]
mean value: 0.8493680080976538
key: test_accuracy
value: [0.98387097 0.85483871 0.91935484 0.9516129 0.88709677 0.96774194
0.85483871 0.90322581 0.8852459 0.93442623]
mean value: 0.9142252776308831
key: train_accuracy
value: [0.93705036 0.92625899 0.91906475 0.92266187 0.92625899 0.92086331
0.91906475 0.92805755 0.92280072 0.92280072]
mean value: 0.9244882011805278
key: test_fscore
value: [0.98412698 0.86153846 0.92063492 0.95384615 0.89855072 0.96774194
0.85714286 0.9 0.8852459 0.93548387]
mean value: 0.9164311810018015
key: train_fscore
value: [0.93761141 0.92717584 0.92091388 0.92307692 0.92691622 0.92170819
0.92091388 0.92907801 0.92416226 0.92389381]
mean value: 0.9255450426062092
key: test_precision
value: [0.96875 0.82352941 0.90625 0.91176471 0.81578947 0.96774194
0.84375 0.93103448 0.9 0.90625 ]
mean value: 0.8974860009573761
key: train_precision
value: [0.92932862 0.91578947 0.90034364 0.91814947 0.91872792 0.91197183
0.90034364 0.91608392 0.90657439 0.91258741]
mean value: 0.9129900316323134
key: test_recall
value: [1. 0.90322581 0.93548387 1. 1. 0.96774194
0.87096774 0.87096774 0.87096774 0.96666667]
mean value: 0.9386021505376344
key: train_recall
value: [0.94604317 0.93884892 0.94244604 0.92805755 0.9352518 0.93165468
0.94244604 0.94244604 0.94244604 0.93548387]
mean value: 0.9385124158737526
key: test_roc_auc
value: [0.98387097 0.85483871 0.91935484 0.9516129 0.88709677 0.96774194
0.85483871 0.90322581 0.88548387 0.93494624]
mean value: 0.9143010752688172
key: train_roc_auc
value: [0.93705036 0.92625899 0.91906475 0.92266187 0.92625899 0.92086331
0.91906475 0.92805755 0.92283592 0.92277791]
mean value: 0.9244894407055001
key: test_jcc
value: [0.96875 0.75675676 0.85294118 0.91176471 0.81578947 0.9375
0.75 0.81818182 0.79411765 0.87878788]
mean value: 0.8484589456822429
key: train_jcc
value: [0.88255034 0.86423841 0.8534202 0.85714286 0.86378738 0.85478548
0.8534202 0.86754967 0.85901639 0.85855263]
mean value: 0.8614463542047712
MCC on Blind test: 0.21
Accuracy on Blind test: 0.53
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.72030234 0.71688986 0.82023787 0.6914432 0.76275826 0.84668159
0.7217288 0.70235157 0.860708 0.69467163]
mean value: 0.7537773132324219
key: score_time
value: [0.01084113 0.01207328 0.01228642 0.01954889 0.0122242 0.01225781
0.01232028 0.0122869 0.01229262 0.01234746]
mean value: 0.012847900390625
key: test_mcc
value: [0.93743687 0.90369611 1. 0.90369611 0.87096774 0.93548387
0.90369611 0.87278605 0.93649139 0.87082935]
mean value: 0.9135083615653431
key: train_mcc
value: [0.95329292 0.94634322 0.95685929 0.95685929 0.97482645 0.96043787
0.93195016 0.96048758 0.96050901 0.98205307]
mean value: 0.9583618868215811
key: test_accuracy
value: [0.96774194 0.9516129 1. 0.9516129 0.93548387 0.96774194
0.9516129 0.93548387 0.96721311 0.93442623]
mean value: 0.9562929666842941
key: train_accuracy
value: [0.97661871 0.97302158 0.97841727 0.97841727 0.98741007 0.98021583
0.96582734 0.98021583 0.98025135 0.99102334]
mean value: 0.9791418570708963
key: test_fscore
value: [0.96875 0.95238095 1. 0.95238095 0.93548387 0.96774194
0.95238095 0.93333333 0.96666667 0.93548387]
mean value: 0.9564602534562212
key: train_fscore
value: [0.97674419 0.97335702 0.97833935 0.97849462 0.98738739 0.98025135
0.96625222 0.980322 0.98025135 0.99102334]
mean value: 0.9792422819398573
key: test_precision
value: [0.93939394 0.9375 1. 0.9375 0.93548387 0.96774194
0.9375 0.96551724 1. 0.90625 ]
mean value: 0.9526886987224863
key: train_precision
value: [0.97153025 0.96140351 0.98188406 0.975 0.98916968 0.97849462
0.95438596 0.97508897 0.97849462 0.99280576]
mean value: 0.975825742653484
key: test_recall
value: [1. 0.96774194 1. 0.96774194 0.93548387 0.96774194
0.96774194 0.90322581 0.93548387 0.96666667]
mean value: 0.9611827956989247
key: train_recall
value: [0.98201439 0.98561151 0.97482014 0.98201439 0.98561151 0.98201439
0.97841727 0.98561151 0.98201439 0.98924731]
mean value: 0.9827376808230834
key: test_roc_auc
value: [0.96774194 0.9516129 1. 0.9516129 0.93548387 0.96774194
0.9516129 0.93548387 0.96774194 0.93494624]
mean value: 0.9563978494623656
key: train_roc_auc
value: [0.97661871 0.97302158 0.97841727 0.97841727 0.98741007 0.98021583
0.96582734 0.98021583 0.98025451 0.99102653]
mean value: 0.9791424924576468
key: test_jcc
value: [0.93939394 0.90909091 1. 0.90909091 0.87878788 0.9375
0.90909091 0.875 0.93548387 0.87878788]
mean value: 0.9172226295210166
key: train_jcc
value: [0.95454545 0.94809689 0.95759717 0.95789474 0.97508897 0.96126761
0.9347079 0.96140351 0.96126761 0.98220641]
mean value: 0.959407624783067
MCC on Blind test: 0.14
Accuracy on Blind test: 0.35
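Note on the metrics above: the key names (fit_time, score_time, test_mcc, train_mcc, ...) and the ten values per key are consistent with the output of scikit-learn's cross_validate run with 10 folds, multiple scorers and return_train_score=True. The log does not show the calling code, so this is an assumption. The sketch below reproduces the same key/value/mean layout and a held-out ("blind") MCC on synthetic data. The scorer names, the split sizes and the use of GaussianNB are illustrative choices, not the author's code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import cross_validate, train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data (assumption), split into a training set and a blind test set.
X, y = make_classification(n_samples=300, n_features=8, random_state=42)
X_train, X_blind, y_train, y_blind = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# The dictionary keys become the suffixes of the result keys, e.g. "test_mcc".
scoring = {"mcc": make_scorer(matthews_corrcoef), "accuracy": "accuracy"}

model = GaussianNB()
cv_out = cross_validate(
    model, X_train, y_train, cv=10, scoring=scoring, return_train_score=True
)

# Print each metric in the same key / value / mean value layout as the log.
for key, value in cv_out.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))

# Refit on the full training set and score once on the blind test set.
model.fit(X_train, y_train)
blind_mcc = matthews_corrcoef(y_blind, model.predict(X_blind))
print("MCC on Blind test:", round(blind_mcc, 2))
```

Comparing the mean test_mcc from cross-validation with the blind-test MCC is what exposes gaps such as the one logged above (mean test_mcc 0.91 against a blind-test MCC of 0.14).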
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01091671 0.01010871 0.00867534 0.00845599 0.0082767 0.00828934
0.00830674 0.00770044 0.00745106 0.00766039]
mean value: 0.008584141731262207
key: score_time
value: [0.01369882 0.00904274 0.0089016 0.00860572 0.00860953 0.00857306
0.00860238 0.00795913 0.00795174 0.00807285]
mean value: 0.009001755714416504
key: test_mcc
value: [0.78446454 0.51856298 0.71004695 0.84266484 0.7190925 0.67883359
0.51639778 0.84266484 0.67204301 0.73763441]
mean value: 0.7022405434817621
key: train_mcc
value: [0.70405758 0.72340077 0.71605437 0.70505422 0.73033396 0.71230395
0.70505422 0.70180672 0.72391206 0.73070576]
mean value: 0.7152683609552583
key: test_accuracy
value: [0.88709677 0.75806452 0.85483871 0.91935484 0.85483871 0.83870968
0.75806452 0.91935484 0.83606557 0.86885246]
mean value: 0.8495240613432047
key: train_accuracy
value: [0.84532374 0.86151079 0.85791367 0.85251799 0.86510791 0.85611511
0.85251799 0.85071942 0.86175943 0.86535009]
mean value: 0.8568836133965358
key: test_fscore
value: [0.89552239 0.76923077 0.85714286 0.92307692 0.86567164 0.84375
0.75409836 0.91525424 0.83870968 0.86666667]
mean value: 0.852912352133119
key: train_fscore
value: [0.85901639 0.86371681 0.85968028 0.85304659 0.86631016 0.85714286
0.85304659 0.85309735 0.86371681 0.86535009]
mean value: 0.8594123948387209
key: test_precision
value: [0.83333333 0.73529412 0.84375 0.88235294 0.80555556 0.81818182
0.76666667 0.96428571 0.83870968 0.86666667]
mean value: 0.8354796490932639
key: train_precision
value: [0.78915663 0.85017422 0.84912281 0.85 0.85865724 0.85106383
0.85 0.83972125 0.85017422 0.86690647]
mean value: 0.845497666835835
key: test_recall
value: [0.96774194 0.80645161 0.87096774 0.96774194 0.93548387 0.87096774
0.74193548 0.87096774 0.83870968 0.86666667]
mean value: 0.8737634408602151
key: train_recall
value: [0.94244604 0.87769784 0.8705036 0.85611511 0.87410072 0.86330935
0.85611511 0.86690647 0.87769784 0.86379928]
mean value: 0.8748691369485058
key: test_roc_auc
value: [0.88709677 0.75806452 0.85483871 0.91935484 0.85483871 0.83870968
0.75806452 0.91935484 0.83602151 0.8688172 ]
mean value: 0.8495161290322581
key: train_roc_auc
value: [0.84532374 0.86151079 0.85791367 0.85251799 0.86510791 0.85611511
0.85251799 0.85071942 0.86178799 0.86535288]
mean value: 0.8568867486655837
key: test_jcc
value: [0.81081081 0.625 0.75 0.85714286 0.76315789 0.72972973
0.60526316 0.84375 0.72222222 0.76470588]
mean value: 0.747178255489014
key: train_jcc
value: [0.75287356 0.76012461 0.75389408 0.74375 0.76415094 0.75
0.74375 0.74382716 0.76012461 0.76265823]
mean value: 0.7535153197137231
MCC on Blind test: 0.21
Accuracy on Blind test: 0.57
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.00817347 0.00792408 0.00842571 0.00832367 0.00831985 0.00922894
0.00865865 0.00853586 0.00858712 0.00871468]
mean value: 0.008489203453063966
key: score_time
value: [0.0080893 0.00804472 0.00845718 0.00870252 0.00849962 0.00911498
0.00868034 0.00858021 0.00858331 0.00869846]
mean value: 0.00854506492614746
key: test_mcc
value: [0.61807005 0.65372045 0.45374261 0.71004695 0.51856298 0.71004695
0.42023032 0.74193548 0.54251915 0.57419355]
mean value: 0.5943068479116385
key: train_mcc
value: [0.61176415 0.63718965 0.62604511 0.60075441 0.62596408 0.60075441
0.65528703 0.62262853 0.64839945 0.64106733]
mean value: 0.6269854139141487
key: test_accuracy
value: [0.80645161 0.82258065 0.72580645 0.85483871 0.75806452 0.85483871
0.70967742 0.87096774 0.7704918 0.78688525]
mean value: 0.7960602855631941
key: train_accuracy
value: [0.8057554 0.81834532 0.81294964 0.80035971 0.81294964 0.80035971
0.82733813 0.81115108 0.82405745 0.82046679]
mean value: 0.8133732870077367
key: test_fscore
value: [0.79310345 0.8358209 0.71186441 0.85245902 0.76923077 0.85245902
0.71875 0.87096774 0.76666667 0.78688525]
mean value: 0.7958207207099356
key: train_fscore
value: [0.80851064 0.82186949 0.81090909 0.79927667 0.8115942 0.79927667
0.83098592 0.81415929 0.82624113 0.82269504]
mean value: 0.814551814377158
key: test_precision
value: [0.85185185 0.77777778 0.75 0.86666667 0.73529412 0.86666667
0.6969697 0.87096774 0.79310345 0.77419355]
mean value: 0.7983491516178162
key: train_precision
value: [0.7972028 0.80622837 0.81985294 0.80363636 0.81751825 0.80363636
0.8137931 0.80139373 0.81468531 0.81403509]
mean value: 0.8091982321605484
key: test_recall
value: [0.74193548 0.90322581 0.67741935 0.83870968 0.80645161 0.83870968
0.74193548 0.87096774 0.74193548 0.8 ]
mean value: 0.7961290322580645
key: train_recall
value: [0.82014388 0.8381295 0.80215827 0.79496403 0.8057554 0.79496403
0.84892086 0.82733813 0.8381295 0.83154122]
mean value: 0.8202044815760295
key: test_roc_auc
value: [0.80645161 0.82258065 0.72580645 0.85483871 0.75806452 0.85483871
0.70967742 0.87096774 0.77096774 0.78709677]
mean value: 0.7961290322580645
key: train_roc_auc
value: [0.8057554 0.81834532 0.81294964 0.80035971 0.81294964 0.80035971
0.82733813 0.81115108 0.82408267 0.82044687]
mean value: 0.8133738170753719
key: test_jcc
value: [0.65714286 0.71794872 0.55263158 0.74285714 0.625 0.74285714
0.56097561 0.77142857 0.62162162 0.64864865]
mean value: 0.6641111891208169
key: train_jcc
value: [0.67857143 0.69760479 0.68195719 0.66566265 0.68292683 0.66566265
0.71084337 0.68656716 0.70392749 0.69879518]
mean value: 0.6872518746851146
MCC on Blind test: 0.18
Accuracy on Blind test: 0.52
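The per-fold scores above can be cross-checked by hand. A minimal sketch, assuming fold 1 of the BernoulliNB run held 62 samples with 31 positives. The confusion counts (TP=23, FP=4, FN=8, TN=27) are not printed by the script; they are inferred from the logged precision 0.85185185 (23/27) and recall 0.74193548 (23/31). The helper name fold_metrics is hypothetical.

```python
import math


def fold_metrics(tp, fp, fn, tn):
    """Return the binary-classification scores the log reports per fold."""
    total = tp + fp + fn + tn
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": recall,
        "fscore": 2 * tp / (2 * tp + fp + fn),
        "jcc": tp / (tp + fp + fn),
        "mcc": (tp * tn - fp * fn) / mcc_den,
        # For hard 0/1 predictions, ROC AUC reduces to balanced accuracy.
        "roc_auc": (recall + specificity) / 2,
    }


# Inferred counts for fold 1 (an assumption, not logged output).
metrics = fold_metrics(tp=23, fp=4, fn=8, tn=27)
for name, value in metrics.items():
    print(f"test_{name}: {value:.8f}")
```

Under these assumed counts the values agree with the first entry of each test_* array for BernoulliNB. The match between test_roc_auc and test_accuracy on the balanced folds is consistent with ROC AUC being computed from hard class predictions rather than probabilities, though the log does not confirm this.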
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00818872 0.00765157 0.00800991 0.00798917 0.00798535 0.00793529
0.00807238 0.00833344 0.00816226 0.00822878]
mean value: 0.008055686950683594
key: score_time
value: [0.01362538 0.01161528 0.01154208 0.01181722 0.01149821 0.01183558
0.01181483 0.01180124 0.01576948 0.01188374]
mean value: 0.012320303916931152
key: test_mcc
value: [0.7130241 0.61418277 0.5483871 0.77459667 0.51856298 0.74348441
0.58834841 0.61807005 0.60818119 0.57576971]
mean value: 0.6302607385125394
key: train_mcc
value: [0.7014797 0.74464768 0.73388892 0.71949894 0.75180343 0.71341277
0.73033396 0.70918848 0.73474672 0.73420349]
mean value: 0.7273204091578028
key: test_accuracy
value: [0.85483871 0.80645161 0.77419355 0.88709677 0.75806452 0.87096774
0.79032258 0.80645161 0.80327869 0.78688525]
mean value: 0.8138551031200423
key: train_accuracy
value: [0.85071942 0.87230216 0.86690647 0.85971223 0.87589928 0.85611511
0.86510791 0.85431655 0.86714542 0.86535009]
mean value: 0.8633574648360306
key: test_fscore
value: [0.84745763 0.8125 0.77419355 0.8852459 0.76923077 0.86666667
0.80597015 0.79310345 0.8 0.77192982]
mean value: 0.8126297935133517
key: train_fscore
value: [0.84990958 0.8716094 0.86594203 0.85869565 0.87567568 0.85185185
0.86388385 0.85137615 0.86446886 0.85875706]
mean value: 0.8612170116983378
key: test_precision
value: [0.89285714 0.78787879 0.77419355 0.9 0.73529412 0.89655172
0.75 0.85185185 0.82758621 0.81481481]
mean value: 0.8231028194471236
key: train_precision
value: [0.85454545 0.87636364 0.87226277 0.8649635 0.87725632 0.8778626
0.87179487 0.86891386 0.88059701 0.9047619 ]
mean value: 0.8749321930550784
key: test_recall
value: [0.80645161 0.83870968 0.77419355 0.87096774 0.80645161 0.83870968
0.87096774 0.74193548 0.77419355 0.73333333]
mean value: 0.8055913978494623
key: train_recall
value: [0.84532374 0.86690647 0.85971223 0.85251799 0.87410072 0.82733813
0.85611511 0.83453237 0.84892086 0.8172043 ]
mean value: 0.848267192697455
key: test_roc_auc
value: [0.85483871 0.80645161 0.77419355 0.88709677 0.75806452 0.87096774
0.79032258 0.80645161 0.80376344 0.78602151]
mean value: 0.8138172043010753
key: train_roc_auc
value: [0.85071942 0.87230216 0.86690647 0.85971223 0.87589928 0.85611511
0.86510791 0.85431655 0.86711276 0.86543668]
mean value: 0.8633628581006163
key: test_jcc
value: [0.73529412 0.68421053 0.63157895 0.79411765 0.625 0.76470588
0.675 0.65714286 0.66666667 0.62857143]
mean value: 0.6862288073123987
key: train_jcc
value: [0.73899371 0.7724359 0.76357827 0.75238095 0.77884615 0.74193548
0.76038339 0.74121406 0.76129032 0.75247525]
mean value: 0.7563533487181033
MCC on Blind test: 0.16
Accuracy on Blind test: 0.57
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01835275 0.01664305 0.01745725 0.01704788 0.01895404 0.01934195
0.01825833 0.01938081 0.01911664 0.01921248]
mean value: 0.0183765172958374
key: score_time
value: [0.00940108 0.01006126 0.00929928 0.00980639 0.01032066 0.01062155
0.01041937 0.01055121 0.01047111 0.01046944]
mean value: 0.010142135620117187
key: test_mcc
value: [0.96824584 0.66226618 0.62471615 0.7190925 0.7284928 0.80813523
0.50083542 0.80645161 0.63939757 0.81978229]
mean value: 0.7277415590359753
key: train_mcc
value: [0.85345163 0.77632088 0.79541168 0.777078 0.76906554 0.75930753
0.79995316 0.76580581 0.77932355 0.78519796]
mean value: 0.78609157351081
key: test_accuracy
value: [0.98387097 0.82258065 0.80645161 0.85483871 0.85483871 0.90322581
0.74193548 0.90322581 0.81967213 0.90163934]
mean value: 0.859227921734532
key: train_accuracy
value: [0.92625899 0.88489209 0.89568345 0.88489209 0.88129496 0.87589928
0.89748201 0.8794964 0.88689408 0.89048474]
mean value: 0.890327809565633
key: test_fscore
value: [0.98412698 0.84057971 0.82352941 0.86567164 0.86956522 0.90625
0.77142857 0.90322581 0.82539683 0.90909091]
mean value: 0.8698865077586886
key: train_fscore
value: [0.92794376 0.89189189 0.90068493 0.89225589 0.88851351 0.88403361
0.90289608 0.88701518 0.89303905 0.89608177]
mean value: 0.8964355683391803
key: test_precision
value: [0.96875 0.76315789 0.75675676 0.80555556 0.78947368 0.87878788
0.69230769 0.90322581 0.8125 0.83333333]
mean value: 0.8203848602140198
key: train_precision
value: [0.90721649 0.84076433 0.85947712 0.83860759 0.83757962 0.829653
0.85760518 0.83492063 0.84565916 0.8538961 ]
mean value: 0.8505379240652493
key: test_recall
value: [1. 0.93548387 0.90322581 0.93548387 0.96774194 0.93548387
0.87096774 0.90322581 0.83870968 1. ]
mean value: 0.9290322580645161
key: train_recall
value: [0.94964029 0.94964029 0.94604317 0.95323741 0.94604317 0.94604317
0.95323741 0.94604317 0.94604317 0.94265233]
mean value: 0.9478623552770686
key: test_roc_auc
value: [0.98387097 0.82258065 0.80645161 0.85483871 0.85483871 0.90322581
0.74193548 0.90322581 0.81935484 0.90322581]
mean value: 0.8593548387096774
key: train_roc_auc
value: [0.92625899 0.88489209 0.89568345 0.88489209 0.88129496 0.87589928
0.89748201 0.8794964 0.88700008 0.89039091]
mean value: 0.8903290271008999
key: test_jcc
value: [0.96875 0.725 0.7 0.76315789 0.76923077 0.82857143
0.62790698 0.82352941 0.7027027 0.83333333]
mean value: 0.7742182517083968
key: train_jcc
value: [0.86557377 0.80487805 0.81931464 0.80547112 0.7993921 0.79216867
0.82298137 0.7969697 0.80674847 0.8117284 ]
mean value: 0.8125226282348854
MCC on Blind test: 0.26
Accuracy on Blind test: 0.5
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.65558004 1.55461335 1.68518972 1.59982467 1.59840369 1.78556776
1.98289418 1.71361303 1.71010733 1.56034899]
mean value: 1.6846142768859864
key: score_time
value: [0.01405716 0.02408385 0.01391459 0.01108027 0.01359916 0.01913881
0.01201797 0.01144147 0.01147699 0.01196384]
mean value: 0.014277410507202149
key: test_mcc
value: [1. 0.90369611 0.93548387 0.96824584 0.93743687 0.90369611
0.93548387 0.93743687 0.87082935 0.90215054]
mean value: 0.9294459430210258
key: train_mcc
value: [0.99283145 0.98561151 0.99283145 0.98921503 0.99283145 0.98921503
0.98202074 0.99640932 0.99284416 0.99641577]
mean value: 0.9910225917811445
key: test_accuracy
value: [1. 0.9516129 0.96774194 0.98387097 0.96774194 0.9516129
0.96774194 0.96774194 0.93442623 0.95081967]
mean value: 0.9643310417768377
key: train_accuracy
value: [0.99640288 0.99280576 0.99640288 0.99460432 0.99640288 0.99460432
0.99100719 0.99820144 0.99640934 0.99820467]
mean value: 0.9955045658266923
key: test_fscore
value: [1. 0.95238095 0.96774194 0.98412698 0.96875 0.95081967
0.96774194 0.96666667 0.93333333 0.95081967]
mean value: 0.9642381151737973
key: train_fscore
value: [0.99638989 0.99280576 0.99638989 0.99459459 0.99638989 0.99459459
0.99102334 0.9981982 0.99638989 0.99820467]
mean value: 0.9954980716751404
key: test_precision
value: [1. 0.9375 0.96774194 0.96875 0.93939394 0.96666667
0.96774194 1. 0.96551724 0.93548387]
mean value: 0.9648795589375401
key: train_precision
value: [1. 0.99280576 1. 0.99638989 1. 0.99638989
0.98924731 1. 1. 1. ]
mean value: 0.9974832850617142
key: test_recall
value: [1. 0.96774194 0.96774194 1. 1. 0.93548387
0.96774194 0.93548387 0.90322581 0.96666667]
mean value: 0.9644086021505376
key: train_recall
value: [0.99280576 0.99280576 0.99280576 0.99280576 0.99280576 0.99280576
0.99280576 0.99640288 0.99280576 0.99641577]
mean value: 0.9935264691472628
key: test_roc_auc
value: [1. 0.9516129 0.96774194 0.98387097 0.96774194 0.9516129
0.96774194 0.96774194 0.93494624 0.95107527]
mean value: 0.9644086021505377
key: train_roc_auc
value: [0.99640288 0.99280576 0.99640288 0.99460432 0.99640288 0.99460432
0.99100719 0.99820144 0.99640288 0.99820789]
mean value: 0.995504241767876
key: test_jcc
value: [1. 0.90909091 0.9375 0.96875 0.93939394 0.90625
0.9375 0.93548387 0.875 0.90625 ]
mean value: 0.931521871945259
key: train_jcc
value: [0.99280576 0.98571429 0.99280576 0.98924731 0.99280576 0.98924731
0.98220641 0.99640288 0.99280576 0.99641577]
mean value: 0.9910456984954045
MCC on Blind test: 0.15
Accuracy on Blind test: 0.35
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.01399827 0.01331663 0.01154613 0.01056981 0.01088071 0.00967002
0.00976062 0.00975561 0.0097971 0.00941205]
mean value: 0.010870695114135742
key: score_time
value: [0.01116037 0.00968766 0.00949836 0.00883269 0.00868964 0.00786233
0.00786996 0.00779343 0.0077889 0.0078299 ]
mean value: 0.008701324462890625
key: test_mcc
value: [1. 0.87096774 1. 0.96824584 0.90369611 0.87831007
0.87831007 0.96824584 0.96774194 0.90215054]
mean value: 0.9337668133579895
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.93548387 1. 0.98387097 0.9516129 0.93548387
0.93548387 0.98387097 0.98360656 0.95081967]
mean value: 0.9660232681121099
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.93548387 1. 0.98412698 0.95238095 0.93103448
0.93103448 0.98360656 0.98360656 0.95081967]
mean value: 0.9652093559878165
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.93548387 1. 0.96875 0.9375 1.
1. 1. 1. 0.93548387]
mean value: 0.9777217741935483
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.93548387 1. 1. 0.96774194 0.87096774
0.87096774 0.96774194 0.96774194 0.96666667]
mean value: 0.9547311827956989
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.93548387 1. 0.98387097 0.9516129 0.93548387
0.93548387 0.98387097 0.98387097 0.95107527]
mean value: 0.9660752688172043
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.87878788 1. 0.96875 0.90909091 0.87096774
0.87096774 0.96774194 0.96774194 0.90625 ]
mean value: 0.9340298142717498
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.01
Accuracy on Blind test: 0.2
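Each "mean value" line is the arithmetic mean of the ten per-fold scores printed above it. A minimal sketch using only the standard library, with the Decision Tree test_mcc values and the blind-test MCC copied from the log above. The variable names are illustrative and not taken from the script.

```python
from statistics import fmean

# Decision Tree test_mcc per fold, copied from the log.
test_mcc = [
    1.0, 0.87096774, 1.0, 0.96824584, 0.90369611,
    0.87831007, 0.87831007, 0.96824584, 0.96774194, 0.90215054,
]
blind_mcc = 0.01  # "MCC on Blind test" for the same model

cv_mean = fmean(test_mcc)          # reproduces the logged "mean value"
cv_blind_gap = cv_mean - blind_mcc  # cross-validation mean minus blind-test MCC

print(f"mean value: {cv_mean:.6f}")
print(f"CV minus blind MCC: {cv_blind_gap:.6f}")
```

A gap this large is commonly read as a sign that the model overfits or that the cross-validation folds are not representative of the blind set. The log alone does not say which.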
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.10142112 0.10165668 0.10126591 0.10151219 0.11299324 0.11322856
0.108289 0.1019156 0.10539865 0.1026175 ]
mean value: 0.10502984523773193
key: score_time
value: [0.0171802 0.0173862 0.01719642 0.01748347 0.01896811 0.01896906
0.01711893 0.01859283 0.01831841 0.01735854]
mean value: 0.01785721778869629
key: test_mcc
value: [1. 0.90369611 0.93548387 0.93548387 0.93743687 0.93548387
0.93743687 0.96824584 0.96770777 0.90215054]
mean value: 0.9423125607021228
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.9516129 0.96774194 0.96774194 0.96774194 0.96774194
0.96774194 0.98387097 0.98360656 0.95081967]
mean value: 0.9708619777895293
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.95238095 0.96774194 0.96774194 0.96875 0.96774194
0.96666667 0.98360656 0.98412698 0.95081967]
mean value: 0.9709576639134413
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.9375 0.96774194 0.96774194 0.93939394 0.96774194
1. 1. 0.96875 0.93548387]
mean value: 0.9684353616813295
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.96774194 0.96774194 0.96774194 1. 0.96774194
0.93548387 0.96774194 1. 0.96666667]
mean value: 0.9740860215053764
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.9516129 0.96774194 0.96774194 0.96774194 0.96774194
0.96774194 0.98387097 0.98333333 0.95107527]
mean value: 0.9708602150537635
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.90909091 0.9375 0.9375 0.93939394 0.9375
0.93548387 0.96774194 0.96875 0.90625 ]
mean value: 0.9439210654936462
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.2
Accuracy on Blind test: 0.36
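Note on the key/value blocks above: keys such as fit_time, score_time, test_mcc and train_jcc have the shape of the dictionary returned by scikit-learn's `cross_validate` when it is given a dict of named scorers and `return_train_score=True`. The sketch below reproduces that shape under stated assumptions: the data is synthetic, the scorer names are chosen only to match the logged keys, and the fold setup is a guess (the log shows ten values per key but not how the folds were built). The logged test_roc_auc values track test_accuracy closely, which would be consistent with an AUC computed from hard predictions; the sketch does that via `make_scorer(roc_auc_score)`, but this is an inference, not something the log states.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import (jaccard_score, make_scorer, matthews_corrcoef,
                             roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in data; the real features and labels are not part of this log.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Hypothetical scorer names, picked so the result keys match the log
# (test_mcc, train_mcc, ..., test_jcc, train_jcc).
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    # Assumption: AUC from hard class predictions rather than probabilities.
    "roc_auc": make_scorer(roc_auc_score),
    "jcc": make_scorer(jaccard_score),
}

# Assumption: 10 stratified folds, matching the ten values per key in the log.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

scores = cross_validate(
    ExtraTreesClassifier(random_state=42),
    X,
    y,
    cv=cv,
    scoring=scoring,
    return_train_score=True,
)

# Print in the same key / value / mean value layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```

With `return_train_score=True`, each scorer yields both a `test_<name>` and a `train_<name>` array, which is why every metric appears twice per model in this log.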
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.00819397 0.00782299 0.00878263 0.00824928 0.007725 0.00857353
0.00780129 0.00821066 0.008883 0.00848746]
mean value: 0.008272981643676758
key: score_time
value: [0.00791669 0.00839043 0.00856495 0.00861716 0.00864053 0.00859261
0.00863576 0.00865197 0.00859213 0.00803781]
mean value: 0.00846400260925293
key: test_mcc
value: [0.81325006 0.82199494 0.83914639 0.90369611 0.87096774 0.90369611
0.81325006 0.7284928 0.74460444 0.80475071]
mean value: 0.8243849367718851
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.90322581 0.90322581 0.91935484 0.9516129 0.93548387 0.9516129
0.90322581 0.85483871 0.86885246 0.90163934]
mean value: 0.9093072448439978
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.89655172 0.89285714 0.91803279 0.95238095 0.93548387 0.95081967
0.89655172 0.83636364 0.86206897 0.89655172]
mean value: 0.9037662199516902
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96296296 1. 0.93333333 0.9375 0.93548387 0.96666667
0.96296296 0.95833333 0.92592593 0.92857143]
mean value: 0.9511740484724356
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.83870968 0.80645161 0.90322581 0.96774194 0.93548387 0.93548387
0.83870968 0.74193548 0.80645161 0.86666667]
mean value: 0.8640860215053763
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.90322581 0.90322581 0.91935484 0.9516129 0.93548387 0.9516129
0.90322581 0.85483871 0.86989247 0.90107527]
mean value: 0.9093548387096775
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.8125 0.80645161 0.84848485 0.90909091 0.87878788 0.90625
0.8125 0.71875 0.75757576 0.8125 ]
mean value: 0.826289100684262
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.1
Accuracy on Blind test: 0.26
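Note on the "Running model pipeline" dumps: each one shows a `Pipeline` whose 'prep' step is a `ColumnTransformer` (MinMaxScaler on the numerical columns, OneHotEncoder on the categorical ones, remainder='passthrough') followed by a 'model' step. A minimal sketch of that structure follows. The column names are borrowed from the log, but the values, the category labels and the reduced column set are invented for illustration; the full numerical column list is truncated in the log and is not reconstructed here.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.tree import ExtraTreeClassifier

# Tiny made-up frame: names come from the log, values are invented.
df = pd.DataFrame({
    "ligand_distance": [3.2, 10.5, 7.1, 15.0, 4.4, 12.3],
    "rsa": [0.10, 0.55, 0.30, 0.80, 0.20, 0.65],
    "ss_class": ["helix", "sheet", "loop", "helix", "loop", "sheet"],
    "active_site": [1, 0, 0, 0, 1, 0],
})
y = [1, 0, 1, 0, 1, 0]  # invented binary labels

num_cols = ["ligand_distance", "rsa"]
cat_cols = ["ss_class", "active_site"]

# Scale numerical columns to [0, 1] and one-hot encode categorical columns;
# any column not listed would be passed through unchanged.
prep = ColumnTransformer(
    remainder="passthrough",
    transformers=[
        ("num", MinMaxScaler(), num_cols),
        ("cat", OneHotEncoder(), cat_cols),
    ],
)

pipe = Pipeline(steps=[
    ("prep", prep),
    ("model", ExtraTreeClassifier(random_state=42)),
])

pipe.fit(df, y)
print(pipe.predict(df))
```

Keeping the scaler and encoder inside the pipeline means they are refitted on the training portion of every cross-validation fold, so no information from the held-out fold reaches the preprocessing step.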
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.33802533 1.34034443 1.34689403 1.33327985 1.32815957 1.34741807
1.36258483 1.35150194 1.35220432 1.37553906]
mean value: 1.3475951433181763
key: score_time
value: [0.09532094 0.15330195 0.09112287 0.0915432 0.09900188 0.09554839
0.09749842 0.0989244 0.09722352 0.09352469]
mean value: 0.10130102634429931
key: test_mcc
value: [1. 0.90369611 0.96824584 0.96824584 0.93743687 0.96824584
1. 0.96824584 0.96770777 0.8688172 ]
mean value: 0.9550641303879139
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.9516129 0.98387097 0.98387097 0.96774194 0.98387097
1. 0.98387097 0.98360656 0.93442623]
mean value: 0.9772871496562665
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.95238095 0.98412698 0.98412698 0.96875 0.98360656
1. 0.98360656 0.98412698 0.93333333]
mean value: 0.9774058352849336
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.9375 0.96875 0.96875 0.93939394 1.
1. 1. 0.96875 0.93333333]
mean value: 0.9716477272727273
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.96774194 1. 1. 1. 0.96774194
1. 0.96774194 1. 0.93333333]
mean value: 0.9836559139784946
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.9516129 0.98387097 0.98387097 0.96774194 0.98387097
1. 0.98387097 0.98333333 0.9344086 ]
mean value: 0.9772580645161291
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.90909091 0.96875 0.96875 0.93939394 0.96774194
1. 0.96774194 0.96875 0.875 ]
mean value: 0.9565218719452591
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.08
Accuracy on Blind test: 0.19
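Note on the blind-test lines: for the tree ensembles above, the training scores are 1.0 and the mean cross-validated MCC is above 0.9, while the blind-test MCC is 0.2 or lower. The log does not show how the blind set was assembled or scored, so the sketch below only illustrates one common way to produce such a pair of numbers: fit on the training portion, predict on a held-out portion, and score with `matthews_corrcoef` and `accuracy_score`. The data, the 70/30 split and the reduced `n_estimators` are assumptions made to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the real training and blind sets are not in this log.
X, y = make_classification(n_samples=300, n_features=10, random_state=42)

# Assumption: a stratified 70/30 split plays the role of training vs blind data.
X_train, X_blind, y_train, y_blind = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# n_estimators reduced from the logged 1000 purely to keep the sketch fast.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Score only on data the model never saw during fitting.
y_pred = model.predict(X_blind)
blind_mcc = matthews_corrcoef(y_blind, y_pred)
blind_acc = accuracy_score(y_blind, y_pred)

print("MCC on Blind test:", round(blind_mcc, 2))
print("Accuracy on Blind test:", round(blind_acc, 2))
```

MCC takes all four confusion-matrix cells into account, so it is less easily inflated by class imbalance than accuracy alone; reporting both, as this log does, makes a large gap between cross-validation and blind-test performance easier to spot.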
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.89337206 0.89832807 0.98853326 0.97143817 0.95260191 0.94139814
0.88369918 0.9011116 0.89748955 0.93211508]
mean value: 0.9260087013244629
key: score_time
value: [0.21353126 0.18774438 0.24319863 0.28244352 0.24419403 0.22724342
0.2297473 0.25225329 0.2684927 0.27329707]
mean value: 0.24221456050872803
key: test_mcc
value: [0.96824584 0.84266484 0.96824584 0.93743687 0.93743687 0.96824584
1. 0.93548387 0.96770777 0.8688172 ]
mean value: 0.9394284931358869
key: train_mcc
value: [0.96073627 0.95025527 0.97124816 0.96058703 0.96768225 0.95693359
0.96412858 0.96778244 0.95713569 0.97137405]
mean value: 0.9627863336198357
key: test_accuracy
value: [0.98387097 0.91935484 0.98387097 0.96774194 0.96774194 0.98387097
1. 0.96774194 0.98360656 0.93442623]
mean value: 0.9692226335272343
key: train_accuracy
value: [0.98021583 0.97482014 0.98561151 0.98021583 0.98381295 0.97841727
0.98201439 0.98381295 0.97845601 0.98563734]
mean value: 0.9813014220580447
key: test_fscore
value: [0.98412698 0.92307692 0.98412698 0.96875 0.96875 0.98360656
1. 0.96774194 0.98412698 0.93333333]
mean value: 0.9697639701652129
key: train_fscore
value: [0.98046181 0.97526502 0.98566308 0.98039216 0.98389982 0.97857143
0.98214286 0.98395722 0.97864769 0.98576512]
mean value: 0.9814766206153425
key: test_precision
value: [0.96875 0.88235294 0.96875 0.93939394 0.93939394 1.
1. 0.96774194 0.96875 0.93333333]
mean value: 0.9568466088781554
key: train_precision
value: [0.96842105 0.95833333 0.98214286 0.97173145 0.97864769 0.97163121
0.9751773 0.97526502 0.96830986 0.97879859]
mean value: 0.972845835273727
key: test_recall
value: [1. 0.96774194 1. 1. 1. 0.96774194
1. 0.96774194 1. 0.93333333]
mean value: 0.9836559139784946
key: train_recall
value: [0.99280576 0.99280576 0.98920863 0.98920863 0.98920863 0.98561151
0.98920863 0.99280576 0.98920863 0.99283154]
mean value: 0.9902903483664681
key: test_roc_auc
value: [0.98387097 0.91935484 0.98387097 0.96774194 0.96774194 0.98387097
1. 0.96774194 0.98333333 0.9344086 ]
mean value: 0.9691935483870968
key: train_roc_auc
value: [0.98021583 0.97482014 0.98561151 0.98021583 0.98381295 0.97841727
0.98201439 0.98381295 0.97847528 0.9856244 ]
mean value: 0.9813020551300895
key: test_jcc
value: [0.96875 0.85714286 0.96875 0.93939394 0.93939394 0.96774194
1. 0.9375 0.96875 0.875 ]
mean value: 0.9422422671414608
key: train_jcc
value: [0.96167247 0.95172414 0.97173145 0.96153846 0.96830986 0.95804196
0.96491228 0.96842105 0.95818815 0.97192982]
mean value: 0.9636469650502072
MCC on Blind test: 0.09
Accuracy on Blind test: 0.2
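Note on the FutureWarning raised during the Random Forest2 run: the warning text itself says that `max_features='auto'` is deprecated as of scikit-learn 1.1 and that `max_features='sqrt'` keeps the previous behaviour for RandomForestClassifier. A sketch of the adjusted estimator follows. The other parameters are copied from the logged model, except that `n_estimators` and `n_jobs` are lowered and the data is synthetic, both only to keep the example light.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data; not the data used in this log.
X, y = make_classification(n_samples=200, n_features=16, random_state=42)

rf = RandomForestClassifier(
    max_features="sqrt",   # replaces the deprecated max_features='auto'
    min_samples_leaf=5,    # as in the logged 'Random Forest2' model
    n_estimators=100,      # logged value is 1000; reduced for the sketch
    n_jobs=1,              # logged value is 10; reduced for the sketch
    oob_score=True,
    random_state=42,
)
rf.fit(X, y)

# Out-of-bag accuracy estimate, available because oob_score=True.
print("OOB score:", rf.oob_score_)
```

Omitting `max_features` altogether has the same effect for this estimator, since the warning states that 'sqrt' is also the default.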
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02149725 0.00856233 0.00855422 0.00861883 0.00868273 0.00835061
0.00852489 0.00826049 0.00861621 0.008636 ]
mean value: 0.009830355644226074
key: score_time
value: [0.0092957 0.00866055 0.00866151 0.00840211 0.00860667 0.00837517
0.00868773 0.00860476 0.00860882 0.00852346]
mean value: 0.00864264965057373
key: test_mcc
value: [0.61807005 0.65372045 0.45374261 0.71004695 0.51856298 0.71004695
0.42023032 0.74193548 0.54251915 0.57419355]
mean value: 0.5943068479116385
key: train_mcc
value: [0.61176415 0.63718965 0.62604511 0.60075441 0.62596408 0.60075441
0.65528703 0.62262853 0.64839945 0.64106733]
mean value: 0.6269854139141487
key: test_accuracy
value: [0.80645161 0.82258065 0.72580645 0.85483871 0.75806452 0.85483871
0.70967742 0.87096774 0.7704918 0.78688525]
mean value: 0.7960602855631941
key: train_accuracy
value: [0.8057554 0.81834532 0.81294964 0.80035971 0.81294964 0.80035971
0.82733813 0.81115108 0.82405745 0.82046679]
mean value: 0.8133732870077367
key: test_fscore
value: [0.79310345 0.8358209 0.71186441 0.85245902 0.76923077 0.85245902
0.71875 0.87096774 0.76666667 0.78688525]
mean value: 0.7958207207099356
key: train_fscore
value: [0.80851064 0.82186949 0.81090909 0.79927667 0.8115942 0.79927667
0.83098592 0.81415929 0.82624113 0.82269504]
mean value: 0.814551814377158
key: test_precision
value: [0.85185185 0.77777778 0.75 0.86666667 0.73529412 0.86666667
0.6969697 0.87096774 0.79310345 0.77419355]
mean value: 0.7983491516178162
key: train_precision
value: [0.7972028 0.80622837 0.81985294 0.80363636 0.81751825 0.80363636
0.8137931 0.80139373 0.81468531 0.81403509]
mean value: 0.8091982321605484
key: test_recall
value: [0.74193548 0.90322581 0.67741935 0.83870968 0.80645161 0.83870968
0.74193548 0.87096774 0.74193548 0.8 ]
mean value: 0.7961290322580645
key: train_recall
value: [0.82014388 0.8381295 0.80215827 0.79496403 0.8057554 0.79496403
0.84892086 0.82733813 0.8381295 0.83154122]
mean value: 0.8202044815760295
key: test_roc_auc
value: [0.80645161 0.82258065 0.72580645 0.85483871 0.75806452 0.85483871
0.70967742 0.87096774 0.77096774 0.78709677]
mean value: 0.7961290322580645
key: train_roc_auc
value: [0.8057554 0.81834532 0.81294964 0.80035971 0.81294964 0.80035971
0.82733813 0.81115108 0.82408267 0.82044687]
mean value: 0.8133738170753719
key: test_jcc
value: [0.65714286 0.71794872 0.55263158 0.74285714 0.625 0.74285714
0.56097561 0.77142857 0.62162162 0.64864865]
mean value: 0.6641111891208169
key: train_jcc
value: [0.67857143 0.69760479 0.68195719 0.66566265 0.68292683 0.66566265
0.71084337 0.68656716 0.70392749 0.69879518]
mean value: 0.6872518746851146
MCC on Blind test: 0.18
Accuracy on Blind test: 0.52
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.09717035 0.05194712 0.05566955 0.05653906 0.06008887 0.0614419
0.06166148 0.06023955 0.06417036 0.05456114]
mean value: 0.06234893798828125
key: score_time
value: [0.01015568 0.00965595 0.00964165 0.00960851 0.00993562 0.00997877
0.01027107 0.00972724 0.00962043 0.00961185]
mean value: 0.009820675849914551
key: test_mcc
value: [1. 0.90369611 0.93548387 0.96824584 0.93743687 0.93743687
1. 0.96824584 0.90586325 0.8688172 ]
mean value: 0.942522584980111
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.9516129 0.96774194 0.98387097 0.96774194 0.96774194
1. 0.98387097 0.95081967 0.93442623]
mean value: 0.9707826546800635
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.95238095 0.96774194 0.98412698 0.96875 0.96666667
1. 0.98360656 0.95384615 0.93333333]
mean value: 0.971045258321501
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.9375 0.96774194 0.96875 0.93939394 1.
1. 1. 0.91176471 0.93333333]
mean value: 0.9658483914093496
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.96774194 0.96774194 1. 1. 0.93548387
1. 0.96774194 1. 0.93333333]
mean value: 0.9772043010752688
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.9516129 0.96774194 0.98387097 0.96774194 0.96774194
1. 0.98387097 0.95 0.9344086 ]
mean value: 0.9706989247311828
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.90909091 0.9375 0.96875 0.93939394 0.93548387
1. 0.96774194 0.91176471 0.875 ]
mean value: 0.9444725360818814
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.06
Accuracy on Blind test: 0.2
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.01567197 0.04181266 0.04561234 0.0498898 0.04169583 0.04362798
0.04267979 0.04328203 0.0497613 0.0470922 ]
mean value: 0.04211258888244629
key: score_time
value: [0.01037931 0.02170181 0.01888394 0.01488948 0.01078224 0.02180099
0.02077007 0.0193584 0.0108037 0.01081157]
mean value: 0.016018152236938477
key: test_mcc
value: [0.93548387 0.84266484 0.93548387 0.93743687 0.87278605 1.
0.96824584 0.87278605 0.9344086 0.8688172 ]
mean value: 0.9168113188472994
key: train_mcc
value: [0.93914669 0.94653932 0.93563929 0.93914669 0.94266562 0.93195016
0.93238486 0.94283651 0.93575728 0.9427658 ]
mean value: 0.9388832217176918
key: test_accuracy
value: [0.96774194 0.91935484 0.96774194 0.96774194 0.93548387 1.
0.98387097 0.93548387 0.96721311 0.93442623]
mean value: 0.9579058699101005
key: train_accuracy
value: [0.96942446 0.97302158 0.9676259 0.96942446 0.97122302 0.96582734
0.96582734 0.97122302 0.96768402 0.97127469]
mean value: 0.969255582966302
key: test_fscore
value: [0.96774194 0.92307692 0.96774194 0.96875 0.9375 1.
0.98412698 0.93333333 0.96774194 0.93333333]
mean value: 0.9583346380322186
key: train_fscore
value: [0.96980462 0.97345133 0.96808511 0.96980462 0.97153025 0.96625222
0.9664903 0.97163121 0.96808511 0.97163121]
mean value: 0.9696765956964183
key: test_precision
value: [0.96774194 0.88235294 0.96774194 0.93939394 0.90909091 1.
0.96875 0.96551724 0.96774194 0.93333333]
mean value: 0.9501664170825576
key: train_precision
value: [0.95789474 0.95818815 0.95454545 0.95789474 0.96126761 0.95438596
0.94809689 0.95804196 0.95454545 0.96140351]
mean value: 0.9566264459258345
key: test_recall
value: [0.96774194 0.96774194 0.96774194 1. 0.96774194 1.
1. 0.90322581 0.96774194 0.93333333]
mean value: 0.9675268817204301
key: train_recall
value: [0.98201439 0.98920863 0.98201439 0.98201439 0.98201439 0.97841727
0.98561151 0.98561151 0.98201439 0.98207885]
mean value: 0.9830999716355947
key: test_roc_auc
value: [0.96774194 0.91935484 0.96774194 0.96774194 0.93548387 1.
0.98387097 0.93548387 0.9672043 0.9344086 ]
mean value: 0.9579032258064516
key: train_roc_auc
value: [0.96942446 0.97302158 0.9676259 0.96942446 0.97122302 0.96582734
0.96582734 0.97122302 0.9677097 0.97125525]
mean value: 0.9692562079368764
key: test_jcc
value: [0.9375 0.85714286 0.9375 0.93939394 0.88235294 1.
0.96875 0.875 0.9375 0.875 ]
mean value: 0.9210139737713268
key: train_jcc
value: [0.94137931 0.94827586 0.93814433 0.94137931 0.94463668 0.9347079
0.93515358 0.94482759 0.93814433 0.94482759]
mean value: 0.9411476480564737
MCC on Blind test: 0.14
Accuracy on Blind test: 0.35
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.02284217 0.00781918 0.00835109 0.00833607 0.00749111 0.00754762
0.00822759 0.00809884 0.00830936 0.00828028]
mean value: 0.009530329704284668
key: score_time
value: [0.00877237 0.00818181 0.00863433 0.00789332 0.00792742 0.00775814
0.00851727 0.00839686 0.00838804 0.00852823]
mean value: 0.008299779891967774
key: test_mcc
value: [0.74193548 0.55301004 0.55895656 0.69047575 0.60677988 0.80813523
0.46358632 0.77459667 0.57576971 0.75310667]
mean value: 0.6526352311777236
key: train_mcc
value: [0.67282515 0.67609995 0.67144111 0.65172831 0.66087942 0.64772254
0.68595876 0.65901019 0.68263871 0.65745214]
mean value: 0.6665756264215859
key: test_accuracy
value: [0.87096774 0.77419355 0.77419355 0.83870968 0.79032258 0.90322581
0.72580645 0.88709677 0.78688525 0.86885246]
mean value: 0.8220253833950291
key: train_accuracy
value: [0.83273381 0.83453237 0.83273381 0.82194245 0.82733813 0.82014388
0.83992806 0.82553957 0.83842011 0.82585278]
mean value: 0.8299164976815675
key: test_fscore
value: [0.87096774 0.78787879 0.79411765 0.85294118 0.81690141 0.90625
0.75362319 0.88888889 0.8 0.87878788]
mean value: 0.8350356717876952
key: train_fscore
value: [0.84422111 0.84563758 0.84317032 0.83472454 0.83838384 0.83277592
0.84991568 0.83806344 0.84797297 0.83697479]
mean value: 0.8411840193764768
key: test_precision
value: [0.87096774 0.74285714 0.72972973 0.78378378 0.725 0.87878788
0.68421053 0.875 0.76470588 0.80555556]
mean value: 0.7860598241318305
key: train_precision
value: [0.78996865 0.79245283 0.79365079 0.7788162 0.78797468 0.778125
0.8 0.78193146 0.79936306 0.78797468]
mean value: 0.789025736384194
key: test_recall
value: [0.87096774 0.83870968 0.87096774 0.93548387 0.93548387 0.93548387
0.83870968 0.90322581 0.83870968 0.96666667]
mean value: 0.8934408602150538
key: train_recall
value: [0.90647482 0.90647482 0.89928058 0.89928058 0.89568345 0.89568345
0.90647482 0.9028777 0.9028777 0.89247312]
mean value: 0.9007581031948635
key: test_roc_auc
value: [0.87096774 0.77419355 0.77419355 0.83870968 0.79032258 0.90322581
0.72580645 0.88709677 0.78602151 0.87043011]
mean value: 0.8220967741935484
key: train_roc_auc
value: [0.83273381 0.83453237 0.83273381 0.82194245 0.82733813 0.82014388
0.83992806 0.82553957 0.83853562 0.82573296]
mean value: 0.829916067146283
key: test_jcc
value: [0.77142857 0.65 0.65853659 0.74358974 0.69047619 0.82857143
0.60465116 0.8 0.66666667 0.78378378]
mean value: 0.7197704132672936
key: train_jcc
value: [0.73043478 0.73255814 0.72886297 0.71633238 0.72173913 0.71346705
0.73900293 0.72126437 0.73607038 0.71965318]
mean value: 0.7259385314063227
MCC on Blind test: 0.21
Accuracy on Blind test: 0.5
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01066303 0.01422668 0.01254988 0.01286054 0.01497436 0.01457095
0.01441455 0.01421928 0.01397729 0.01655555]
mean value: 0.013901209831237793
key: score_time
value: [0.00798249 0.01007485 0.01002264 0.01035452 0.01045942 0.01085925
0.01050234 0.01053119 0.01049948 0.01047754]
mean value: 0.010176372528076173
key: test_mcc
value: [0.84983659 0.90369611 0.96824584 0.93743687 0.83914639 1.
0.93743687 0.84266484 0.9344086 0.90215054]
mean value: 0.9115022641468933
key: train_mcc
value: [0.85210391 0.96048758 0.90882979 0.95324358 0.8782527 0.93534863
0.935276 0.91827075 0.93969601 0.97130001]
mean value: 0.9252808948765114
key: test_accuracy
value: [0.91935484 0.9516129 0.98387097 0.96774194 0.91935484 1.
0.96774194 0.91935484 0.96721311 0.95081967]
mean value: 0.9547065044949762
key: train_accuracy
value: [0.92266187 0.98021583 0.95323741 0.97661871 0.93705036 0.9676259
0.9676259 0.95863309 0.96947935 0.98563734]
mean value: 0.9618785761337071
key: test_fscore
value: [0.9122807 0.95238095 0.98412698 0.96875 0.91803279 1.
0.96666667 0.91525424 0.96774194 0.95081967]
mean value: 0.9536053936717389
key: train_fscore
value: [0.91746641 0.980322 0.95486111 0.97666068 0.93383743 0.96785714
0.96750903 0.95764273 0.97001764 0.98561151]
mean value: 0.961178567797733
key: test_precision
value: [1. 0.9375 0.96875 0.93939394 0.93333333 1.
1. 0.96428571 0.96774194 0.93548387]
mean value: 0.96464887934646
key: train_precision
value: [0.98353909 0.97508897 0.92281879 0.97491039 0.98406375 0.96099291
0.97101449 0.98113208 0.95155709 0.98916968]
mean value: 0.9694287238395796
key: test_recall
value: [0.83870968 0.96774194 1. 1. 0.90322581 1.
0.93548387 0.87096774 0.96774194 0.96666667]
mean value: 0.9450537634408602
key: train_recall
value: [0.85971223 0.98561151 0.98920863 0.97841727 0.88848921 0.97482014
0.96402878 0.9352518 0.98920863 0.98207885]
mean value: 0.9546827054485444
key: test_roc_auc
value: [0.91935484 0.9516129 0.98387097 0.96774194 0.91935484 1.
0.96774194 0.91935484 0.9672043 0.95107527]
mean value: 0.954731182795699
key: train_roc_auc
value: [0.92266187 0.98021583 0.95323741 0.97661871 0.93705036 0.9676259
0.9676259 0.95863309 0.96951471 0.98564374]
mean value: 0.9618827518630257
key: test_jcc
value: [0.83870968 0.90909091 0.96875 0.93939394 0.84848485 1.
0.93548387 0.84375 0.9375 0.90625 ]
mean value: 0.9127413245356794
key: train_jcc
value: [0.84751773 0.96140351 0.91362126 0.95438596 0.87588652 0.93771626
0.93706294 0.91872792 0.94178082 0.97163121]
mean value: 0.925973413428646
MCC on Blind test: 0.13
Accuracy on Blind test: 0.39
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01376939 0.01323795 0.01437616 0.01475048 0.01206875 0.01291966
0.01519465 0.0118556 0.01220989 0.01324606]
mean value: 0.013362860679626465
key: score_time
value: [0.01042581 0.01045299 0.01043582 0.01047158 0.01042938 0.01043868
0.0104847 0.01037741 0.0103879 0.01042461]
mean value: 0.010432887077331542
key: test_mcc
value: [0.90748521 0.87278605 0.93548387 0.87096774 0.90369611 0.90369611
0.84983659 0.84983659 0.84710837 0.83638369]
mean value: 0.8777280337009071
key: train_mcc
value: [0.91267965 0.91482985 0.90302377 0.91827075 0.91106862 0.91267965
0.89008997 0.90161686 0.7528037 0.94982722]
mean value: 0.8966890034959883
key: test_accuracy
value: [0.9516129 0.93548387 0.96774194 0.93548387 0.9516129 0.9516129
0.91935484 0.91935484 0.91803279 0.91803279]
mean value: 0.9368323638286621
key: train_accuracy
value: [0.95503597 0.95683453 0.95143885 0.95863309 0.95503597 0.95503597
0.94244604 0.94964029 0.86355476 0.97486535]
mean value: 0.9462520827144388
key: test_fscore
value: [0.95384615 0.9375 0.96774194 0.93548387 0.95238095 0.95238095
0.9122807 0.9122807 0.92537313 0.91525424]
mean value: 0.9364522640184937
key: train_fscore
value: [0.95667244 0.95789474 0.95099819 0.95764273 0.95395948 0.95667244
0.9391635 0.94776119 0.87898089 0.97508897]
mean value: 0.9474834571073163
key: test_precision
value: [0.91176471 0.90909091 0.96774194 0.93548387 0.9375 0.9375
1. 1. 0.86111111 0.93103448]
mean value: 0.9391227015294606
key: train_precision
value: [0.92307692 0.93493151 0.95970696 0.98113208 0.97735849 0.92307692
0.99596774 0.98449612 0.78857143 0.96819788]
mean value: 0.9436516053144435
key: test_recall
value: [1. 0.96774194 0.96774194 0.93548387 0.96774194 0.96774194
0.83870968 0.83870968 1. 0.9 ]
mean value: 0.9383870967741935
key: train_recall
value: [0.99280576 0.98201439 0.94244604 0.9352518 0.93165468 0.99280576
0.88848921 0.91366906 0.99280576 0.98207885]
mean value: 0.9554021299089761
key: test_roc_auc
value: [0.9516129 0.93548387 0.96774194 0.93548387 0.9516129 0.9516129
0.91935484 0.91935484 0.91666667 0.91774194]
mean value: 0.9366666666666668
key: train_roc_auc
value: [0.95503597 0.95683453 0.95143885 0.95863309 0.95503597 0.95503597
0.94244604 0.94964029 0.86378639 0.97485238]
mean value: 0.946273948583069
key: test_jcc
value: [0.91176471 0.88235294 0.9375 0.87878788 0.90909091 0.90909091
0.83870968 0.83870968 0.86111111 0.84375 ]
mean value: 0.8810867809978341
key: train_jcc
value: [0.91694352 0.91919192 0.90657439 0.91872792 0.91197183 0.91694352
0.88530466 0.90070922 0.78409091 0.95138889]
mean value: 0.9011846780361379
MCC on Blind test: 0.13
Accuracy on Blind test: 0.33
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.10950232 0.09739041 0.09704351 0.09424162 0.09430504 0.09652781
0.09616399 0.10172677 0.10454583 0.09388137]
mean value: 0.09853286743164062
key: score_time
value: [0.01543546 0.014148 0.01423931 0.01424742 0.0141356 0.0143621
0.01467228 0.01546311 0.01425433 0.01426816]
mean value: 0.014522576332092285
key: test_mcc
value: [0.96824584 0.96824584 0.93548387 0.96824584 0.93743687 0.96824584
1. 1. 0.90586325 0.93649139]
mean value: 0.958825873085774
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98387097 0.98387097 0.96774194 0.98387097 0.96774194 0.98387097
1. 1. 0.95081967 0.96721311]
mean value: 0.9789000528820729
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98360656 0.98360656 0.96774194 0.98412698 0.96875 0.98360656
1. 1. 0.95384615 0.96774194]
mean value: 0.9793026681072028
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 0.96774194 0.96875 0.93939394 1.
1. 1. 0.91176471 0.9375 ]
mean value: 0.9725150580760163
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96774194 0.96774194 0.96774194 1. 1. 0.96774194
1. 1. 1. 1. ]
mean value: 0.9870967741935484
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98387097 0.98387097 0.96774194 0.98387097 0.96774194 0.98387097
1. 1. 0.95 0.96774194]
mean value: 0.9788709677419355
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96774194 0.96774194 0.9375 0.96875 0.93939394 0.96774194
1. 1. 0.91176471 0.9375 ]
mean value: 0.9598134451727905
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.09
Accuracy on Blind test: 0.21
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.03574371 0.03871679 0.05276203 0.05378747 0.05436444 0.04580855
0.04448795 0.03396726 0.04315591 0.03275442]
mean value: 0.04355485439300537
key: score_time
value: [0.02251959 0.03061891 0.03506684 0.03417039 0.03329325 0.02393937
0.03167629 0.02193832 0.03418398 0.01938605]
mean value: 0.028679299354553222
key: test_mcc
value: [1. 0.87096774 1. 0.96824584 0.90369611 0.90748521
0.93743687 0.96824584 1. 0.8688172 ]
mean value: 0.9424894812989454
key: train_mcc
value: [0.99640932 0.99640932 0.99280576 0.99283145 0.98561151 0.99280576
0.99640932 0.99640932 0.99641572 0.99641577]
mean value: 0.9942523261997296
key: test_accuracy
value: [1. 0.93548387 1. 0.98387097 0.9516129 0.9516129
0.96774194 0.98387097 1. 0.93442623]
mean value: 0.9708619777895293
key: train_accuracy
value: [0.99820144 0.99820144 0.99640288 0.99640288 0.99280576 0.99640288
0.99820144 0.99820144 0.99820467 0.99820467]
mean value: 0.9971229479612002
key: test_fscore
value: [1. 0.93548387 1. 0.98412698 0.95238095 0.94915254
0.96666667 0.98360656 1. 0.93333333]
mean value: 0.9704750907225609
key: train_fscore
value: [0.9981982 0.9981982 0.99640288 0.99638989 0.99280576 0.99640288
0.99820467 0.9981982 0.9981982 0.99820467]
mean value: 0.997120353100802
key: test_precision
value: [1. 0.93548387 1. 0.96875 0.9375 1.
1. 1. 1. 0.93333333]
mean value: 0.9775067204301076
key: train_precision
value: [1. 1. 0.99640288 1. 0.99280576 0.99640288
0.99641577 1. 1. 1. ]
mean value: 0.9982027281400686
key: test_recall
value: [1. 0.93548387 1. 1. 0.96774194 0.90322581
0.93548387 0.96774194 1. 0.93333333]
mean value: 0.9643010752688173
key: train_recall
value: [0.99640288 0.99640288 0.99640288 0.99280576 0.99280576 0.99640288
1. 0.99640288 0.99640288 0.99641577]
mean value: 0.9960444547587737
key: test_roc_auc
value: [1. 0.93548387 1. 0.98387097 0.9516129 0.9516129
0.96774194 0.98387097 1. 0.9344086 ]
mean value: 0.9708602150537635
key: train_roc_auc
value: [0.99820144 0.99820144 0.99640288 0.99640288 0.99280576 0.99640288
0.99820144 0.99820144 0.99820144 0.99820789]
mean value: 0.9971229468038473
key: test_jcc
value: [1. 0.87878788 1. 0.96875 0.90909091 0.90322581
0.93548387 0.96774194 1. 0.875 ]
mean value: 0.9438080400782014
key: train_jcc
value: [0.99640288 0.99640288 0.99283154 0.99280576 0.98571429 0.99283154
0.99641577 0.99640288 0.99640288 0.99641577]
mean value: 0.994262617555725
MCC on Blind test: 0.06
Accuracy on Blind test: 0.21
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.18538761 0.21855521 0.19818258 0.16929436 0.21186471 0.22627425
0.19866776 0.15631485 0.20869946 0.18409514]
mean value: 0.19573359489440917
key: score_time
value: [0.02069068 0.01292896 0.04060698 0.02179265 0.02101707 0.0241735
0.01276922 0.02045441 0.02064705 0.03370333]
mean value: 0.022878384590148924
key: test_mcc
value: [0.90748521 0.62471615 0.77459667 0.83914639 0.7190925 0.80813523
0.64820372 0.83914639 0.63939757 0.77096774]
mean value: 0.7570887579844512
key: train_mcc
value: [0.88143754 0.84999939 0.88509826 0.86366703 0.87437795 0.88157448
0.87826623 0.87806148 0.8713058 0.88511972]
mean value: 0.8748907880078497
key: test_accuracy
value: [0.9516129 0.80645161 0.88709677 0.91935484 0.85483871 0.90322581
0.82258065 0.91935484 0.81967213 0.8852459 ]
mean value: 0.8769434161819143
key: train_accuracy
value: [0.94064748 0.92446043 0.94244604 0.93165468 0.93705036 0.94064748
0.93884892 0.93884892 0.93536804 0.94254937]
mean value: 0.9372521731268486
key: test_fscore
value: [0.94915254 0.82352941 0.88888889 0.92063492 0.86567164 0.90625
0.83076923 0.91803279 0.82539683 0.8852459 ]
mean value: 0.8813572150143087
key: train_fscore
value: [0.94117647 0.92631579 0.9430605 0.93262411 0.93783304 0.94138544
0.93992933 0.93971631 0.93639576 0.94285714]
mean value: 0.9381293887479757
key: test_precision
value: [1. 0.75675676 0.875 0.90625 0.80555556 0.87878788
0.79411765 0.93333333 0.8125 0.87096774]
mean value: 0.8633268913427832
key: train_precision
value: [0.93286219 0.90410959 0.93309859 0.91958042 0.92631579 0.92982456
0.92361111 0.92657343 0.92013889 0.93950178]
mean value: 0.9255616347793583
key: test_recall
value: [0.90322581 0.90322581 0.90322581 0.93548387 0.93548387 0.93548387
0.87096774 0.90322581 0.83870968 0.9 ]
mean value: 0.9029032258064515
key: train_recall
value: [0.94964029 0.94964029 0.95323741 0.94604317 0.94964029 0.95323741
0.95683453 0.95323741 0.95323741 0.94623656]
mean value: 0.9510984760578634
key: test_roc_auc
value: [0.9516129 0.80645161 0.88709677 0.91935484 0.85483871 0.90322581
0.82258065 0.91935484 0.81935484 0.88548387]
mean value: 0.8769354838709678
key: train_roc_auc
value: [0.94064748 0.92446043 0.94244604 0.93165468 0.93705036 0.94064748
0.93884892 0.93884892 0.93540007 0.94254274]
mean value: 0.9372547123591449
key: test_jcc
value: [0.90322581 0.7 0.8 0.85294118 0.76315789 0.82857143
0.71052632 0.84848485 0.7027027 0.79411765]
mean value: 0.790372782026632
key: train_jcc
value: [0.88888889 0.8627451 0.89225589 0.87375415 0.88294314 0.88926174
0.88666667 0.88628763 0.88039867 0.89189189]
mean value: 0.8835093775860033
MCC on Blind test: 0.22
Accuracy on Blind test: 0.49
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.24915671 0.24478769 0.25460291 0.25357485 0.25452995 0.24537802
0.2443285 0.24822903 0.24816132 0.25430918]
mean value: 0.24970581531524658
key: score_time
value: [0.00863647 0.0090971 0.00848818 0.00925422 0.00943804 0.00865912
0.00864434 0.00895667 0.00889111 0.00870037]
mean value: 0.008876562118530273
key: test_mcc
value: [1. 0.87096774 1. 0.96824584 0.93743687 0.90748521
0.96824584 0.96824584 1. 0.8688172 ]
mean value: 0.9489444535426244
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.93548387 1. 0.98387097 0.96774194 0.9516129
0.98387097 0.98387097 1. 0.93442623]
mean value: 0.9740877842411423
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.93548387 1. 0.98412698 0.96875 0.94915254
0.98360656 0.98360656 1. 0.93333333]
mean value: 0.9738059845555039
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.93548387 1. 0.96875 0.93939394 1.
1. 1. 1. 0.93333333]
mean value: 0.9776961143695014
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.93548387 1. 1. 1. 0.90322581
0.96774194 0.96774194 1. 0.93333333]
mean value: 0.970752688172043
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.93548387 1. 0.98387097 0.96774194 0.9516129
0.98387097 0.98387097 1. 0.9344086 ]
mean value: 0.9740860215053764
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.87878788 1. 0.96875 0.93939394 0.90322581
0.96774194 0.96774194 1. 0.875 ]
mean value: 0.9500641495601173
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.05
Accuracy on Blind test: 0.19
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.01151943 0.01374769 0.01429176 0.01392269 0.01409912 0.01630855
0.01395249 0.01366401 0.01401973 0.01421189]
mean value: 0.013973736763000488
key: score_time
value: [0.01094151 0.01091838 0.01081514 0.01111507 0.01110363 0.01155281
0.01108122 0.01173496 0.0118649 0.0110836 ]
mean value: 0.01122112274169922
key: test_mcc
value: [0.75623534 0.7130241 0.67419986 0.87831007 0.35659298 0.7284928
0.61807005 0.87278605 0.70874158 0.47128445]
mean value: 0.6777737268610616
key: train_mcc
value: [0.7898587 0.84192273 0.79323895 0.88226013 0.52711711 0.8046478
0.84911865 0.84598626 0.839052 0.5797551 ]
mean value: 0.7752957434463477
key: test_accuracy
value: [0.87096774 0.85483871 0.82258065 0.93548387 0.64516129 0.85483871
0.80645161 0.93548387 0.85245902 0.72131148]
mean value: 0.8299576943416181
key: train_accuracy
value: [0.88848921 0.92086331 0.88848921 0.94064748 0.71942446 0.89748201
0.92446043 0.92266187 0.91741472 0.76481149]
mean value: 0.8784744197460703
key: test_fscore
value: [0.88235294 0.86153846 0.79245283 0.93939394 0.5 0.86956522
0.79310345 0.9375 0.84745763 0.65306122]
mean value: 0.8076425689573157
key: train_fscore
value: [0.89768977 0.92 0.876 0.94200351 0.6119403 0.9048414
0.92363636 0.92416226 0.91287879 0.70561798]
mean value: 0.861877037129891
key: test_precision
value: [0.81081081 0.82352941 0.95454545 0.88571429 0.84615385 0.78947368
0.85185185 0.90909091 0.89285714 0.84210526]
mean value: 0.8606132660157428
key: train_precision
value: [0.82926829 0.93014706 0.98648649 0.9209622 0.99193548 0.84423676
0.93382353 0.90657439 0.964 0.94578313]
mean value: 0.9253217337706788
key: test_recall
value: [0.96774194 0.90322581 0.67741935 1. 0.35483871 0.96774194
0.74193548 0.96774194 0.80645161 0.53333333]
mean value: 0.7920430107526881
key: train_recall
value: [0.97841727 0.91007194 0.78776978 0.96402878 0.44244604 0.97482014
0.91366906 0.94244604 0.86690647 0.56272401]
mean value: 0.8343299553905263
key: test_roc_auc
value: [0.87096774 0.85483871 0.82258065 0.93548387 0.64516129 0.85483871
0.80645161 0.93548387 0.85322581 0.71827957]
mean value: 0.8297311827956989
key: train_roc_auc
value: [0.88848921 0.92086331 0.88848921 0.94064748 0.71942446 0.89748201
0.92446043 0.92266187 0.91732421 0.76517496]
mean value: 0.8785017147572265
key: test_jcc
value: [0.78947368 0.75675676 0.65625 0.88571429 0.33333333 0.76923077
0.65714286 0.88235294 0.73529412 0.48484848]
mean value: 0.6950397230060543
key: train_jcc
value: [0.81437126 0.85185185 0.77935943 0.89036545 0.44086022 0.82621951
0.85810811 0.85901639 0.83972125 0.54513889]
mean value: 0.7705012360490753
MCC on Blind test: 0.12
Accuracy on Blind test: 0.77
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.03200245 0.03018045 0.02014041 0.03684139 0.03164029 0.0324831
0.03526735 0.02932453 0.02801824 0.02917194]
mean value: 0.0305070161819458
key: score_time
value: [0.0254848 0.03329682 0.01865816 0.02000928 0.02121401 0.01249504
0.01821375 0.02222514 0.02327013 0.02214599]
mean value: 0.02170131206512451
key: test_mcc
value: [0.96824584 0.84266484 0.90369611 0.93743687 0.90748521 0.93548387
0.90369611 0.90369611 0.90215054 0.80322581]
mean value: 0.9007781314102745
key: train_mcc
value: [0.94283651 0.93585746 0.92124484 0.9354697 0.93563929 0.92494527
0.91054923 0.93563929 0.92138939 0.92878086]
mean value: 0.929235183258956
key: test_accuracy
value: [0.98387097 0.91935484 0.9516129 0.96774194 0.9516129 0.96774194
0.9516129 0.9516129 0.95081967 0.90163934]
mean value: 0.9497620306716024
key: train_accuracy
value: [0.97122302 0.9676259 0.96043165 0.9676259 0.9676259 0.96223022
0.95503597 0.9676259 0.96050269 0.96409336]
mean value: 0.9644020510700955
key: test_fscore
value: [0.98412698 0.92307692 0.95081967 0.96875 0.95384615 0.96774194
0.95081967 0.95081967 0.95081967 0.9 ]
mean value: 0.9500820685058522
key: train_fscore
value: [0.97163121 0.96819788 0.96099291 0.96797153 0.96808511 0.96283186
0.95575221 0.96808511 0.96099291 0.96478873]
mean value: 0.9649329447341147
key: test_precision
value: [0.96875 0.88235294 0.96666667 0.93939394 0.91176471 0.96774194
0.96666667 0.96666667 0.96666667 0.9 ]
mean value: 0.9436670188603301
key: train_precision
value: [0.95804196 0.95138889 0.94755245 0.95774648 0.95454545 0.94773519
0.94076655 0.95454545 0.94755245 0.94809689]
mean value: 0.9507971757973318
key: test_recall
value: [1. 0.96774194 0.93548387 1. 1. 0.96774194
0.93548387 0.93548387 0.93548387 0.9 ]
mean value: 0.957741935483871
key: train_recall
value: [0.98561151 0.98561151 0.97482014 0.97841727 0.98201439 0.97841727
0.97122302 0.98201439 0.97482014 0.98207885]
mean value: 0.9795028493334365
key: test_roc_auc
value: [0.98387097 0.91935484 0.9516129 0.96774194 0.9516129 0.96774194
0.9516129 0.9516129 0.95107527 0.9016129 ]
mean value: 0.9497849462365592
key: train_roc_auc
value: [0.97122302 0.9676259 0.96043165 0.9676259 0.9676259 0.96223022
0.95503597 0.9676259 0.96052835 0.96406101]
mean value: 0.9644013821201104
key: test_jcc
value: [0.96875 0.85714286 0.90625 0.93939394 0.91176471 0.9375
0.90625 0.90625 0.90625 0.81818182]
mean value: 0.9057733320600968
key: train_jcc
value: [0.94482759 0.93835616 0.92491468 0.93793103 0.93814433 0.92832765
0.91525424 0.93814433 0.92491468 0.93197279]
mean value: 0.9322787467857844
MCC on Blind test: 0.19
Accuracy on Blind test: 0.44
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_config.py:163: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_config.py:166: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.25432348 0.2824018 0.20859909 0.19930434 0.2196362 0.19928908
0.19986963 0.19727373 0.24380755 0.21062398]
mean value: 0.22151288986206055
key: score_time
value: [0.02141023 0.0218761 0.02016091 0.01457238 0.01933861 0.01388955
0.01085019 0.01082468 0.0215745 0.02148724]
mean value: 0.017598438262939452
key: test_mcc
value: [0.96824584 0.84266484 0.90369611 0.93743687 0.87278605 0.96824584
0.93548387 0.90369611 0.9344086 0.83638369]
mean value: 0.9103047822115174
key: train_mcc
value: [0.94283651 0.94283651 0.93563929 0.9354697 0.93900081 0.92844206
0.93238486 0.9393413 0.93207468 0.9355825 ]
mean value: 0.9363608223697346
key: test_accuracy
value: [0.98387097 0.91935484 0.9516129 0.96774194 0.93548387 0.98387097
0.96774194 0.9516129 0.96721311 0.91803279]
mean value: 0.9546536224219989
key: train_accuracy
value: [0.97122302 0.97122302 0.9676259 0.9676259 0.96942446 0.96402878
0.96582734 0.96942446 0.96588869 0.96768402]
mean value: 0.9679975588649368
key: test_fscore
value: [0.98412698 0.92307692 0.95081967 0.96875 0.9375 0.98360656
0.96774194 0.95081967 0.96774194 0.91525424]
mean value: 0.9549437917099128
key: train_fscore
value: [0.97163121 0.97163121 0.96808511 0.96797153 0.96969697 0.96453901
0.9664903 0.9699115 0.96625222 0.96808511]
mean value: 0.9684294155648834
key: test_precision
value: [0.96875 0.88235294 0.96666667 0.93939394 0.90909091 1.
0.96774194 0.96666667 0.96774194 0.93103448]
mean value: 0.9499439476721016
key: train_precision
value: [0.95804196 0.95804196 0.95454545 0.95774648 0.96113074 0.95104895
0.94809689 0.95470383 0.95438596 0.95789474]
mean value: 0.9555636962921179
key: test_recall
value: [1. 0.96774194 0.93548387 1. 0.96774194 0.96774194
0.96774194 0.93548387 0.96774194 0.9 ]
mean value: 0.9609677419354838
key: train_recall
value: [0.98561151 0.98561151 0.98201439 0.97841727 0.97841727 0.97841727
0.98561151 0.98561151 0.97841727 0.97849462]
mean value: 0.9816624120058792
key: test_roc_auc
value: [0.98387097 0.91935484 0.9516129 0.96774194 0.93548387 0.98387097
0.96774194 0.9516129 0.9672043 0.91774194]
mean value: 0.9546236559139786
key: train_roc_auc
value: [0.97122302 0.97122302 0.9676259 0.9676259 0.96942446 0.96402878
0.96582734 0.96942446 0.96591114 0.96766458]
mean value: 0.9679978597766948
key: test_jcc
value: [0.96875 0.85714286 0.90625 0.93939394 0.88235294 0.96774194
0.9375 0.90625 0.9375 0.84375 ]
mean value: 0.9146631673197139
key: train_jcc
value: [0.94482759 0.94482759 0.93814433 0.93793103 0.94117647 0.93150685
0.93515358 0.94158076 0.9347079 0.93814433]
mean value: 0.9388000430005232
MCC on Blind test: 0.15
Accuracy on Blind test: 0.38
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.02441096 0.02007389 0.02128315 0.01857615 0.01912689 0.01964045
0.02089906 0.01830864 0.02194476 0.0219276 ]
mean value: 0.02061915397644043
key: score_time
value: [0.01061249 0.01058674 0.01089358 0.01047206 0.01052213 0.01050425
0.01066399 0.01052094 0.01055193 0.01059175]
mean value: 0.010591983795166016
key: test_mcc
value: [0.56360186 0.56360186 0.75 0.68884672 0.8819171 0.82717019
0.9375 0.87083333 0.80753845 0.82078268]
mean value: 0.7711792204154371
key: train_mcc
value: [0.83904826 0.83305418 0.804094 0.83230783 0.81084496 0.79737782
0.84634011 0.79137125 0.81153605 0.79748625]
mean value: 0.8163460715624157
key: test_accuracy
value: [0.78125 0.78125 0.875 0.84375 0.9375 0.90625
0.96774194 0.93548387 0.90322581 0.90322581]
mean value: 0.8834677419354838
key: train_accuracy
value: [0.91901408 0.91549296 0.90140845 0.91549296 0.90492958 0.89788732
0.92280702 0.89473684 0.90526316 0.89824561]
mean value: 0.9075277983691623
key: test_fscore
value: [0.78787879 0.77419355 0.875 0.84848485 0.94117647 0.91428571
0.96774194 0.93333333 0.90909091 0.91428571]
mean value: 0.886547126181851
key: train_fscore
value: [0.9209622 0.91836735 0.90410959 0.91780822 0.90721649 0.90102389
0.92465753 0.89864865 0.90721649 0.90034364]
mean value: 0.9100354060453281
key: test_precision
value: [0.76470588 0.8 0.875 0.82352941 0.88888889 0.84210526
0.9375 0.93333333 0.88235294 0.84210526]
mean value: 0.858952098383213
key: train_precision
value: [0.89932886 0.88815789 0.88 0.89333333 0.88590604 0.87417219
0.90604027 0.86928105 0.88590604 0.87919463]
mean value: 0.8861320298178448
key: test_recall
value: [0.8125 0.75 0.875 0.875 1. 1.
1. 0.93333333 0.9375 1. ]
mean value: 0.9183333333333333
key: train_recall
value: [0.94366197 0.95070423 0.92957746 0.94366197 0.92957746 0.92957746
0.94405594 0.93006993 0.92957746 0.92253521]
mean value: 0.9352999113562493
key: test_roc_auc
value: [0.78125 0.78125 0.875 0.84375 0.9375 0.90625
0.96875 0.93541667 0.90208333 0.9 ]
mean value: 0.883125
key: train_roc_auc
value: [0.91901408 0.91549296 0.90140845 0.91549296 0.90492958 0.89788732
0.9227322 0.89461243 0.90534817 0.89833054]
mean value: 0.9075248694967005
key: test_jcc
value: [0.65 0.63157895 0.77777778 0.73684211 0.88888889 0.84210526
0.9375 0.875 0.83333333 0.84210526]
mean value: 0.8015131578947369
key: train_jcc
value: [0.85350318 0.8490566 0.825 0.84810127 0.83018868 0.81987578
0.85987261 0.81595092 0.83018868 0.81875 ]
mean value: 0.8350487720908194
MCC on Blind test: 0.22
Accuracy on Blind test: 0.54
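Note on reading the per-fold scores above: the metrics reported for each fold all derive from one confusion matrix, so they can be cross-checked against each other. The sketch below does this for the seventh Logistic Regression test fold. The counts TP=15, FP=1, FN=0, TN=15 are not printed anywhere in this log; they are an assumption inferred from the reported test_precision (0.9375), test_recall (1.0) and test_accuracy (0.96774194). The helper names `ratio` and `binary_metrics` are hypothetical and are not taken from the project's scripts.

```python
from math import sqrt


def ratio(numerator, denominator):
    """Return numerator/denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, fn, tn):
    """Compute the scores named in the log from binary confusion counts."""
    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "fscore": ratio(2 * tp, 2 * tp + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        # Jaccard index of the positive class
        "jcc": ratio(tp, tp + fp + fn),
    }


# Assumed counts for fold 7 (inferred, not logged).
fold7 = binary_metrics(tp=15, fp=1, fn=0, tn=15)
for key, value in fold7.items():
    print(f"{key}: {value:.8f}")
```

With these assumed counts the computed values agree with the seventh entries of test_mcc, test_accuracy, test_fscore, test_precision, test_recall and test_jcc in the Logistic Regression block.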
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.582335 0.77262783 0.64726782 0.610358 0.68636823 0.73911905
0.64701462 0.70605779 0.70633364 0.64915323]
mean value: 0.6746635198593139
key: score_time
value: [0.02002954 0.01183629 0.01188588 0.01181579 0.01440883 0.01112986
0.01124287 0.01208949 0.01121664 0.01208615]
mean value: 0.012774133682250976
key: test_mcc
value: [0.68884672 0.68884672 0.81409158 0.93933644 0.93933644 0.93933644
0.87866878 1. 0.87083333 0.87770745]
mean value: 0.8637003892680549
key: train_mcc
value: [1. 0.99298237 0.94375558 0.93720088 0.97192739 0.95129413
0.96512319 0.93704438 0.9720266 0.9582759 ]
mean value: 0.9629630433458233
key: test_accuracy
value: [0.84375 0.84375 0.90625 0.96875 0.96875 0.96875
0.93548387 1. 0.93548387 0.93548387]
mean value: 0.9306451612903226
key: train_accuracy
value: [1. 0.99647887 0.97183099 0.96830986 0.98591549 0.97535211
0.98245614 0.96842105 0.98596491 0.97894737]
mean value: 0.9813676797627873
key: test_fscore
value: [0.83870968 0.83870968 0.90322581 0.96774194 0.96969697 0.96774194
0.9375 1. 0.9375 0.94117647]
mean value: 0.9302002472543269
key: train_fscore
value: [1. 0.99646643 0.97202797 0.96885813 0.98601399 0.97577855
0.98269896 0.96885813 0.98601399 0.97916667]
mean value: 0.9815882813444314
key: test_precision
value: [0.86666667 0.86666667 0.93333333 1. 0.94117647 1.
0.88235294 1. 0.9375 0.88888889]
mean value: 0.9316584967320262
key: train_precision
value: [1. 1. 0.96527778 0.95238095 0.97916667 0.95918367
0.97260274 0.95890411 0.97916667 0.96575342]
mean value: 0.9732436010934054
key: test_recall
value: [0.8125 0.8125 0.875 0.9375 1. 0.9375 1. 1. 0.9375 1. ]
mean value: 0.93125
key: train_recall
value: [1. 0.99295775 0.97887324 0.98591549 0.99295775 0.99295775
0.99300699 0.97902098 0.99295775 0.99295775]
mean value: 0.9901605436816705
key: test_roc_auc
value: [0.84375 0.84375 0.90625 0.96875 0.96875 0.96875
0.9375 1. 0.93541667 0.93333333]
mean value: 0.930625
key: train_roc_auc
value: [1. 0.99647887 0.97183099 0.96830986 0.98591549 0.97535211
0.98241899 0.96838373 0.98598936 0.97899636]
mean value: 0.981367576085886
key: test_jcc
value: [0.72222222 0.72222222 0.82352941 0.9375 0.94117647 0.9375
0.88235294 1. 0.88235294 0.88888889]
mean value: 0.8737745098039216
key: train_jcc
value: [1. 0.99295775 0.94557823 0.93959732 0.97241379 0.9527027
0.96598639 0.93959732 0.97241379 0.95918367]
mean value: 0.9640430965580683
MCC on Blind test: 0.17
Accuracy on Blind test: 0.43
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.00992465 0.00951338 0.00734925 0.007375 0.00736427 0.00711226
0.007195 0.00721049 0.00700927 0.00702572]
mean value: 0.007707929611206055
key: score_time
value: [0.0107615 0.00938344 0.00825024 0.0080514 0.00798678 0.00803542
0.00796485 0.00782204 0.00787354 0.00787258]
mean value: 0.008400177955627442
key: test_mcc
value: [0.625 0.62994079 0.62994079 0.68884672 0.75592895 0.75592895
0.69203857 0.6125 0.69203857 0.82078268]
mean value: 0.6902946005610465
key: train_mcc
value: [0.74714613 0.73268511 0.71170894 0.74714613 0.71859502 0.73355944
0.73273302 0.71308876 0.7285593 0.7124563 ]
mean value: 0.7277678155822408
key: test_accuracy
value: [0.8125 0.8125 0.8125 0.84375 0.875 0.875
0.83870968 0.80645161 0.83870968 0.90322581]
mean value: 0.8418346774193548
key: train_accuracy
value: [0.87323944 0.86619718 0.8556338 0.87323944 0.85915493 0.86619718
0.86315789 0.85614035 0.86315789 0.85614035]
mean value: 0.8632258463059056
key: test_fscore
value: [0.8125 0.82352941 0.82352941 0.84848485 0.88235294 0.88235294
0.84848485 0.8 0.82758621 0.91428571]
mean value: 0.8463106324034316
key: train_fscore
value: [0.87586207 0.86805556 0.85813149 0.87586207 0.86111111 0.86986301
0.87213115 0.86006826 0.86779661 0.85714286]
mean value: 0.8666024180424603
key: test_precision
value: [0.8125 0.77777778 0.77777778 0.82352941 0.83333333 0.83333333
0.77777778 0.8 0.92307692 0.84210526]
mean value: 0.8201211597999524
key: train_precision
value: [0.85810811 0.85616438 0.84353741 0.85810811 0.84931507 0.84666667
0.82098765 0.84 0.83660131 0.84827586]
mean value: 0.845776457348316
key: test_recall
value: [0.8125 0.875 0.875 0.875 0.9375 0.9375
0.93333333 0.8 0.75 1. ]
mean value: 0.8795833333333334
key: train_recall
value: [0.8943662 0.88028169 0.87323944 0.8943662 0.87323944 0.8943662
0.93006993 0.88111888 0.90140845 0.86619718]
mean value: 0.8888653599921206
key: test_roc_auc
value: [0.8125 0.8125 0.8125 0.84375 0.875 0.875
0.84166667 0.80625 0.84166667 0.9 ]
mean value: 0.8420833333333333
key: train_roc_auc
value: [0.87323944 0.86619718 0.8556338 0.87323944 0.85915493 0.86619718
0.86292229 0.8560524 0.86329164 0.85617551]
mean value: 0.8632103811681276
key: test_jcc
value: [0.68421053 0.7 0.7 0.73684211 0.78947368 0.78947368
0.73684211 0.66666667 0.70588235 0.84210526]
mean value: 0.7351496388028895
key: train_jcc
value: [0.7791411 0.76687117 0.75151515 0.7791411 0.75609756 0.76969697
0.77325581 0.75449102 0.76646707 0.75 ]
mean value: 0.7646676954206684
MCC on Blind test: 0.22
Accuracy on Blind test: 0.59
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.0074749 0.00728703 0.00717139 0.00743604 0.00727081 0.00725842
0.00724769 0.00720358 0.00726557 0.00720382]
mean value: 0.0072819232940673825
key: score_time
value: [0.00798202 0.00782013 0.00794554 0.00785279 0.00791287 0.00786495
0.0078876 0.00792551 0.00783515 0.00804877]
mean value: 0.007907533645629882
key: test_mcc
value: [0.68884672 0.56360186 0.68884672 0.625 0.438357 0.68884672
0.48954403 0.48333333 0.55573827 0.55573827]
mean value: 0.5777852941864914
key: train_mcc
value: [0.64814452 0.64814452 0.6479516 0.63405443 0.65572679 0.62714946
0.62393794 0.65616074 0.64212548 0.6494089 ]
mean value: 0.6432804381067745
key: test_accuracy
value: [0.84375 0.78125 0.84375 0.8125 0.71875 0.84375
0.74193548 0.74193548 0.77419355 0.77419355]
mean value: 0.7876008064516129
key: train_accuracy
value: [0.82394366 0.82394366 0.82394366 0.81690141 0.82746479 0.81338028
0.81052632 0.82807018 0.82105263 0.8245614 ]
mean value: 0.8213787991104522
key: test_fscore
value: [0.83870968 0.77419355 0.84848485 0.8125 0.70967742 0.84848485
0.75 0.73333333 0.8 0.8 ]
mean value: 0.791538367546432
key: train_fscore
value: [0.82638889 0.82638889 0.82269504 0.81944444 0.83161512 0.816609
0.82 0.82807018 0.82105263 0.82638889]
mean value: 0.8238653070404355
key: test_precision
value: [0.86666667 0.8 0.82352941 0.8125 0.73333333 0.82352941
0.70588235 0.73333333 0.73684211 0.73684211]
mean value: 0.7772458720330238
key: train_precision
value: [0.81506849 0.81506849 0.82857143 0.80821918 0.81208054 0.80272109
0.78343949 0.83098592 0.81818182 0.81506849]
mean value: 0.8129404935574437
key: test_recall
value: [0.8125 0.75 0.875 0.8125 0.6875 0.875
0.8 0.73333333 0.875 0.875 ]
mean value: 0.8095833333333333
key: train_recall
value: [0.83802817 0.83802817 0.81690141 0.83098592 0.85211268 0.83098592
0.86013986 0.82517483 0.82394366 0.83802817]
mean value: 0.8354328769821727
key: test_roc_auc
value: [0.84375 0.78125 0.84375 0.8125 0.71875 0.84375
0.74375 0.74166667 0.77083333 0.77083333]
mean value: 0.7870833333333334
key: train_roc_auc
value: [0.82394366 0.82394366 0.82394366 0.81690141 0.82746479 0.81338028
0.81035162 0.82808037 0.82106274 0.82460849]
mean value: 0.8213680685511672
key: test_jcc
value: [0.72222222 0.63157895 0.73684211 0.68421053 0.55 0.73684211
0.6 0.57894737 0.66666667 0.66666667]
mean value: 0.6573976608187134
key: train_jcc
value: [0.70414201 0.70414201 0.69879518 0.69411765 0.71176471 0.69005848
0.69491525 0.70658683 0.69642857 0.70414201]
mean value: 0.7005092700712355
MCC on Blind test: 0.19
Accuracy on Blind test: 0.54
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00720549 0.00690293 0.00749421 0.0067699 0.00747395 0.00748181
0.00723648 0.00759244 0.0074594 0.00673008]
mean value: 0.0072346687316894535
key: score_time
value: [0.01040697 0.01126409 0.01092076 0.01008987 0.01062059 0.01053739
0.01394534 0.01144624 0.01064205 0.01186824]
mean value: 0.011174154281616212
key: test_mcc
value: [0.62994079 0.31311215 0.56360186 0.56360186 0.31814238 0.82717019
0.82285074 0.67916667 0.57461167 0.68826048]
mean value: 0.5980458781591939
key: train_mcc
value: [0.71838112 0.74655293 0.7253701 0.74655293 0.71142639 0.68311553
0.69826652 0.67718901 0.70556653 0.67774254]
mean value: 0.7090163590202463
key: test_accuracy
value: [0.8125 0.65625 0.78125 0.78125 0.65625 0.90625
0.90322581 0.83870968 0.77419355 0.83870968]
mean value: 0.794858870967742
key: train_accuracy
value: [0.85915493 0.87323944 0.86267606 0.87323944 0.8556338 0.8415493
0.84912281 0.83859649 0.85263158 0.83859649]
mean value: 0.8544440326167532
key: test_fscore
value: [0.8 0.66666667 0.78787879 0.77419355 0.62068966 0.91428571
0.90909091 0.83870968 0.81081081 0.85714286]
mean value: 0.7979468626854611
key: train_fscore
value: [0.85815603 0.87412587 0.86315789 0.87412587 0.85714286 0.8409894
0.84912281 0.83916084 0.85416667 0.83453237]
mean value: 0.8544680614739297
key: test_precision
value: [0.85714286 0.64705882 0.76470588 0.8 0.69230769 0.84210526
0.83333333 0.8125 0.71428571 0.78947368]
mean value: 0.7752913250320371
key: train_precision
value: [0.86428571 0.86805556 0.86013986 0.86805556 0.84827586 0.84397163
0.85211268 0.83916084 0.84246575 0.85294118]
mean value: 0.8539464623923748
key: test_recall
value: [0.75 0.6875 0.8125 0.75 0.5625 1.
1. 0.86666667 0.9375 0.9375 ]
mean value: 0.8304166666666667
key: train_recall
value: [0.85211268 0.88028169 0.86619718 0.88028169 0.86619718 0.83802817
0.84615385 0.83916084 0.86619718 0.81690141]
mean value: 0.8551511868413277
key: test_roc_auc
value: [0.8125 0.65625 0.78125 0.78125 0.65625 0.90625
0.90625 0.83958333 0.76875 0.83541667]
mean value: 0.794375
key: train_roc_auc
value: [0.85915493 0.87323944 0.86267606 0.87323944 0.8556338 0.8415493
0.84913326 0.8385945 0.85267901 0.83852063]
mean value: 0.854442036836403
key: test_jcc
value: [0.66666667 0.5 0.65 0.63157895 0.45 0.84210526
0.83333333 0.72222222 0.68181818 0.75 ]
mean value: 0.672772461456672
key: train_jcc
value: [0.7515528 0.77639752 0.75925926 0.77639752 0.75 0.72560976
0.73780488 0.72289157 0.74545455 0.71604938]
mean value: 0.7461417213928212
MCC on Blind test: 0.18
Accuracy on Blind test: 0.56
Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01138067 0.01109099 0.01127625 0.01128221 0.01073098 0.0110662
0.01142335 0.01140451 0.01070118 0.01133704]
mean value: 0.011169338226318359
key: score_time
value: [0.00933075 0.00924778 0.00916719 0.00923133 0.00920081 0.00921702
0.00934625 0.00917053 0.00837874 0.0092051 ]
mean value: 0.009149551391601562
key: test_mcc
value: [0.625 0.50395263 0.57265629 0.64549722 0.81409158 0.77459667
0.76948376 0.80833333 0.6310315 0.76594169]
mean value: 0.6910584675818924
key: train_mcc
value: [0.7618988 0.7476577 0.76035829 0.75897979 0.73060671 0.72554232
0.7375982 0.72956319 0.72987459 0.71397006]
mean value: 0.7396049640194965
key: test_accuracy
value: [0.8125 0.75 0.78125 0.8125 0.90625 0.875
0.87096774 0.90322581 0.80645161 0.87096774]
mean value: 0.8389112903225806
key: train_accuracy
value: [0.87676056 0.86971831 0.87676056 0.87676056 0.86267606 0.85915493
0.86315789 0.85964912 0.85964912 0.85263158]
mean value: 0.8656918705213739
key: test_fscore
value: [0.8125 0.76470588 0.8 0.83333333 0.90909091 0.88888889
0.88235294 0.90322581 0.83333333 0.88888889]
mean value: 0.8516319983516378
key: train_fscore
value: [0.8852459 0.87868852 0.88448845 0.88372093 0.87043189 0.86842105
0.87459807 0.87096774 0.87012987 0.8627451 ]
mean value: 0.8749437532470357
key: test_precision
value: [0.8125 0.72222222 0.73684211 0.75 0.88235294 0.8
0.78947368 0.875 0.75 0.8 ]
mean value: 0.7918390952872377
key: train_precision
value: [0.82822086 0.82208589 0.83229814 0.83647799 0.82389937 0.81481481
0.80952381 0.80838323 0.80722892 0.80487805]
mean value: 0.8187811065917483
key: test_recall
value: [0.8125 0.8125 0.875 0.9375 0.9375 1.
1. 0.93333333 0.9375 1. ]
mean value: 0.9245833333333333
key: train_recall
value: [0.95070423 0.94366197 0.94366197 0.93661972 0.92253521 0.92957746
0.95104895 0.94405594 0.94366197 0.92957746]
mean value: 0.9395104895104895
key: test_roc_auc
value: [0.8125 0.75 0.78125 0.8125 0.90625 0.875
0.875 0.90416667 0.80208333 0.86666667]
mean value: 0.8385416666666666
key: train_roc_auc
value: [0.87676056 0.86971831 0.87676056 0.87676056 0.86267606 0.85915493
0.86284842 0.85935192 0.85994287 0.85290062]
mean value: 0.865687481532552
key: test_jcc
value: [0.68421053 0.61904762 0.66666667 0.71428571 0.83333333 0.8
0.78947368 0.82352941 0.71428571 0.8 ]
mean value: 0.744483266991007
key: train_jcc
value: [0.79411765 0.78362573 0.79289941 0.79166667 0.77058824 0.76744186
0.77714286 0.77142857 0.77011494 0.75862069]
mean value: 0.7777646609518236
MCC on Blind test: 0.23
Accuracy on Blind test: 0.48
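Note on the metric keys above: test_mcc, test_accuracy, test_fscore, test_precision, test_recall and test_jcc can all be derived from a single confusion matrix per fold. The sketch below is a minimal illustration, not the code that produced this log. The function name binary_metrics and the counts (tp=13, fp=3, fn=3, tn=13) are assumptions: the log does not print confusion matrices, and these counts are simply one 32-sample split that is consistent with the first SVM fold reported above (accuracy 0.8125, MCC 0.625, Jaccard 0.68421053).

```python
from math import sqrt

def binary_metrics(tp, fp, fn, tn):
    """Return the per-fold metrics named in the log from confusion-matrix counts.

    No guard against zero denominators: this is a sketch, not production code.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "fscore": 2 * precision * recall / (precision + recall),
        "jcc": tp / (tp + fp + fn),  # Jaccard index of the positive class
        "mcc": (tp * tn - fp * fn)
        / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }

# Hypothetical counts for one 32-sample fold (not printed in the log).
fold = binary_metrics(tp=13, fp=3, fn=3, tn=13)
for name, value in fold.items():
    print(f"{name}: {value:.8f}")
```

With these assumed counts, the F-score and Jaccard index also satisfy jcc = fscore / (2 - fscore), which is why the two keys rise and fall together across folds.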
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [0.94244218 1.01735759 0.8850472 1.05389714 0.87144995 0.99552441
0.90028095 0.87230182 1.03594947 0.87329125]
mean value: 0.9447541952133178
key: score_time
value: [0.01177907 0.013484 0.01336789 0.01362443 0.01371074 0.01331043
0.01345372 0.01344275 0.01364231 0.01379061]
mean value: 0.013360595703125
key: test_mcc
value: [0.68884672 0.68884672 0.69991324 0.875 0.8819171 0.875
0.80833333 0.9375 0.74166667 0.82078268]
mean value: 0.8017806465017004
key: train_mcc
value: [1. 0.99298237 0.99298237 0.99298237 0.98591549 0.99298237
0.98596474 0.9789707 0.99300699 0.99300665]
mean value: 0.9908794051074042
key: test_accuracy
value: [0.84375 0.84375 0.84375 0.9375 0.9375 0.9375
0.90322581 0.96774194 0.87096774 0.90322581]
mean value: 0.8988911290322581
key: train_accuracy
value: [1. 0.99647887 0.99647887 0.99647887 0.99295775 0.99647887
0.99298246 0.98947368 0.99649123 0.99649123]
mean value: 0.9954311835927848
key: test_fscore
value: [0.84848485 0.83870968 0.85714286 0.9375 0.94117647 0.9375
0.90322581 0.96774194 0.875 0.91428571]
mean value: 0.9020767309856494
key: train_fscore
value: [1. 0.99646643 0.99646643 0.99646643 0.99295775 0.99646643
0.99300699 0.98954704 0.99649123 0.99646643]
mean value: 0.99543351613606
key: test_precision
value: [0.82352941 0.86666667 0.78947368 0.9375 0.88888889 0.9375
0.875 0.9375 0.875 0.84210526]
mean value: 0.8773163914688682
key: train_precision
value: [1. 1. 1. 1. 0.99295775 1.
0.99300699 0.98611111 0.99300699 1. ]
mean value: 0.9965082843603971
key: test_recall
value: [0.875 0.8125 0.9375 0.9375 1. 0.9375
0.93333333 1. 0.875 1. ]
mean value: 0.9308333333333333
key: train_recall
value: [1. 0.99295775 0.99295775 0.99295775 0.99295775 0.99295775
0.99300699 0.99300699 1. 0.99295775]
mean value: 0.9943760464887226
key: test_roc_auc
value: [0.84375 0.84375 0.84375 0.9375 0.9375 0.9375
0.90416667 0.96875 0.87083333 0.9 ]
mean value: 0.89875
key: train_roc_auc
value: [1. 0.99647887 0.99647887 0.99647887 0.99295775 0.99647887
0.99298237 0.98946124 0.9965035 0.99647887]
mean value: 0.9954299221904855
key: test_jcc
value: [0.73684211 0.72222222 0.75 0.88235294 0.88888889 0.88235294
0.82352941 0.9375 0.77777778 0.84210526]
mean value: 0.8243571551427589
key: train_jcc
value: [1. 0.99295775 0.99295775 0.99295775 0.98601399 0.99295775
0.98611111 0.97931034 0.99300699 0.99295775]
mean value: 0.9909231167354042
MCC on Blind test: 0.18
Accuracy on Blind test: 0.45
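Note on reading these blocks: each metric is logged as a "key:" line, a "value:" array that may wrap onto a second line, and a "mean value:" line. A small parser can collect the means so that train and test scores are easier to compare. This is a minimal sketch that assumes the layout seen in this log. The function name parse_mean_values is hypothetical, and the sketch handles one model block at a time, since the same keys repeat for every model.

```python
import re

def parse_mean_values(log_text):
    """Collect {key: mean value} from 'key:' / 'mean value:' line pairs."""
    means = {}
    current_key = None
    for line in log_text.splitlines():
        line = line.strip()
        key_match = re.match(r"key:\s*(\S+)", line)
        if key_match:
            current_key = key_match.group(1)
            continue
        mean_match = re.match(r"mean value:\s*(\S+)", line)
        if mean_match and current_key is not None:
            means[current_key] = float(mean_match.group(1))
            current_key = None
    return means

# Two entries copied from the MLP block of this log.
sample = """key: test_mcc
value: [0.68884672 0.68884672 0.69991324 0.875 0.8819171 0.875
0.80833333 0.9375 0.74166667 0.82078268]
mean value: 0.8017806465017004
key: train_mcc
value: [1. 0.99298237 0.99298237 0.99298237 0.98591549 0.99298237
0.98596474 0.9789707 0.99300699 0.99300665]
mean value: 0.9908794051074042
"""

means = parse_mean_values(sample)
# Gap between training and cross-validated MCC for this block.
print(means["train_mcc"] - means["test_mcc"])
```

The "MCC on Blind test" lines use a different layout and are not captured by this sketch.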
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.01128316 0.01111746 0.00982285 0.009413 0.00901008 0.00897694
0.00881934 0.00902772 0.00959873 0.00934815]
mean value: 0.009641742706298828
key: score_time
value: [0.01056886 0.00906277 0.00891089 0.00860476 0.0085628 0.00862837
0.00838184 0.00824451 0.00853562 0.00857925]
mean value: 0.008807969093322755
key: test_mcc
value: [0.81409158 0.68884672 0.875 1. 0.8819171 0.93933644
0.9375 1. 0.80833333 0.80753845]
mean value: 0.8752563621702886
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.90625 0.84375 0.9375 1. 0.9375 0.96875
0.96774194 1. 0.90322581 0.90322581]
mean value: 0.9367943548387097
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.90909091 0.83870968 0.9375 1. 0.94117647 0.96774194
0.96774194 1. 0.90322581 0.90909091]
mean value: 0.9374277643608763
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.88235294 0.86666667 0.9375 1. 0.88888889 1.
0.9375 1. 0.93333333 0.88235294]
mean value: 0.932859477124183
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.9375 0.8125 0.9375 1. 1. 0.9375 1. 1. 0.875 0.9375]
mean value: 0.94375
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.90625 0.84375 0.9375 1. 0.9375 0.96875
0.96875 1. 0.90416667 0.90208333]
mean value: 0.936875
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.83333333 0.72222222 0.88235294 1. 0.88888889 0.9375
0.9375 1. 0.82352941 0.83333333]
mean value: 0.8858660130718954
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.02
Accuracy on Blind test: 0.22
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.09702897 0.09689069 0.096277 0.09499073 0.09550571 0.09818435
0.09888387 0.09795642 0.09792018 0.09375334]
mean value: 0.09673912525177002
key: score_time
value: [0.01839042 0.01852298 0.0182128 0.01794076 0.01818895 0.01855779
0.01845098 0.01811409 0.01863813 0.01832128]
mean value: 0.018333816528320314
key: test_mcc
value: [0.68884672 0.68884672 0.68884672 0.62994079 0.81409158 0.93933644
0.9375 1. 0.87083333 0.87770745]
mean value: 0.8135949748773968
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.84375 0.84375 0.84375 0.8125 0.90625 0.96875
0.96774194 1. 0.93548387 0.93548387]
mean value: 0.9057459677419355
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.84848485 0.84848485 0.84848485 0.82352941 0.90909091 0.96969697
0.96774194 1. 0.9375 0.94117647]
mean value: 0.9094190242079236
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.82352941 0.82352941 0.82352941 0.77777778 0.88235294 0.94117647
0.9375 1. 0.9375 0.88888889]
mean value: 0.883578431372549
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.875 0.875 0.875 0.875 0.9375 1. 1. 1. 0.9375 1. ]
mean value: 0.9375
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.84375 0.84375 0.84375 0.8125 0.90625 0.96875
0.96875 1. 0.93541667 0.93333333]
mean value: 0.905625
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.73684211 0.73684211 0.73684211 0.7 0.83333333 0.94117647
0.9375 1. 0.88235294 0.88888889]
mean value: 0.8393777949776402
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.23
Accuracy on Blind test: 0.45
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.00812912 0.00792956 0.00760221 0.00785184 0.00799227 0.0079267
0.00804663 0.00806618 0.00824046 0.00799036]
mean value: 0.007977533340454101
key: score_time
value: [0.00857091 0.00843978 0.00853491 0.00855112 0.0085063 0.00849175
0.00858927 0.00863695 0.00858855 0.00860786]
mean value: 0.008551740646362304
key: test_mcc
value: [0.5 0.69991324 0.50395263 0.77459667 0.82717019 0.82717019
0.74689528 0.82078268 0.35983579 0.6125 ]
mean value: 0.6672816673588119
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.75 0.84375 0.75 0.875 0.90625 0.90625
0.87096774 0.90322581 0.67741935 0.80645161]
mean value: 0.8289314516129032
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.75 0.85714286 0.73333333 0.88888889 0.91428571 0.89655172
0.85714286 0.88888889 0.66666667 0.8125 ]
mean value: 0.8265400930487138
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.75 0.78947368 0.78571429 0.8 0.84210526 1.
0.92307692 1. 0.71428571 0.8125 ]
mean value: 0.8417155870445344
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 0.9375 0.6875 1. 1. 0.8125 0.8 0.8 0.625 0.8125]
mean value: 0.8225
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.75 0.84375 0.75 0.875 0.90625 0.90625
0.86875 0.9 0.67916667 0.80625 ]
mean value: 0.8285416666666667
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.6 0.75 0.57894737 0.8 0.84210526 0.8125
0.75 0.8 0.5 0.68421053]
mean value: 0.7117763157894736
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.12
Accuracy on Blind test: 0.49
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.19962478 1.20865035 1.21551681 1.2219255 1.21645331 1.22310948
1.23031688 1.22133994 1.21362448 1.22138143]
mean value: 1.2171942949295045
key: score_time
value: [0.15371752 0.09660053 0.09662104 0.09716916 0.09705114 0.09664798
0.09718585 0.09721947 0.09733677 0.09707975]
mean value: 0.10266292095184326
key: test_mcc
value: [0.81409158 0.875 0.875 0.8819171 0.8819171 1.
0.9375 1. 1. 0.9372467 ]
mean value: 0.9202672483593498
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.90625 0.9375 0.9375 0.9375 0.9375 1.
0.96774194 1. 1. 0.96774194]
mean value: 0.9591733870967742
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.90909091 0.9375 0.9375 0.94117647 0.94117647 1.
0.96774194 1. 1. 0.96969697]
mean value: 0.9603882755448221
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.88235294 0.9375 0.9375 0.88888889 0.88888889 1.
0.9375 1. 1. 0.94117647]
mean value: 0.9413807189542484
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.9375 0.9375 0.9375 1. 1. 1. 1. 1. 1. 1. ]
mean value: 0.98125
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.90625 0.9375 0.9375 0.9375 0.9375 1.
0.96875 1. 1. 0.96666667]
mean value: 0.9591666666666667
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.83333333 0.88235294 0.88235294 0.88888889 0.88888889 1.
0.9375 1. 1. 0.94117647]
mean value: 0.9254493464052287
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.09
Accuracy on Blind test: 0.21
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.87549925 0.98825598 0.90560436 0.87923479 0.87618375 0.90693331
0.90976977 0.86167288 0.91743851 0.89844847]
mean value: 0.9019041061401367
key: score_time
value: [0.26297545 0.16814804 0.23696327 0.21888471 0.23352385 0.23716521
0.25041318 0.24322152 0.20568323 0.21444941]
mean value: 0.2271427869796753
key: test_mcc
value: [0.68884672 0.875 0.81409158 0.8819171 0.8819171 0.93933644
0.9375 1. 1. 0.9372467 ]
mean value: 0.8955855640414887
key: train_mcc
value: [0.96500412 0.95812669 0.95091647 0.93775982 0.94403659 0.94403659
0.95108379 0.94422558 0.94423649 0.94423649]
mean value: 0.9483662624447999
key: test_accuracy
value: [0.84375 0.9375 0.90625 0.9375 0.9375 0.96875
0.96774194 1. 1. 0.96774194]
mean value: 0.9466733870967742
key: train_accuracy
value: [0.98239437 0.97887324 0.97535211 0.96830986 0.97183099 0.97183099
0.9754386 0.97192982 0.97192982 0.97192982]
mean value: 0.9739819619471214
key: test_fscore
value: [0.84848485 0.9375 0.90909091 0.94117647 0.94117647 0.96969697
0.96774194 1. 1. 0.96969697]
mean value: 0.9484564573630039
key: train_fscore
value: [0.9825784 0.97916667 0.97560976 0.96907216 0.97222222 0.97222222
0.97577855 0.97241379 0.97222222 0.97222222]
mean value: 0.9743508213630365
key: test_precision
value: [0.82352941 0.9375 0.88235294 0.88888889 0.88888889 0.94117647
0.9375 1. 1. 0.94117647]
mean value: 0.9241013071895424
key: train_precision
value: [0.97241379 0.96575342 0.96551724 0.94630872 0.95890411 0.95890411
0.96575342 0.95918367 0.95890411 0.95890411]
mean value: 0.9610546720455594
key: test_recall
value: [0.875 0.9375 0.9375 1. 1. 1. 1. 1. 1. 1. ]
mean value: 0.975
key: train_recall
value: [0.99295775 0.99295775 0.98591549 0.99295775 0.98591549 0.98591549
0.98601399 0.98601399 0.98591549 0.98591549]
mean value: 0.9880478676253325
key: test_roc_auc
value: [0.84375 0.9375 0.90625 0.9375 0.9375 0.96875
0.96875 1. 1. 0.96666667]
mean value: 0.9466666666666667
key: train_roc_auc
value: [0.98239437 0.97887324 0.97535211 0.96830986 0.97183099 0.97183099
0.97540136 0.97188023 0.97197873 0.97197873]
mean value: 0.9739830591943268
key: test_jcc
value: [0.73684211 0.88235294 0.83333333 0.88888889 0.88888889 0.94117647
0.9375 1. 1. 0.94117647]
mean value: 0.905015909872721
key: train_jcc
value: [0.96575342 0.95918367 0.95238095 0.94 0.94594595 0.94594595
0.9527027 0.94630872 0.94594595 0.94594595]
mean value: 0.9500113261826575
MCC on Blind test: 0.15
Accuracy on Blind test: 0.3
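The per-fold scores above can be reproduced from a confusion matrix. The sketch below is illustrative only: the counts (TP=14, TN=13, FP=3, FN=2) are not logged anywhere and are an assumption, inferred from the first-fold accuracy, precision and recall of the Random Forest2 run on a 32-sample fold. The function name `fold_metrics` is hypothetical.

```python
import math


def fold_metrics(tp, tn, fp, fn):
    """Compute the logged metric families from raw confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    fscore = 2 * tp / (2 * tp + fp + fn)
    # Jaccard index of the positive class (reported as "jcc" in the log).
    jcc = tp / (tp + fp + fn)
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "fscore": fscore,
        "jcc": jcc,
        "mcc": mcc,
    }


# ASSUMPTION: counts inferred from the logged fold-1 scores, not taken from the log.
fold1 = fold_metrics(tp=14, tn=13, fp=3, fn=2)
for name, value in fold1.items():
    print(f"{name}: {value:.8f}")
```

With these assumed counts the output agrees with the first entries of test_accuracy, test_precision, test_recall, test_fscore, test_jcc and test_mcc logged above.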
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01972842 0.00707984 0.00703955 0.00708413 0.00708127 0.00711823
0.0071497 0.00713921 0.00711536 0.00711989]
mean value: 0.008365559577941894
key: score_time
value: [0.00945282 0.00773811 0.00782251 0.00773239 0.00773835 0.00778341
0.00787854 0.00775337 0.00779319 0.00774002]
mean value: 0.007943272590637207
key: test_mcc
value: [0.68884672 0.56360186 0.68884672 0.625 0.438357 0.68884672
0.48954403 0.48333333 0.55573827 0.55573827]
mean value: 0.5777852941864914
key: train_mcc
value: [0.64814452 0.64814452 0.6479516 0.63405443 0.65572679 0.62714946
0.62393794 0.65616074 0.64212548 0.6494089 ]
mean value: 0.6432804381067745
key: test_accuracy
value: [0.84375 0.78125 0.84375 0.8125 0.71875 0.84375
0.74193548 0.74193548 0.77419355 0.77419355]
mean value: 0.7876008064516129
key: train_accuracy
value: [0.82394366 0.82394366 0.82394366 0.81690141 0.82746479 0.81338028
0.81052632 0.82807018 0.82105263 0.8245614 ]
mean value: 0.8213787991104522
key: test_fscore
value: [0.83870968 0.77419355 0.84848485 0.8125 0.70967742 0.84848485
0.75 0.73333333 0.8 0.8 ]
mean value: 0.791538367546432
key: train_fscore
value: [0.82638889 0.82638889 0.82269504 0.81944444 0.83161512 0.816609
0.82 0.82807018 0.82105263 0.82638889]
mean value: 0.8238653070404355
key: test_precision
value: [0.86666667 0.8 0.82352941 0.8125 0.73333333 0.82352941
0.70588235 0.73333333 0.73684211 0.73684211]
mean value: 0.7772458720330238
key: train_precision
value: [0.81506849 0.81506849 0.82857143 0.80821918 0.81208054 0.80272109
0.78343949 0.83098592 0.81818182 0.81506849]
mean value: 0.8129404935574437
key: test_recall
value: [0.8125 0.75 0.875 0.8125 0.6875 0.875
0.8 0.73333333 0.875 0.875 ]
mean value: 0.8095833333333333
key: train_recall
value: [0.83802817 0.83802817 0.81690141 0.83098592 0.85211268 0.83098592
0.86013986 0.82517483 0.82394366 0.83802817]
mean value: 0.8354328769821727
key: test_roc_auc
value: [0.84375 0.78125 0.84375 0.8125 0.71875 0.84375
0.74375 0.74166667 0.77083333 0.77083333]
mean value: 0.7870833333333334
key: train_roc_auc
value: [0.82394366 0.82394366 0.82394366 0.81690141 0.82746479 0.81338028
0.81035162 0.82808037 0.82106274 0.82460849]
mean value: 0.8213680685511672
key: test_jcc
value: [0.72222222 0.63157895 0.73684211 0.68421053 0.55 0.73684211
0.6 0.57894737 0.66666667 0.66666667]
mean value: 0.6573976608187134
key: train_jcc
value: [0.70414201 0.70414201 0.69879518 0.69411765 0.71176471 0.69005848
0.69491525 0.70658683 0.69642857 0.70414201]
mean value: 0.7005092700712355
MCC on Blind test: 0.19
Accuracy on Blind test: 0.54
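The 'prep' step printed in the pipeline above applies MinMaxScaler to the numerical columns and OneHotEncoder to the categorical ones. The following is a minimal pure-Python sketch of those two transformations, not the scikit-learn implementation. The column names come from the log, but the values and the helper names `min_max_scale` and `one_hot` are made up for illustration.

```python
def min_max_scale(values):
    """Rescale values linearly to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def one_hot(values):
    """Return the sorted category list and one indicator row per input value."""
    categories = sorted(set(values))
    rows = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return categories, rows


# ASSUMPTION: toy values; only the column names appear in the log.
ligand_distance = [2.0, 4.0, 10.0]
ss_class = ["helix", "sheet", "helix"]

scaled = min_max_scale(ligand_distance)
categories, encoded = one_hot(ss_class)
print(scaled)
print(categories, encoded)
```

Inside a cross-validated Pipeline, the scaler minimum and maximum and the category list are learned from the training portion of each fold only and then applied to the held-out portion.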
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.10866404 0.04417276 0.08032727 0.0377512 0.03836942 0.03934383
0.0412488 0.73144245 0.03698397 0.03865409]
mean value: 0.11969578266143799
key: score_time
value: [0.0095489 0.00957394 0.00984144 0.00939536 0.0093596 0.00946164
0.00942516 0.00999594 0.01063395 0.00950313]
mean value: 0.00967390537261963
key: test_mcc
value: [0.81409158 0.81409158 0.875 0.93933644 0.8819171 1.
0.9375 1. 0.9375 0.87770745]
mean value: 0.9077144148609821
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.90625 0.90625 0.9375 0.96875 0.9375 1.
0.96774194 1. 0.96774194 0.93548387]
mean value: 0.9527217741935484
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.90909091 0.90322581 0.9375 0.96969697 0.94117647 1.
0.96774194 1. 0.96774194 0.94117647]
mean value: 0.9537350497383704
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.88235294 0.93333333 0.9375 0.94117647 0.88888889 1.
0.9375 1. 1. 0.88888889]
mean value: 0.9409640522875817
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.9375 0.875 0.9375 1. 1. 1. 1. 1. 0.9375 1. ]
mean value: 0.96875
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.90625 0.90625 0.9375 0.96875 0.9375 1.
0.96875 1. 0.96875 0.93333333]
mean value: 0.9527083333333334
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.83333333 0.82352941 0.88235294 0.94117647 0.88888889 1.
0.9375 1. 0.9375 0.88888889]
mean value: 0.9133169934640523
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.06
Accuracy on Blind test: 0.2
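Each "mean value" line is the arithmetic mean of the ten fold scores printed above it. The sketch below recomputes the XGBoost test_mcc mean from the logged values and sets it against the training and blind-test MCC. The variable names are hypothetical, and the two "gap" figures are only one possible way to summarise the numbers.

```python
from statistics import mean

# Fold scores copied from the XGBoost block of this log.
test_mcc = [0.81409158, 0.81409158, 0.875, 0.93933644, 0.8819171,
            1.0, 0.9375, 1.0, 0.9375, 0.87770745]
train_mcc = [1.0] * 10
blind_mcc = 0.06  # "MCC on Blind test" as logged (rounded to 2 d.p.)

cv_mean = mean(test_mcc)
train_mean = mean(train_mcc)

# Differences between the training, cross-validation and blind-test scores.
train_cv_gap = train_mean - cv_mean
cv_blind_gap = cv_mean - blind_mcc

print(f"CV mean test MCC : {cv_mean:.6f}")
print(f"train - CV gap   : {train_cv_gap:.6f}")
print(f"CV - blind gap   : {cv_blind_gap:.6f}")
```

A perfect training score together with a large drop on the blind set is consistent with overfitting or with a blind set that differs from the training data. The log does not say which applies.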
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.01153302 0.0145793 0.014395 0.0144124 0.01459241 0.0143168
0.01439118 0.01458287 0.01461196 0.0144515 ]
mean value: 0.014186644554138183
key: score_time
value: [0.01013279 0.01050425 0.0104897 0.0105176 0.01051116 0.01054525
0.01043344 0.01050496 0.01060581 0.01054263]
mean value: 0.010478758811950683
key: test_mcc
value: [0.81409158 0.81409158 0.93933644 1. 0.8819171 1.
0.87083333 1. 1. 0.9372467 ]
mean value: 0.9257516728277053
key: train_mcc
value: [0.95812669 0.95812669 0.94403659 0.93720088 0.94403659 0.93720088
0.95108379 0.95145657 0.94470481 0.9582759 ]
mean value: 0.948424939171215
key: test_accuracy
value: [0.90625 0.90625 0.96875 1. 0.9375 1.
0.93548387 1. 1. 0.96774194]
mean value: 0.9621975806451613
key: train_accuracy
value: [0.97887324 0.97887324 0.97183099 0.96830986 0.97183099 0.96830986
0.9754386 0.9754386 0.97192982 0.97894737]
mean value: 0.9739782554978997
key: test_fscore
value: [0.90909091 0.90322581 0.96969697 1. 0.94117647 1.
0.93333333 1. 1. 0.96969697]
mean value: 0.962622045885803
key: train_fscore
value: [0.97916667 0.97916667 0.97222222 0.96885813 0.97222222 0.96885813
0.97577855 0.97594502 0.97241379 0.97916667]
mean value: 0.9743798064418605
key: test_precision
value: [0.88235294 0.93333333 0.94117647 1. 0.88888889 1.
0.93333333 1. 1. 0.94117647]
mean value: 0.9520261437908497
key: train_precision
value: [0.96575342 0.96575342 0.95890411 0.95238095 0.95890411 0.95238095
0.96575342 0.95945946 0.9527027 0.96575342]
mean value: 0.9597745984732285
key: test_recall
value: [0.9375 0.875 1. 1. 1. 1.
0.93333333 1. 1. 1. ]
mean value: 0.9745833333333334
key: train_recall
value: [0.99295775 0.99295775 0.98591549 0.98591549 0.98591549 0.98591549
0.98601399 0.99300699 0.99295775 0.99295775]
mean value: 0.9894513936767458
key: test_roc_auc
value: [0.90625 0.90625 0.96875 1. 0.9375 1.
0.93541667 1. 1. 0.96666667]
mean value: 0.9620833333333333
key: train_roc_auc
value: [0.97887324 0.97887324 0.97183099 0.96830986 0.97183099 0.96830986
0.97540136 0.97537674 0.97200335 0.97899636]
mean value: 0.9739805968679208
key: test_jcc
value: [0.83333333 0.82352941 0.94117647 1. 0.88888889 1.
0.875 1. 1. 0.94117647]
mean value: 0.9303104575163399
key: train_jcc
value: [0.95918367 0.95918367 0.94594595 0.93959732 0.94594595 0.93959732
0.9527027 0.95302013 0.94630872 0.95918367]
mean value: 0.9500669104935644
MCC on Blind test: 0.16
Accuracy on Blind test: 0.36
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.00940704 0.00748181 0.00721669 0.00730562 0.00781703 0.00796103
0.0077672 0.00788808 0.00784159 0.00789857]
mean value: 0.007858467102050782
key: score_time
value: [0.00908256 0.00800729 0.00791621 0.00769114 0.00820541 0.00846505
0.00846457 0.00859737 0.00852084 0.00845647]
mean value: 0.008340692520141602
key: test_mcc
value: [0.62994079 0.50395263 0.62994079 0.68884672 0.62994079 0.75592895
0.67916667 0.61925228 0.74689528 0.66057826]
mean value: 0.6544443153383147
key: train_mcc
value: [0.67386056 0.69575325 0.68038921 0.67508446 0.67277821 0.66621443
0.66189073 0.68037155 0.67635913 0.66649204]
mean value: 0.67491935676675
key: test_accuracy
value: [0.8125 0.75 0.8125 0.84375 0.8125 0.875
0.83870968 0.80645161 0.87096774 0.80645161]
mean value: 0.822883064516129
key: train_accuracy
value: [0.83450704 0.84507042 0.83802817 0.83450704 0.83450704 0.83098592
0.82807018 0.83859649 0.83508772 0.83157895]
mean value: 0.835093896713615
key: test_fscore
value: [0.8 0.76470588 0.82352941 0.84848485 0.82352941 0.88235294
0.83870968 0.8125 0.88235294 0.84210526]
mean value: 0.8318270377297392
key: train_fscore
value: [0.84385382 0.85430464 0.84666667 0.84488449 0.84280936 0.84
0.83934426 0.84666667 0.84488449 0.83892617]
mean value: 0.844234056793084
key: test_precision
value: [0.85714286 0.72222222 0.77777778 0.82352941 0.77777778 0.83333333
0.8125 0.76470588 0.83333333 0.72727273]
mean value: 0.7929595322977676
key: train_precision
value: [0.79874214 0.80625 0.80379747 0.79503106 0.80254777 0.79746835
0.79012346 0.8089172 0.79503106 0.80128205]
mean value: 0.7999190549175873
key: test_recall
value: [0.75 0.8125 0.875 0.875 0.875 0.9375
0.86666667 0.86666667 0.9375 1. ]
mean value: 0.8795833333333334
key: train_recall
value: [0.8943662 0.9084507 0.8943662 0.90140845 0.88732394 0.88732394
0.8951049 0.88811189 0.90140845 0.88028169]
mean value: 0.8938146360681573
key: test_roc_auc
value: [0.8125 0.75 0.8125 0.84375 0.8125 0.875
0.83958333 0.80833333 0.86875 0.8 ]
mean value: 0.8222916666666666
key: train_roc_auc
value: [0.83450704 0.84507042 0.83802817 0.83450704 0.83450704 0.83098592
0.82783414 0.83842214 0.83531961 0.83174924]
mean value: 0.8350930759381464
key: test_jcc
value: [0.66666667 0.61904762 0.7 0.73684211 0.7 0.78947368
0.72222222 0.68421053 0.78947368 0.72727273]
mean value: 0.7135209235209236
key: train_jcc
value: [0.72988506 0.74566474 0.73410405 0.73142857 0.7283237 0.72413793
0.72316384 0.73410405 0.73142857 0.72254335]
mean value: 0.7304783857563864
MCC on Blind test: 0.22
Accuracy on Blind test: 0.54
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.00985122 0.01013541 0.01139951 0.01236129 0.01133943 0.01101375
0.01197648 0.01175308 0.01180387 0.0115931 ]
mean value: 0.011322712898254395
key: score_time
value: [0.00835967 0.01045895 0.01057839 0.01042914 0.01039219 0.01044703
0.01043487 0.01063824 0.01045942 0.01041865]
mean value: 0.0102616548538208
key: test_mcc
value: [0.75592895 0.68884672 0.8819171 0.67419986 0.8819171 0.81409158
0.87866878 0.9375 0.87083333 0.9372467 ]
mean value: 0.8321150124795701
key: train_mcc
value: [0.97183099 0.92966968 0.93775982 0.8661418 0.92365817 0.90901439
0.95798651 0.9114673 0.78397114 0.94395469]
mean value: 0.9135454491091561
key: test_accuracy
value: [0.875 0.84375 0.9375 0.8125 0.9375 0.90625
0.93548387 0.96774194 0.93548387 0.96774194]
mean value: 0.9118951612903226
key: train_accuracy
value: [0.98591549 0.96478873 0.96830986 0.92957746 0.96126761 0.95422535
0.97894737 0.95438596 0.89122807 0.97192982]
mean value: 0.9560575735112429
key: test_fscore
value: [0.88235294 0.83870968 0.94117647 0.76923077 0.94117647 0.90909091
0.9375 0.96774194 0.9375 0.96969697]
mean value: 0.9094176143274815
key: train_fscore
value: [0.98591549 0.96503497 0.96907216 0.92481203 0.96219931 0.9550173
0.97916667 0.95622896 0.88727273 0.97202797]
mean value: 0.9556747588965514
key: test_precision
value: [0.83333333 0.86666667 0.88888889 1. 0.88888889 0.88235294
0.88235294 0.9375 0.9375 0.94117647]
mean value: 0.9058660130718954
key: train_precision
value: [0.98591549 0.95833333 0.94630872 0.99193548 0.93959732 0.93877551
0.97241379 0.92207792 0.91729323 0.96527778]
mean value: 0.9537928586676441
key: test_recall
value: [0.9375 0.8125 1. 0.625 1. 0.9375 1. 1. 0.9375 1. ]
mean value: 0.925
key: train_recall
value: [0.98591549 0.97183099 0.99295775 0.86619718 0.98591549 0.97183099
0.98601399 0.99300699 0.85915493 0.97887324]
mean value: 0.9591697035359007
key: test_roc_auc
value: [0.875 0.84375 0.9375 0.8125 0.9375 0.90625
0.9375 0.96875 0.93541667 0.96666667]
mean value: 0.9120833333333334
key: train_roc_auc
value: [0.98591549 0.96478873 0.96830986 0.92957746 0.96126761 0.95422535
0.97892249 0.95424998 0.89111593 0.9719541 ]
mean value: 0.9560326996946715
key: test_jcc
value: [0.78947368 0.72222222 0.88888889 0.625 0.88888889 0.83333333
0.88235294 0.9375 0.88235294 0.94117647]
mean value: 0.8391189370485036
key: train_jcc
value: [0.97222222 0.93243243 0.94 0.86013986 0.92715232 0.91390728
0.95918367 0.91612903 0.79738562 0.94557823]
mean value: 0.9164130675378523
MCC on Blind test: 0.18
Accuracy on Blind test: 0.49
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01099372 0.01085377 0.01167774 0.011415 0.01157284 0.01085854
0.01104522 0.01073241 0.01113367 0.01111031]
mean value: 0.011139321327209472
key: score_time
value: [0.0103929 0.0103898 0.01037788 0.0103898 0.01040602 0.01039386
0.01040792 0.01042295 0.01051378 0.01045513]
mean value: 0.010415005683898925
key: test_mcc
value: [0.44539933 0.32025631 0.81409158 0.57735027 0.77459667 0.75592895
0.87866878 0.9375 0.87083333 0.76594169]
mean value: 0.714056690328539
key: train_mcc
value: [0.87107074 0.62077843 0.83774371 0.57207859 0.80452795 0.84114227
0.89199759 0.83981496 0.95090121 0.86664533]
mean value: 0.8096700777785382
key: test_accuracy
value: [0.71875 0.625 0.90625 0.75 0.875 0.875
0.93548387 0.96774194 0.93548387 0.87096774]
mean value: 0.8459677419354839
key: train_accuracy
value: [0.93309859 0.77816901 0.91549296 0.75 0.8943662 0.91549296
0.94385965 0.91578947 0.9754386 0.92982456]
mean value: 0.8951531999011614
key: test_fscore
value: [0.68965517 0.45454545 0.90909091 0.66666667 0.88888889 0.88235294
0.9375 0.96774194 0.9375 0.88888889]
mean value: 0.8222830857154942
key: train_fscore
value: [0.92936803 0.71493213 0.9205298 0.66976744 0.90384615 0.92156863
0.94666667 0.92156863 0.9754386 0.93377483]
mean value: 0.8837460905964674
key: test_precision
value: [0.76923077 0.83333333 0.88235294 1. 0.8 0.83333333
0.88235294 0.9375 0.9375 0.8 ]
mean value: 0.8675603318250378
key: train_precision
value: [0.98425197 1. 0.86875 0.98630137 0.82941176 0.8597561
0.9044586 0.86503067 0.97202797 0.88125 ]
mean value: 0.9151238446234521
key: test_recall
value: [0.625 0.3125 0.9375 0.5 1. 0.9375 1. 1. 0.9375 1. ]
mean value: 0.825
key: train_recall
value: [0.88028169 0.55633803 0.97887324 0.50704225 0.99295775 0.99295775
0.99300699 0.98601399 0.97887324 0.99295775]
mean value: 0.8859302669161824
key: test_roc_auc
value: [0.71875 0.625 0.90625 0.75 0.875 0.875
0.9375 0.96875 0.93541667 0.86666667]
mean value: 0.8458333333333333
key: train_roc_auc
value: [0.93309859 0.77816901 0.91549296 0.75 0.8943662 0.91549296
0.9436866 0.9155422 0.97545061 0.93004531]
mean value: 0.8951344430217669
key: test_jcc
value: [0.52631579 0.29411765 0.83333333 0.5 0.8 0.78947368
0.88235294 0.9375 0.88235294 0.8 ]
mean value: 0.7245446336429309
key: train_jcc
value: [0.86805556 0.55633803 0.85276074 0.5034965 0.8245614 0.85454545
0.89873418 0.85454545 0.95205479 0.8757764 ]
mean value: 0.8040868505268339
MCC on Blind test: 0.08
Accuracy on Blind test: 0.19
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.09240365 0.08133793 0.08125806 0.08046818 0.08067393 0.08131313
0.08116984 0.0813272 0.08116603 0.08120728]
mean value: 0.08223252296447754
key: score_time
value: [0.01535177 0.0154326 0.01515222 0.01522565 0.01519728 0.0153811
0.01536131 0.01532435 0.01529288 0.01531577]
mean value: 0.015303492546081543
key: test_mcc
value: [0.81409158 0.875 0.93933644 0.81409158 0.93933644 1.
0.9375 1. 1. 0.87770745]
mean value: 0.9197063481549348
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.90625 0.9375 0.96875 0.90625 0.96875 1.
0.96774194 1. 1. 0.93548387]
mean value: 0.9590725806451613
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.90909091 0.9375 0.96969697 0.90322581 0.96969697 1.
0.96774194 1. 1. 0.94117647]
mean value: 0.9598129061008568
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.88235294 0.9375 0.94117647 0.93333333 0.94117647 1.
0.9375 1. 1. 0.88888889]
mean value: 0.9461928104575164
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.9375 0.9375 1. 0.875 1. 1. 1. 1. 1. 1. ]
mean value: 0.975
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.90625 0.9375 0.96875 0.90625 0.96875 1.
0.96875 1. 1. 0.93333333]
mean value: 0.9589583333333334
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.83333333 0.88235294 0.94117647 0.82352941 0.94117647 1.
0.9375 1. 1. 0.88888889]
mean value: 0.9247957516339869
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.05
Accuracy on Blind test: 0.19
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.03232431 0.02845907 0.02876759 0.02964163 0.04149294 0.0316689
0.04340553 0.03582311 0.04470134 0.04033637]
mean value: 0.035662078857421876
key: score_time
value: [0.01759839 0.02218199 0.01889658 0.01943088 0.02917433 0.03231716
0.03496408 0.03411865 0.02071142 0.01735568]
mean value: 0.02467491626739502
key: test_mcc
value: [0.81409158 0.81409158 0.875 0.93933644 1. 1.
0.87866878 1. 0.87866878 0.9372467 ]
mean value: 0.913710384964254
key: train_mcc
value: [0.99298237 1. 0.99298237 1. 0.99298237 0.98591549
1. 0.98596474 0.99300665 0.98596474]
mean value: 0.9929798730055359
key: test_accuracy
value: [0.90625 0.90625 0.9375 0.96875 1. 1.
0.93548387 1. 0.93548387 0.96774194]
mean value: 0.9557459677419354
key: train_accuracy
value: [0.99647887 1. 0.99647887 1. 0.99647887 0.99295775
1. 0.99298246 0.99649123 0.99298246]
mean value: 0.996485050654806
key: test_fscore
value: [0.90909091 0.90322581 0.9375 0.96774194 1. 1.
0.9375 1. 0.93333333 0.96969697]
mean value: 0.9558088954056696
key: train_fscore
value: [0.99646643 1. 0.99646643 1. 0.99646643 0.99295775
1. 0.99300699 0.99646643 0.99295775]
mean value: 0.9964788210346365
key: test_precision
value: [0.88235294 0.93333333 0.9375 1. 1. 1.
0.88235294 1. 1. 0.94117647]
mean value: 0.957671568627451
key: train_precision
value: [1. 1. 1. 1. 1. 0.99295775
1. 0.99300699 1. 0.99295775]
mean value: 0.997892248596474
key: test_recall
value: [0.9375 0.875 0.9375 0.9375 1. 1. 1. 1. 0.875 1. ]
mean value: 0.95625
key: train_recall
value: [0.99295775 1. 0.99295775 1. 0.99295775 0.99295775
1. 0.99300699 0.99295775 0.99295775]
mean value: 0.9950753471880233
key: test_roc_auc
value: [0.90625 0.90625 0.9375 0.96875 1. 1.
0.9375 1. 0.9375 0.96666667]
mean value: 0.9560416666666667
key: train_roc_auc
value: [0.99647887 1. 0.99647887 1. 0.99647887 0.99295775
1. 0.99298237 0.99647887 0.99298237]
mean value: 0.9964837978922486
key: test_jcc
value: [0.83333333 0.82352941 0.88235294 0.9375 1. 1.
0.88235294 1. 0.875 0.94117647]
mean value: 0.9175245098039215
key: train_jcc
value: [0.99295775 1. 0.99295775 1. 0.99295775 0.98601399
1. 0.98611111 0.99295775 0.98601399]
mean value: 0.9929970069054577
MCC on Blind test: 0.06
Accuracy on Blind test: 0.2
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.05620861 0.0974803 0.04944372 0.04494715 0.06924796 0.05304265
0.03282356 0.03296423 0.03626871 0.06697369]
mean value: 0.0539400577545166
key: score_time
value: [0.02177811 0.02000475 0.01141953 0.01396704 0.02782083 0.01147771
0.011482 0.01143765 0.01138997 0.02080035]
mean value: 0.01615779399871826
key: test_mcc
value: [0.62994079 0.438357 0.56360186 0.68884672 0.75 0.68884672
0.80833333 0.74166667 0.68826048 0.76594169]
mean value: 0.6763795258534475
key: train_mcc
value: [0.8612933 0.86052165 0.83971646 0.85382934 0.85314992 0.86794223
0.84766497 0.84023701 0.85436741 0.84697783]
mean value: 0.8525700111060143
key: test_accuracy
value: [0.8125 0.71875 0.78125 0.84375 0.875 0.84375
0.90322581 0.87096774 0.83870968 0.87096774]
mean value: 0.8358870967741936
key: train_accuracy
value: [0.92957746 0.92957746 0.91901408 0.92605634 0.92605634 0.93309859
0.92280702 0.91929825 0.92631579 0.92280702]
mean value: 0.925460835186558
key: test_fscore
value: [0.8 0.70967742 0.78787879 0.84848485 0.875 0.84848485
0.90322581 0.86666667 0.85714286 0.88888889]
mean value: 0.838545012335335
key: train_fscore
value: [0.93197279 0.93150685 0.92150171 0.92832765 0.92783505 0.93515358
0.92567568 0.9220339 0.92832765 0.92465753]
mean value: 0.9276992378409221
key: test_precision
value: [0.85714286 0.73333333 0.76470588 0.82352941 0.875 0.82352941
0.875 0.86666667 0.78947368 0.8 ]
mean value: 0.8208381247235736
key: train_precision
value: [0.90131579 0.90666667 0.89403974 0.90066225 0.90604027 0.90728477
0.89542484 0.89473684 0.90066225 0.9 ]
mean value: 0.9006833409925814
key: test_recall
value: [0.75 0.6875 0.8125 0.875 0.875 0.875
0.93333333 0.86666667 0.9375 1. ]
mean value: 0.86125
key: train_recall
value: [0.96478873 0.95774648 0.95070423 0.95774648 0.95070423 0.96478873
0.95804196 0.95104895 0.95774648 0.95070423]
mean value: 0.9564020486555698
key: test_roc_auc
value: [0.8125 0.71875 0.78125 0.84375 0.875 0.84375
0.90416667 0.87083333 0.83541667 0.86666667]
mean value: 0.8352083333333333
key: train_roc_auc
value: [0.92957746 0.92957746 0.91901408 0.92605634 0.92605634 0.93309859
0.92268295 0.91918645 0.92642569 0.92290456]
mean value: 0.9254579927115139
key: test_jcc
value: [0.66666667 0.55 0.65 0.73684211 0.77777778 0.73684211
0.82352941 0.76470588 0.75 0.8 ]
mean value: 0.7256363949088407
key: train_jcc
value: [0.87261146 0.87179487 0.85443038 0.86624204 0.86538462 0.87820513
0.86163522 0.85534591 0.86624204 0.85987261]
mean value: 0.8651764280073164
MCC on Blind test: 0.18
Accuracy on Blind test: 0.54
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.16305089 0.16052961 0.15664053 0.15771556 0.15495181 0.15255976
0.15471911 0.15581322 0.15795827 0.15890527]
mean value: 0.15728440284729003
key: score_time
value: [0.00907922 0.00902605 0.00912547 0.0093677 0.00861001 0.00851989
0.00923562 0.00842047 0.00907159 0.00920391]
mean value: 0.00896599292755127
key: test_mcc
value: [0.81409158 0.875 0.875 1. 1. 1.
0.9375 1. 1. 0.9372467 ]
mean value: 0.9438838276217104
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.90625 0.9375 0.9375 1. 1. 1.
0.96774194 1. 1. 0.96774194]
mean value: 0.9716733870967742
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.90909091 0.9375 0.9375 1. 1. 1.
0.96774194 1. 1. 0.96969697]
mean value: 0.972152981427175
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.88235294 0.9375 0.9375 1. 1. 1.
0.9375 1. 1. 0.94117647]
mean value: 0.9636029411764706
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.9375 0.9375 0.9375 1. 1. 1. 1. 1. 1. 1. ]
mean value: 0.98125
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.90625 0.9375 0.9375 1. 1. 1.
0.96875 1. 1. 0.96666667]
mean value: 0.9716666666666667
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.83333333 0.88235294 0.88235294 1. 1. 1.
0.9375 1. 1. 0.94117647]
mean value: 0.9476715686274509
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.05
Accuracy on Blind test: 0.19
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.01113367 0.01251006 0.01261091 0.01772285 0.01263809 0.01343751
0.0127852 0.01266503 0.01297426 0.01285744]
mean value: 0.013133502006530762
key: score_time
value: [0.01069093 0.01078391 0.0107305 0.01083326 0.01099324 0.01084566
0.01136661 0.01079345 0.01084757 0.01162291]
mean value: 0.010950803756713867
key: test_mcc
value: [0.68884672 0.59215653 0.81409158 0.56360186 0.77459667 0.75
0.74896053 0.54812195 0.53006813 0.82078268]
mean value: 0.6831226650318738
key: train_mcc
value: [0.8145351 0.86223926 0.87332606 0.85924016 0.86725157 0.87541287
0.7742616 0.84773912 0.81144956 0.88848951]
mean value: 0.8473944811490282
key: test_accuracy
value: [0.84375 0.78125 0.90625 0.78125 0.875 0.875
0.87096774 0.77419355 0.74193548 0.90322581]
mean value: 0.8352822580645162
key: train_accuracy
value: [0.90140845 0.92957746 0.93661972 0.92957746 0.93309859 0.93661972
0.88070175 0.92280702 0.90175439 0.94385965]
mean value: 0.9216024215468248
key: test_fscore
value: [0.83870968 0.81081081 0.90322581 0.78787879 0.88888889 0.875
0.875 0.75862069 0.69230769 0.91428571]
mean value: 0.8344728067698034
key: train_fscore
value: [0.89230769 0.92647059 0.93706294 0.92907801 0.9347079 0.93430657
0.89102564 0.92028986 0.89393939 0.94244604]
mean value: 0.9201634638116422
key: test_precision
value: [0.86666667 0.71428571 0.93333333 0.76470588 0.8 0.875
0.82352941 0.78571429 0.9 0.84210526]
mean value: 0.8305340557275542
key: train_precision
value: [0.98305085 0.96923077 0.93055556 0.93571429 0.91275168 0.96969697
0.82248521 0.95488722 0.96721311 0.96323529]
mean value: 0.9408820939525007
key: test_recall
value: [0.8125 0.9375 0.875 0.8125 1. 0.875
0.93333333 0.73333333 0.5625 1. ]
mean value: 0.8541666666666666
key: train_recall
value: [0.81690141 0.88732394 0.94366197 0.92253521 0.95774648 0.90140845
0.97202797 0.88811189 0.83098592 0.92253521]
mean value: 0.9043238451689156
key: test_roc_auc
value: [0.84375 0.78125 0.90625 0.78125 0.875 0.875
0.87291667 0.77291667 0.74791667 0.9 ]
mean value: 0.8356250000000001
key: train_roc_auc
value: [0.90140845 0.92957746 0.93661972 0.92957746 0.93309859 0.93661972
0.88038018 0.92292918 0.90150694 0.94378509]
mean value: 0.9215502807052103
key: test_jcc
value: [0.72222222 0.68181818 0.82352941 0.65 0.8 0.77777778
0.77777778 0.61111111 0.52941176 0.84210526]
mean value: 0.7215753510335554
key: train_jcc
value: [0.80555556 0.8630137 0.88157895 0.86754967 0.87741935 0.87671233
0.80346821 0.85234899 0.80821918 0.89115646]
mean value: 0.8527022396082421
MCC on Blind test: 0.2
Accuracy on Blind test: 0.58
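The per-fold scores above (test_mcc, test_accuracy, test_precision, test_recall, test_fscore, test_jcc) can all be derived from one confusion matrix. The sketch below uses only the standard library. The counts are an assumption: they are not printed in this log, but were chosen because they reproduce the sixth QDA fold (accuracy, precision and recall all 0.875, MCC 0.75, Jaccard 0.77777778). The helper name `binary_metrics` is hypothetical and is not taken from the project's scripts.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Return common binary-classification scores from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    pr_sum = precision + recall
    fscore = 2 * precision * recall / pr_sum if pr_sum else 0.0
    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Jaccard index of the positive class.
    union = tp + fp + fn
    jcc = tp / union if union else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "fscore": fscore,
        "mcc": mcc,
        "jcc": jcc,
    }


# Assumed counts for a balanced 32-sample fold (16 positives, 16 negatives).
scores = binary_metrics(tp=14, fp=2, fn=2, tn=14)
for name, value in scores.items():
    print(f"{name}: {value:.8f}")
```

Because MCC uses all four cells of the matrix, it can stay low on the blind test even when accuracy looks moderate.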
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.01892829 0.02864075 0.02735972 0.02840042 0.0284605 0.02859259
0.02847409 0.02875304 0.02716875 0.01797819]
mean value: 0.026275634765625
key: score_time
value: [0.01286125 0.01076007 0.01073647 0.01065683 0.01062083 0.01068902
0.01063824 0.03288555 0.0109849 0.02092481]
mean value: 0.014175796508789062
key: test_mcc
value: [0.68884672 0.75 0.81409158 0.93933644 0.8819171 1.
0.87083333 0.9372467 0.87083333 0.9372467 ]
mean value: 0.8690351901199767
key: train_mcc
value: [0.92994649 0.90955652 0.90901439 0.89492115 0.91585639 0.90955652
0.93741093 0.90253931 0.90988464 0.90897898]
mean value: 0.9127665317222325
key: test_accuracy
value: [0.84375 0.875 0.90625 0.96875 0.9375 1.
0.93548387 0.96774194 0.93548387 0.96774194]
mean value: 0.9337701612903225
key: train_accuracy
value: [0.96478873 0.95422535 0.95422535 0.9471831 0.95774648 0.95422535
0.96842105 0.95087719 0.95438596 0.95438596]
mean value: 0.9560464541635779
key: test_fscore
value: [0.84848485 0.875 0.90322581 0.96969697 0.94117647 1.
0.93333333 0.96551724 0.9375 0.96969697]
mean value: 0.934363163963128
key: train_fscore
value: [0.96527778 0.95532646 0.9550173 0.94809689 0.95833333 0.95532646
0.96907216 0.95205479 0.95532646 0.95470383]
mean value: 0.9568535471627235
key: test_precision
value: [0.82352941 0.875 0.93333333 0.94117647 0.88888889 1.
0.93333333 1. 0.9375 0.94117647]
mean value: 0.9273937908496732
key: train_precision
value: [0.95205479 0.93288591 0.93877551 0.93197279 0.94520548 0.93288591
0.9527027 0.93288591 0.93288591 0.94482759]
mean value: 0.9397082486363004
key: test_recall
value: [0.875 0.875 0.875 1. 1. 1.
0.93333333 0.93333333 0.9375 1. ]
mean value: 0.9429166666666666
key: train_recall
value: [0.97887324 0.97887324 0.97183099 0.96478873 0.97183099 0.97887324
0.98601399 0.97202797 0.97887324 0.96478873]
mean value: 0.9746774352408155
key: test_roc_auc
value: [0.84375 0.875 0.90625 0.96875 0.9375 1.
0.93541667 0.96666667 0.93541667 0.96666667]
mean value: 0.9335416666666667
key: train_roc_auc
value: [0.96478873 0.95422535 0.95422535 0.9471831 0.95774648 0.95422535
0.96835911 0.95080272 0.95447158 0.95442234]
mean value: 0.9560450113267015
key: test_jcc
value: [0.73684211 0.77777778 0.82352941 0.94117647 0.88888889 1.
0.875 0.93333333 0.88235294 0.94117647]
mean value: 0.8800077399380805
key: train_jcc
value: [0.93288591 0.91447368 0.91390728 0.90131579 0.92 0.91447368
0.94 0.90849673 0.91447368 0.91333333]
mean value: 0.9173360098273221
MCC on Blind test: 0.22
Accuracy on Blind test: 0.49
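Each "mean value" line is the arithmetic mean of the ten fold scores printed above it. As a minimal check, the sketch below recomputes the Ridge Classifier test_mcc mean from the rounded fold values in this log and compares it with the blind-test MCC of 0.22. The variable names are illustrative only. Reading the difference as a generalisation gap is an interpretation, not something the log itself states.

```python
from statistics import fmean

# Fold-wise test_mcc values for the Ridge Classifier, copied from the log.
ridge_test_mcc = [
    0.68884672, 0.75, 0.81409158, 0.93933644, 0.8819171,
    1.0, 0.87083333, 0.9372467, 0.87083333, 0.9372467,
]

cv_mean = fmean(ridge_test_mcc)  # should agree with the logged mean value
blind_mcc = 0.22                 # "MCC on Blind test" reported for this model
gap = cv_mean - blind_mcc        # cross-validation score minus blind-test score

print(f"mean CV test_mcc: {cv_mean:.4f}")
print(f"blind-test MCC:   {blind_mcc:.2f}")
print(f"difference:       {gap:.4f}")
```

The recomputed mean matches the logged value only up to the rounding of the printed fold scores.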
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_config.py:183: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_config.py:186: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.16565156 0.08813143 0.17730904 0.20824528 0.18379951 0.1740315
0.17967129 0.17941952 0.18022633 0.19038606]
mean value: 0.17268714904785157
key: score_time
value: [0.0107646 0.01237965 0.01942182 0.01081586 0.01998901 0.01066494
0.01124215 0.01264286 0.01966715 0.02074218]
mean value: 0.014833021163940429
key: test_mcc
value: [0.81409158 0.75 0.81409158 1. 0.8819171 1.
0.87083333 1. 1. 0.87770745]
mean value: 0.900864104531543
key: train_mcc
value: [0.95129413 0.94450549 0.94403659 0.93720088 0.94403659 0.93720088
0.93741093 0.93741093 0.93130575 0.95146839]
mean value: 0.9415870567033926
key: test_accuracy
value: [0.90625 0.875 0.90625 1. 0.9375 1.
0.93548387 1. 1. 0.93548387]
mean value: 0.9495967741935484
key: train_accuracy
value: [0.97535211 0.97183099 0.97183099 0.96830986 0.97183099 0.96830986
0.96842105 0.96842105 0.96491228 0.9754386 ]
mean value: 0.9704657771188535
key: test_fscore
value: [0.90909091 0.875 0.90322581 1. 0.94117647 1.
0.93333333 1. 1. 0.94117647]
mean value: 0.9503002990052326
key: train_fscore
value: [0.97577855 0.97241379 0.97222222 0.96885813 0.97222222 0.96885813
0.96907216 0.96907216 0.96575342 0.97577855]
mean value: 0.9710029348503718
key: test_precision
value: [0.88235294 0.875 0.93333333 1. 0.88888889 1.
0.93333333 1. 1. 0.88888889]
mean value: 0.9401797385620915
key: train_precision
value: [0.95918367 0.9527027 0.95890411 0.95238095 0.95890411 0.95238095
0.9527027 0.9527027 0.94 0.95918367]
mean value: 0.953904557898687
key: test_recall
value: [0.9375 0.875 0.875 1. 1. 1.
0.93333333 1. 1. 1. ]
mean value: 0.9620833333333333
key: train_recall
value: [0.99295775 0.99295775 0.98591549 0.98591549 0.98591549 0.98591549
0.98601399 0.98601399 0.99295775 0.99295775]
mean value: 0.9887520929774452
key: test_roc_auc
value: [0.90625 0.875 0.90625 1. 0.9375 1.
0.93541667 1. 1. 0.93333333]
mean value: 0.949375
key: train_roc_auc
value: [0.97535211 0.97183099 0.97183099 0.96830986 0.97183099 0.96830986
0.96835911 0.96835911 0.96501034 0.97549985]
mean value: 0.9704693194129814
key: test_jcc
value: [0.83333333 0.77777778 0.82352941 1. 0.88888889 1.
0.875 1. 1. 0.88888889]
mean value: 0.9087418300653595
key: train_jcc
value: [0.9527027 0.94630872 0.94594595 0.93959732 0.94594595 0.93959732
0.94 0.94 0.93377483 0.9527027 ]
mean value: 0.9436575487439082
MCC on Blind test: 0.2
Accuracy on Blind test: 0.43
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.02582741 0.02434468 0.02545023 0.02682304 0.02640653 0.02720022
0.02612591 0.02765298 0.02734327 0.04823351]
mean value: 0.028540778160095214
key: score_time
value: [0.01104569 0.01089406 0.01136374 0.01076293 0.01096463 0.01084781
0.01096702 0.01096272 0.01116037 0.01098251]
mean value: 0.010995149612426758
key: test_mcc
value: [0.81325006 0.87831007 0.80813523 0.78446454 0.77459667 0.83914639
0.80813523 0.90748521 0.73763441 0.77382584]
mean value: 0.8124983647487063
key: train_mcc
value: [0.83119879 0.83472681 0.83507281 0.87790234 0.85985131 0.84227171
0.84207536 0.83472681 0.85645761 0.83886705]
mean value: 0.8453150611021845
key: test_accuracy
value: [0.90322581 0.93548387 0.90322581 0.88709677 0.88709677 0.91935484
0.90322581 0.9516129 0.86885246 0.8852459 ]
mean value: 0.9044420941300899
key: train_accuracy
value: [0.91546763 0.91726619 0.91726619 0.93884892 0.92985612 0.92086331
0.92086331 0.91726619 0.92818671 0.91921005]
mean value: 0.9225094610128773
key: test_fscore
value: [0.90909091 0.93939394 0.90625 0.87719298 0.88888889 0.92063492
0.90625 0.95384615 0.86666667 0.89230769]
mean value: 0.9060522153285311
key: train_fscore
value: [0.91651865 0.91814947 0.91872792 0.93950178 0.93048128 0.92226148
0.92198582 0.91814947 0.92882562 0.92035398]
mean value: 0.9234955465227851
key: test_precision
value: [0.85714286 0.88571429 0.87878788 0.96153846 0.875 0.90625
0.87878788 0.91176471 0.86666667 0.85294118]
mean value: 0.887459391099097
key: train_precision
value: [0.90526316 0.9084507 0.90277778 0.92957746 0.92226148 0.90625
0.90909091 0.9084507 0.92226148 0.90592334]
mean value: 0.9120307031148476
key: test_recall
value: [0.96774194 1. 0.93548387 0.80645161 0.90322581 0.93548387
0.93548387 1. 0.86666667 0.93548387]
mean value: 0.9286021505376344
key: train_recall
value: [0.92805755 0.92805755 0.9352518 0.94964029 0.93884892 0.93884892
0.9352518 0.92805755 0.93548387 0.9352518 ]
mean value: 0.9352750058018101
key: test_roc_auc
value: [0.90322581 0.93548387 0.90322581 0.88709677 0.88709677 0.91935484
0.90322581 0.9516129 0.8688172 0.8844086 ]
mean value: 0.9043548387096774
key: train_roc_auc
value: [0.91546763 0.91726619 0.91726619 0.93884892 0.92985612 0.92086331
0.92086331 0.91726619 0.92817359 0.9192388 ]
mean value: 0.922511023439313
key: test_jcc
value: [0.83333333 0.88571429 0.82857143 0.78125 0.8 0.85294118
0.82857143 0.91176471 0.76470588 0.80555556]
mean value: 0.8292407796451914
key: train_jcc
value: [0.84590164 0.84868421 0.8496732 0.88590604 0.87 0.8557377
0.85526316 0.84868421 0.86710963 0.85245902]
mean value: 0.8579418817037436
MCC on Blind test: 0.21
Accuracy on Blind test: 0.53
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.77656746 0.70605206 0.85919523 0.69673634 0.68120766 0.78336358
0.78208661 0.70059681 0.83346748 0.76766825]
mean value: 0.7586941480636596
key: score_time
value: [0.01191044 0.01261759 0.01475716 0.01256537 0.0127914 0.01143336
0.01280951 0.01418829 0.01239324 0.01240849]
mean value: 0.012787485122680664
key: test_mcc
value: [0.90369611 0.93743687 0.90369611 0.82199494 0.84266484 0.93743687
0.90369611 0.87278605 0.87055472 0.96770777]
mean value: 0.8961670394093372
key: train_mcc
value: [0.97124816 0.96043787 0.96402878 0.94966486 0.9497386 0.96402878
0.94986154 0.96405373 0.97487139 0.96768995]
mean value: 0.9615623654982854
key: test_accuracy
value: [0.9516129 0.96774194 0.9516129 0.90322581 0.91935484 0.96774194
0.9516129 0.93548387 0.93442623 0.98360656]
mean value: 0.9466419883659439
key: train_accuracy
value: [0.98561151 0.98021583 0.98201439 0.97482014 0.97482014 0.98201439
0.97482014 0.98201439 0.98743268 0.98384201]
mean value: 0.9807605621068675
key: test_fscore
value: [0.95081967 0.96666667 0.95081967 0.89285714 0.92307692 0.96875
0.95238095 0.93333333 0.93103448 0.98412698]
mean value: 0.9453865829462917
key: train_fscore
value: [0.98566308 0.98018018 0.98201439 0.97491039 0.975 0.98201439
0.97508897 0.98207885 0.98747764 0.98378378]
mean value: 0.9808211677303444
key: test_precision
value: [0.96666667 1. 0.96666667 1. 0.88235294 0.93939394
0.9375 0.96551724 0.96428571 0.96875 ]
mean value: 0.9591133169568768
key: train_precision
value: [0.98214286 0.98194946 0.98201439 0.97142857 0.96808511 0.98201439
0.96478873 0.97857143 0.98571429 0.98555957]
mean value: 0.9782268783883663
key: test_recall
value: [0.93548387 0.93548387 0.93548387 0.80645161 0.96774194 1.
0.96774194 0.90322581 0.9 1. ]
mean value: 0.9351612903225807
key: train_recall
value: [0.98920863 0.97841727 0.98201439 0.97841727 0.98201439 0.98201439
0.98561151 0.98561151 0.98924731 0.98201439]
mean value: 0.9834571052835152
key: test_roc_auc
value: [0.9516129 0.96774194 0.9516129 0.90322581 0.91935484 0.96774194
0.9516129 0.93548387 0.93387097 0.98333333]
mean value: 0.9465591397849463
key: train_roc_auc
value: [0.98561151 0.98021583 0.98201439 0.97482014 0.97482014 0.98201439
0.97482014 0.98201439 0.98742941 0.98383874]
mean value: 0.9807599082024703
key: test_jcc
value: [0.90625 0.93548387 0.90625 0.80645161 0.85714286 0.93939394
0.90909091 0.875 0.87096774 0.96875 ]
mean value: 0.8974780931434158
key: train_jcc
value: [0.97173145 0.96113074 0.96466431 0.95104895 0.95121951 0.96466431
0.95138889 0.96478873 0.97526502 0.96808511]
mean value: 0.9623987021299
MCC on Blind test: 0.14
Accuracy on Blind test: 0.35
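The 'mean value' lines above are plain arithmetic means over the ten fold values. The sketch below recomputes them for MCC and sets the cross-validated mean against the blind-test MCC reported for this model. The fold values and the blind-test figure are copied from the log; the variable names are hypothetical, and the reading of the gap is a suggestion only, since the log does not state its cause.

```python
from statistics import mean

# Fold values copied from the 'test_mcc' and 'train_mcc' entries
# logged for LogisticRegressionCV.
test_mcc = [0.90369611, 0.93743687, 0.90369611, 0.82199494, 0.84266484,
            0.93743687, 0.90369611, 0.87278605, 0.87055472, 0.96770777]
train_mcc = [0.97124816, 0.96043787, 0.96402878, 0.94966486, 0.9497386,
             0.96402878, 0.94986154, 0.96405373, 0.97487139, 0.96768995]

# Blind-test MCC as printed in the log (rounded to two decimals there).
blind_mcc = 0.14

cv_test_mean = mean(test_mcc)
cv_train_mean = mean(train_mcc)

# Train-vs-test gap inside cross-validation, and CV-vs-blind gap.
train_test_gap = cv_train_mean - cv_test_mean
cv_blind_gap = cv_test_mean - blind_mcc

print(f"CV test MCC mean:  {cv_test_mean:.4f}")
print(f"CV train MCC mean: {cv_train_mean:.4f}")
print(f"train - test gap:  {train_test_gap:.4f}")
print(f"CV test - blind:   {cv_blind_gap:.4f}")
```

A small train-test gap inside cross-validation combined with a large drop on the blind set may point to a difference between the cross-validation data and the blind-test data rather than to within-fold overfitting, but that would need to be checked against the data split itself.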
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01096439 0.01291323 0.00820684 0.00848889 0.00764108 0.0079782
0.00760031 0.00791764 0.00761533 0.00789833]
mean value: 0.008722424507141113
key: score_time
value: [0.01091075 0.00868392 0.00824642 0.00876021 0.00801086 0.00799608
0.00807261 0.00803852 0.00840473 0.00831747]
mean value: 0.008544158935546876
key: test_mcc
value: [0.67883359 0.64549722 0.7130241 0.52981294 0.74193548 0.7130241
0.80813523 0.81325006 0.50860215 0.77072165]
mean value: 0.6922836529141403
key: train_mcc
value: [0.71239616 0.71972253 0.72313855 0.6419512 0.73033396 0.70874774
0.69849277 0.6908084 0.72712387 0.72023891]
mean value: 0.7072954079422489
key: test_accuracy
value: [0.83870968 0.82258065 0.85483871 0.75806452 0.87096774 0.85483871
0.90322581 0.90322581 0.75409836 0.8852459 ]
mean value: 0.8445795875198308
key: train_accuracy
value: [0.85611511 0.85971223 0.86151079 0.82014388 0.86510791 0.85431655
0.84892086 0.84532374 0.86355476 0.85996409]
mean value: 0.8534669930124124
key: test_fscore
value: [0.84375 0.82539683 0.86153846 0.72727273 0.87096774 0.86153846
0.90625 0.90909091 0.75409836 0.88888889]
mean value: 0.8448792376317495
key: train_fscore
value: [0.85765125 0.86170213 0.8627451 0.81343284 0.86631016 0.85561497
0.85211268 0.84697509 0.86428571 0.86170213]
mean value: 0.8542532047730724
key: test_precision
value: [0.81818182 0.8125 0.82352941 0.83333333 0.87096774 0.82352941
0.87878788 0.85714286 0.74193548 0.875 ]
mean value: 0.833490793678175
key: train_precision
value: [0.84859155 0.84965035 0.85512367 0.84496124 0.85865724 0.84805654
0.83448276 0.83802817 0.86120996 0.84965035]
mean value: 0.8488411836784526
key: test_recall
value: [0.87096774 0.83870968 0.90322581 0.64516129 0.87096774 0.90322581
0.93548387 0.96774194 0.76666667 0.90322581]
mean value: 0.8605376344086022
key: train_recall
value: [0.86690647 0.87410072 0.8705036 0.78417266 0.87410072 0.86330935
0.8705036 0.85611511 0.86738351 0.87410072]
mean value: 0.8601196462185091
key: test_roc_auc
value: [0.83870968 0.82258065 0.85483871 0.75806452 0.87096774 0.85483871
0.90322581 0.90322581 0.75430108 0.88494624]
mean value: 0.8445698924731183
key: train_roc_auc
value: [0.85611511 0.85971223 0.86151079 0.82014388 0.86510791 0.85431655
0.84892086 0.84532374 0.86354787 0.85998943]
mean value: 0.8534688378329595
key: test_jcc
value: [0.72972973 0.7027027 0.75675676 0.57142857 0.77142857 0.75675676
0.82857143 0.83333333 0.60526316 0.8 ]
mean value: 0.7355971008602588
key: train_jcc
value: [0.75077882 0.75700935 0.75862069 0.68553459 0.76415094 0.74766355
0.74233129 0.7345679 0.76100629 0.75700935]
mean value: 0.7458672762322701
MCC on Blind test: 0.21
Accuracy on Blind test: 0.57
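The per-fold scores above can be cross-checked by hand. The sketch below is not taken from ml_data.py: it assumes the first Gaussian NB test fold held 62 samples (31 per class), with confusion-matrix counts inferred from the logged recall (27/31), precision (27/33) and accuracy (52/62). The log does not print these counts. Under that assumption, the standard MCC formula reproduces the first logged test_mcc value.

```python
from math import sqrt


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Conventionally MCC is reported as 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Assumed counts for the first Gaussian NB fold (inferred, not logged).
tp, fp, fn, tn = 27, 6, 4, 25

fold_mcc = mcc(tp, fp, fn, tn)
fold_accuracy = (tp + tn) / (tp + fp + fn + tn)
fold_precision = tp / (tp + fp)
fold_recall = tp / (tp + fn)

print(f"mcc={fold_mcc:.8f} accuracy={fold_accuracy:.8f}")
print(f"precision={fold_precision:.8f} recall={fold_recall:.8f}")
```

The same helper can be applied to any other fold once its counts have been recovered from the precision and recall fractions.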
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.00830388 0.00819135 0.00802517 0.00794339 0.00828314 0.00875449
0.00812817 0.0083952 0.0085578 0.00874829]
mean value: 0.008333086967468262
key: score_time
value: [0.00846505 0.00840521 0.008214 0.00820637 0.00816274 0.00835466
0.00821352 0.00871086 0.0093646 0.00880384]
mean value: 0.00849008560180664
key: test_mcc
value: [0.51639778 0.56761348 0.61290323 0.65372045 0.74348441 0.5809475
0.58834841 0.7130241 0.58264312 0.54086022]
mean value: 0.6099942679846233
key: train_mcc
value: [0.62249953 0.6079176 0.63414469 0.60794907 0.59713776 0.61543051
0.64482423 0.62249953 0.6375268 0.6122178 ]
mean value: 0.620214750789007
key: test_accuracy
value: [0.75806452 0.77419355 0.80645161 0.82258065 0.87096774 0.79032258
0.79032258 0.85483871 0.78688525 0.7704918 ]
mean value: 0.8025118984664199
key: train_accuracy
value: [0.81115108 0.80395683 0.81654676 0.80395683 0.79856115 0.80755396
0.82194245 0.81115108 0.81867145 0.80610413]
mean value: 0.8099595727367837
key: test_fscore
value: [0.75409836 0.74074074 0.80645161 0.80701754 0.875 0.79365079
0.80597015 0.86153846 0.8 0.77419355]
mean value: 0.8018661210989436
key: train_fscore
value: [0.80874317 0.8036036 0.82167832 0.80500894 0.79928315 0.80438757
0.82661996 0.81349911 0.82123894 0.80505415]
mean value: 0.8109116928454192
key: test_precision
value: [0.76666667 0.86956522 0.80645161 0.88461538 0.84848485 0.78125
0.75 0.82352941 0.74285714 0.77419355]
mean value: 0.8047613833070375
key: train_precision
value: [0.81918819 0.80505415 0.79931973 0.80071174 0.79642857 0.81784387
0.80546075 0.80350877 0.81118881 0.80797101]
mean value: 0.8066675601234072
key: test_recall
value: [0.74193548 0.64516129 0.80645161 0.74193548 0.90322581 0.80645161
0.87096774 0.90322581 0.86666667 0.77419355]
mean value: 0.8060215053763441
key: train_recall
value: [0.79856115 0.80215827 0.84532374 0.80935252 0.80215827 0.79136691
0.84892086 0.82374101 0.83154122 0.80215827]
mean value: 0.8155282225832238
key: test_roc_auc
value: [0.75806452 0.77419355 0.80645161 0.82258065 0.87096774 0.79032258
0.79032258 0.85483871 0.78817204 0.77043011]
mean value: 0.8026344086021505
key: train_roc_auc
value: [0.81115108 0.80395683 0.81654676 0.80395683 0.79856115 0.80755396
0.82194245 0.81115108 0.81864831 0.80609706]
mean value: 0.8099565508883215
key: test_jcc
value: [0.60526316 0.58823529 0.67567568 0.67647059 0.77777778 0.65789474
0.675 0.75675676 0.66666667 0.63157895]
mean value: 0.6711319601335082
key: train_jcc
value: [0.67889908 0.67168675 0.69732938 0.67365269 0.66567164 0.67278287
0.70447761 0.68562874 0.6966967 0.67371601]
mean value: 0.6820541480667476
MCC on Blind test: 0.18
Accuracy on Blind test: 0.52
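The test_fscore and test_jcc rows above carry the same information. Assuming `jcc` is the Jaccard index of the positive class (the log does not define the abbreviation), it follows from the F1 score as J = F / (2 - F). The sketch below checks that identity against the Bernoulli NB values copied from the log; the variable names are illustrative only.

```python
# Per-fold values copied from the Bernoulli NB block of the log.
test_fscore = [0.75409836, 0.74074074, 0.80645161, 0.80701754, 0.875,
               0.79365079, 0.80597015, 0.86153846, 0.8, 0.77419355]
test_jcc = [0.60526316, 0.58823529, 0.67567568, 0.67647059, 0.77777778,
            0.65789474, 0.675, 0.75675676, 0.66666667, 0.63157895]


def jaccard_from_f1(f1):
    """Convert an F1 score to the Jaccard index for the same class."""
    return f1 / (2.0 - f1)


derived_jcc = [jaccard_from_f1(f) for f in test_fscore]
max_abs_diff = max(abs(d - j) for d, j in zip(derived_jcc, test_jcc))
print(f"largest difference from logged test_jcc: {max_abs_diff:.2e}")
```

Because one metric is a monotonic function of the other, ranking models by test_jcc or by test_fscore gives the same order within a fold.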
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00809216 0.00795221 0.00791621 0.00790453 0.00721383 0.00737166
0.00789046 0.00771284 0.00766468 0.00788069]
mean value: 0.0077599287033081055
key: score_time
value: [0.01314855 0.01184964 0.0114398 0.0146842 0.01099205 0.01099849
0.01176476 0.0116837 0.01157475 0.01158309]
mean value: 0.01197190284729004
key: test_mcc
value: [0.45760432 0.48488114 0.67883359 0.55301004 0.67883359 0.67883359
0.54953196 0.74348441 0.40967742 0.70780713]
mean value: 0.5942497191157756
key: train_mcc
value: [0.7125253 0.73779681 0.71605437 0.74499483 0.7125253 0.71313508
0.726788 0.73745301 0.75237261 0.72554668]
mean value: 0.7279191995608599
key: test_accuracy
value: [0.72580645 0.74193548 0.83870968 0.77419355 0.83870968 0.83870968
0.77419355 0.87096774 0.70491803 0.85245902]
mean value: 0.7960602855631941
key: train_accuracy
value: [0.85611511 0.86870504 0.85791367 0.87230216 0.85611511 0.85611511
0.86330935 0.86870504 0.87612208 0.86175943]
mean value: 0.8637162083618563
key: test_fscore
value: [0.70175439 0.75 0.83333333 0.75862069 0.83333333 0.84375
0.78125 0.86666667 0.7 0.86153846]
mean value: 0.7930246870491879
key: train_fscore
value: [0.8540146 0.86654479 0.856102 0.8702011 0.8540146 0.85239852
0.86181818 0.86799277 0.87522604 0.85607477]
mean value: 0.8614387366046266
key: test_precision
value: [0.76923077 0.72727273 0.86206897 0.81481481 0.86206897 0.81818182
0.75757576 0.89655172 0.7 0.82352941]
mean value: 0.8031294954013006
key: train_precision
value: [0.86666667 0.88104089 0.86715867 0.88475836 0.86666667 0.875
0.87132353 0.87272727 0.88321168 0.89105058]
mean value: 0.8759604326054368
key: test_recall
value: [0.64516129 0.77419355 0.80645161 0.70967742 0.80645161 0.87096774
0.80645161 0.83870968 0.7 0.90322581]
mean value: 0.7861290322580645
key: train_recall
value: [0.84172662 0.85251799 0.84532374 0.85611511 0.84172662 0.83093525
0.85251799 0.86330935 0.86738351 0.82374101]
mean value: 0.8475297181609551
key: test_roc_auc
value: [0.72580645 0.74193548 0.83870968 0.77419355 0.83870968 0.83870968
0.77419355 0.87096774 0.70483871 0.8516129 ]
mean value: 0.7959677419354838
key: train_roc_auc
value: [0.85611511 0.86870504 0.85791367 0.87230216 0.85611511 0.85611511
0.86330935 0.86870504 0.8761378 0.86169129]
mean value: 0.8637109667105026
key: test_jcc
value: [0.54054054 0.6 0.71428571 0.61111111 0.71428571 0.72972973
0.64102564 0.76470588 0.53846154 0.75675676]
mean value: 0.6610902628549687
key: train_jcc
value: [0.74522293 0.76451613 0.74840764 0.77022654 0.74522293 0.74276527
0.7571885 0.76677316 0.77813505 0.74836601]
mean value: 0.7566824165390956
MCC on Blind test: 0.16
Accuracy on Blind test: 0.57
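Each `mean value` line appears to be the arithmetic mean of the ten per-fold values printed above it. The sketch below recomputes it for the K-Nearest Neighbors test_mcc row and adds the sample standard deviation, which the log does not report, as a rough indication of fold-to-fold spread. The logged values are rounded to eight decimals, so the recomputed mean agrees only to that precision.

```python
from statistics import fmean, stdev

# Per-fold test_mcc values copied from the K-Nearest Neighbors block.
knn_test_mcc = [0.45760432, 0.48488114, 0.67883359, 0.55301004, 0.67883359,
                0.67883359, 0.54953196, 0.74348441, 0.40967742, 0.70780713]

logged_mean = 0.5942497191157756   # "mean value" printed in the log

recomputed_mean = fmean(knn_test_mcc)
fold_spread = stdev(knn_test_mcc)  # sample standard deviation across folds

print(f"recomputed mean: {recomputed_mean:.8f}")
print(f"difference from logged mean: {abs(recomputed_mean - logged_mean):.2e}")
print(f"fold-to-fold standard deviation: {fold_spread:.4f}")
```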
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01524782 0.01526904 0.01670051 0.01508474 0.01485252 0.01501393
0.01506996 0.01522017 0.01478338 0.01487541]
mean value: 0.015211749076843261
key: score_time
value: [0.00945497 0.00925422 0.00928378 0.00928211 0.00977159 0.00912595
0.00927424 0.00921845 0.00913382 0.00917101]
mean value: 0.009297013282775879
key: test_mcc
value: [0.64820372 0.75623534 0.80813523 0.71004695 0.74819006 0.7284928
0.7190925 0.70116959 0.61256703 0.6844511 ]
mean value: 0.7116584311085777
key: train_mcc
value: [0.78485761 0.79151169 0.79209132 0.85451608 0.77632088 0.78285538
0.75529076 0.75529076 0.78851732 0.80529218]
mean value: 0.7886543984062245
key: test_accuracy
value: [0.82258065 0.87096774 0.90322581 0.85483871 0.87096774 0.85483871
0.85483871 0.83870968 0.80327869 0.83606557]
mean value: 0.8510312004230566
key: train_accuracy
value: [0.89028777 0.89388489 0.89388489 0.92625899 0.88489209 0.88848921
0.87410072 0.87410072 0.89228007 0.90125673]
mean value: 0.8919436084884337
key: test_fscore
value: [0.83076923 0.88235294 0.90625 0.85245902 0.87878788 0.86956522
0.86567164 0.85714286 0.8125 0.85294118]
mean value: 0.8608439959922817
key: train_fscore
value: [0.8957265 0.89879931 0.8991453 0.92869565 0.89189189 0.89491525
0.88215488 0.88215488 0.89761092 0.90500864]
mean value: 0.8976103228458596
key: test_precision
value: [0.79411765 0.81081081 0.87878788 0.86666667 0.82857143 0.78947368
0.80555556 0.76923077 0.76470588 0.78378378]
mean value: 0.8091704107029185
key: train_precision
value: [0.8534202 0.85901639 0.85667752 0.8989899 0.84076433 0.84615385
0.82911392 0.82911392 0.85667752 0.87043189]
mean value: 0.8540359455885207
key: test_recall
value: [0.87096774 0.96774194 0.93548387 0.83870968 0.93548387 0.96774194
0.93548387 0.96774194 0.86666667 0.93548387]
mean value: 0.9221505376344086
key: train_recall
value: [0.94244604 0.94244604 0.94604317 0.96043165 0.94964029 0.94964029
0.94244604 0.94244604 0.94265233 0.94244604]
mean value: 0.9460637941259895
key: test_roc_auc
value: [0.82258065 0.87096774 0.90322581 0.85483871 0.87096774 0.85483871
0.85483871 0.83870968 0.80430108 0.8344086 ]
mean value: 0.8509677419354839
key: train_roc_auc
value: [0.89028777 0.89388489 0.89388489 0.92625899 0.88489209 0.88848921
0.87410072 0.87410072 0.89218947 0.90133055]
mean value: 0.8919419303267063
key: test_jcc
value: [0.71052632 0.78947368 0.82857143 0.74285714 0.78378378 0.76923077
0.76315789 0.75 0.68421053 0.74358974]
mean value: 0.7565401289085499
key: train_jcc
value: [0.81114551 0.81619938 0.81677019 0.86688312 0.80487805 0.80981595
0.78915663 0.78915663 0.81424149 0.82649842]
mean value: 0.8144745352495302
MCC on Blind test: 0.26
Accuracy on Blind test: 0.5
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.6481297 1.49946976 1.67077136 1.65997696 1.63008213 1.52724123
1.67578554 1.65596056 1.49459696 1.68944907]
mean value: 1.6151463270187378
key: score_time
value: [0.01430917 0.01388526 0.01319432 0.01351166 0.01167202 0.01358342
0.01357841 0.01354527 0.01401711 0.01371384]
mean value: 0.01350104808807373
key: test_mcc
value: [0.96824584 0.96824584 0.93548387 0.7190925 0.90369611 0.93743687
1. 1. 0.83655914 1. ]
mean value: 0.9268760160039228
key: train_mcc
value: [0.99280576 0.99283145 0.99640932 1. 0.99283145 0.99283145
0.99283145 0.99283145 0.99284434 0.98923428]
mean value: 0.9935450945650737
key: test_accuracy
value: [0.98387097 0.98387097 0.96774194 0.85483871 0.9516129 0.96774194
1. 1. 0.91803279 1. ]
mean value: 0.9627710206240084
key: train_accuracy
value: [0.99640288 0.99640288 0.99820144 1. 0.99640288 0.99640288
0.99640288 0.99640288 0.99640934 0.994614 ]
mean value: 0.9967642044353745
key: test_fscore
value: [0.98360656 0.98360656 0.96774194 0.84210526 0.95081967 0.96875
1. 1. 0.91803279 1. ]
mean value: 0.9614662772412258
key: train_fscore
value: [0.99640288 0.99638989 0.9981982 1. 0.99638989 0.99638989
0.99638989 0.99638989 0.99640288 0.99459459]
mean value: 0.9967548006672231
key: test_precision
value: [1. 1. 0.96774194 0.92307692 0.96666667 0.93939394
1. 1. 0.90322581 1. ]
mean value: 0.9700105271073013
key: train_precision
value: [0.99640288 1. 1. 1. 1. 1.
1. 1. 1. 0.99638989]
mean value: 0.9992792769394593
key: test_recall
value: [0.96774194 0.96774194 0.96774194 0.77419355 0.93548387 1.
1. 1. 0.93333333 1. ]
mean value: 0.9546236559139785
key: train_recall
value: [0.99640288 0.99280576 0.99640288 1. 0.99280576 0.99280576
0.99280576 0.99280576 0.99283154 0.99280576]
mean value: 0.9942471828988423
key: test_roc_auc
value: [0.98387097 0.98387097 0.96774194 0.85483871 0.9516129 0.96774194
1. 1. 0.91827957 1. ]
mean value: 0.9627956989247312
key: train_roc_auc
value: [0.99640288 0.99640288 0.99820144 1. 0.99640288 0.99640288
0.99640288 0.99640288 0.99641577 0.99461076]
mean value: 0.9967645238647792
key: test_jcc
value: [0.96774194 0.96774194 0.9375 0.72727273 0.90625 0.93939394
1. 1. 0.84848485 1. ]
mean value: 0.9294385386119257
key: train_jcc
value: [0.99283154 0.99280576 0.99640288 1. 0.99280576 0.99280576
0.99280576 0.99280576 0.99283154 0.98924731]
mean value: 0.9935342048941492
MCC on Blind test: 0.09
Accuracy on Blind test: 0.24
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.01340103 0.01229239 0.00977087 0.00982738 0.00973988 0.01050377
0.00967884 0.01013613 0.01014376 0.01027131]
mean value: 0.010576534271240234
key: score_time
value: [0.01074123 0.00902033 0.00799775 0.00793815 0.00800681 0.00789976
0.00837636 0.00792694 0.00824547 0.00833321]
mean value: 0.008448600769042969
key: test_mcc
value: [0.90748521 0.96824584 0.96824584 1. 0.93743687 0.93548387
0.93743687 0.93743687 0.9344086 0.96774194]
mean value: 0.9493921894362165
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.9516129 0.98387097 0.98387097 1. 0.96774194 0.96774194
0.96774194 0.96774194 0.96721311 0.98360656]
mean value: 0.9741142252776309
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94915254 0.98360656 0.98412698 1. 0.96666667 0.96774194
0.96666667 0.96666667 0.96666667 0.98360656]
mean value: 0.9734901243404501
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 0.96875 1. 1. 0.96774194
1. 1. 0.96666667 1. ]
mean value: 0.9903158602150538
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.90322581 0.96774194 1. 1. 0.93548387 0.96774194
0.93548387 0.93548387 0.96666667 0.96774194]
mean value: 0.9579569892473119
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.9516129 0.98387097 0.98387097 1. 0.96774194 0.96774194
0.96774194 0.96774194 0.9672043 0.98387097]
mean value: 0.9741397849462365
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.90322581 0.96774194 0.96875 1. 0.93548387 0.9375
0.93548387 0.93548387 0.93548387 0.96774194]
mean value: 0.9486895161290323
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.01
Accuracy on Blind test: 0.2
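One way to read the blocks so far is to compare each model's mean cross-validation test_mcc with its blind-test MCC. The sketch below does only that, using figures copied from the log. The gap measure and the names `results`, `gaps` and `ranked` are illustrative assumptions, not part of the original script.

```python
# (mean CV test_mcc, blind-test MCC) per model, copied from the log.
results = {
    "Gaussian NB": (0.6922836529141403, 0.21),
    "Naive Bayes": (0.6099942679846233, 0.18),
    "K-Nearest Neighbors": (0.5942497191157756, 0.16),
    "SVM": (0.7116584311085777, 0.26),
    "MLP": (0.9268760160039228, 0.09),
    "Decision Tree": (0.9493921894362165, 0.01),
}

# Difference between cross-validation and blind-test MCC for each model.
gaps = {name: cv - blind for name, (cv, blind) in results.items()}

# Largest gap first: these models lose the most on the blind set.
ranked = sorted(gaps.items(), key=lambda item: item[1], reverse=True)
for name, gap in ranked:
    print(f"{name:<20} CV-minus-blind MCC gap: {gap:.2f}")
```

A large gap is consistent with overfitting or with a blind set drawn from a different distribution; the log alone does not say which applies here.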
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.10707998 0.10903525 0.10817385 0.10511184 0.10628986 0.10499215
0.10362315 0.10446763 0.10430741 0.10113478]
mean value: 0.10542159080505371
key: score_time
value: [0.01860476 0.01862955 0.01860476 0.01870513 0.01833129 0.01816988
0.01843429 0.01767302 0.01715016 0.01741219]
mean value: 0.01817150115966797
key: test_mcc
value: [0.93548387 1. 0.93548387 0.87831007 0.90369611 0.93743687
1. 0.96824584 0.90215054 0.93635873]
mean value: 0.9397165895399419
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96774194 1. 0.96774194 0.93548387 0.9516129 0.96774194
1. 0.98387097 0.95081967 0.96721311]
mean value: 0.9692226335272343
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96774194 1. 0.96774194 0.93103448 0.95081967 0.96875
1. 0.98412698 0.95081967 0.96875 ]
mean value: 0.9689784682115642
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96774194 1. 0.96774194 1. 0.96666667 0.93939394
1. 0.96875 0.93548387 0.93939394]
mean value: 0.9685172287390029
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96774194 1. 0.96774194 0.87096774 0.93548387 1.
1. 1. 0.96666667 1. ]
mean value: 0.9708602150537634
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96774194 1. 0.96774194 0.93548387 0.9516129 0.96774194
1. 0.98387097 0.95107527 0.96666667]
mean value: 0.9691935483870968
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.9375 1. 0.9375 0.87096774 0.90625 0.93939394
1. 0.96875 0.90625 0.93939394]
mean value: 0.9406005620723363
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.2
Accuracy on Blind test: 0.36
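Note: the per-fold "key / value / mean value" blocks above have the shape of scikit-learn's cross_validate output with return_train_score=True. The following is a minimal sketch of how such output might be produced, not the project's actual script. The toy data, the three-column feature set, the scorer choices and the handle_unknown setting are all assumptions made for illustration; only the pipeline layout (MinMaxScaler for numerical columns, OneHotEncoder for categorical columns, remainder='passthrough') is taken from the log.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical stand-in data: two numerical features and one categorical one.
rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 30, n),
    "rsa": rng.uniform(0, 1, n),
    "ss_class": rng.choice(["helix", "sheet", "loop"], n),
})
y = (X["ligand_distance"] < 15).astype(int)  # toy target, not the real label

num_cols = ["ligand_distance", "rsa"]
cat_cols = ["ss_class"]

# Same layout as the pipeline repr in the log; handle_unknown="ignore" is an
# added safeguard (assumption) so unseen categories in a fold do not raise.
prep = ColumnTransformer(
    remainder="passthrough",
    transformers=[
        ("num", MinMaxScaler(), num_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
    ],
)
pipe = Pipeline(steps=[("prep", prep),
                       ("model", ExtraTreesClassifier(random_state=42))])

# Scorer names mirror the keys in the log; the exact scorers are assumptions.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": make_scorer(roc_auc_score),
    "jcc": "jaccard",
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

# Print in the same key / value / mean layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```

Fitting the preprocessing inside the pipeline means the scaler and encoder are refit on each training fold, so the held-out fold does not leak into the scaling.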
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.00863886 0.00797391 0.0083859 0.00775075 0.00766373 0.0079093
0.00830865 0.00843334 0.00793123 0.00765133]
mean value: 0.008064699172973634
key: score_time
value: [0.00806904 0.00858569 0.00859904 0.00799298 0.00799918 0.00797725
0.00856709 0.00818801 0.00789118 0.00795794]
mean value: 0.008182740211486817
key: test_mcc
value: [0.75623534 0.87831007 0.87278605 0.83914639 0.84266484 0.64820372
0.74348441 0.90748521 0.77072165 0.83655914]
mean value: 0.8095596827565272
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.87096774 0.93548387 0.93548387 0.91935484 0.91935484 0.82258065
0.87096774 0.9516129 0.8852459 0.91803279]
mean value: 0.9029085140137494
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.85714286 0.93103448 0.93333333 0.92063492 0.91525424 0.81355932
0.86666667 0.94915254 0.88135593 0.91803279]
mean value: 0.8986167081319949
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96 1. 0.96551724 0.90625 0.96428571 0.85714286
0.89655172 1. 0.89655172 0.93333333]
mean value: 0.9379632594417078
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.77419355 0.87096774 0.90322581 0.93548387 0.87096774 0.77419355
0.83870968 0.90322581 0.86666667 0.90322581]
mean value: 0.8640860215053763
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.87096774 0.93548387 0.93548387 0.91935484 0.91935484 0.82258065
0.87096774 0.9516129 0.88494624 0.91827957]
mean value: 0.9029032258064517
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.75 0.87096774 0.875 0.85294118 0.84375 0.68571429
0.76470588 0.90322581 0.78787879 0.84848485]
mean value: 0.8182668529288548
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.1
Accuracy on Blind test: 0.26
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.34418821 1.34185529 1.3479538 1.36781883 1.42743945 1.3655982
1.38340139 1.37809682 1.39602447 1.33490944]
mean value: 1.3687285900115966
key: score_time
value: [0.09742594 0.09719825 0.09524751 0.09951448 0.09094286 0.0994525
0.09763288 0.09727025 0.09892535 0.09526753]
mean value: 0.09688775539398194
key: test_mcc
value: [0.96824584 0.96824584 0.93548387 0.96824584 0.96824584 0.96824584
1. 1. 0.90215054 1. ]
mean value: 0.9678863591361422
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98387097 0.98387097 0.96774194 0.98387097 0.98387097 0.98387097
1. 1. 0.95081967 1. ]
mean value: 0.9837916446324696
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98360656 0.98360656 0.96774194 0.98360656 0.98412698 0.98412698
1. 1. 0.95081967 1. ]
mean value: 0.9837635248000134
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 0.96774194 1. 0.96875 0.96875
1. 1. 0.93548387 1. ]
mean value: 0.9840725806451613
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96774194 0.96774194 0.96774194 0.96774194 1. 1.
1. 1. 0.96666667 1. ]
mean value: 0.983763440860215
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98387097 0.98387097 0.96774194 0.98387097 0.98387097 0.98387097
1. 1. 0.95107527 1. ]
mean value: 0.9838172043010753
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96774194 0.96774194 0.9375 0.96774194 0.96875 0.96875
1. 1. 0.90625 1. ]
mean value: 0.9684475806451613
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.08
Accuracy on Blind test: 0.19
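Note: several models above report cross-validated test scores near 1.0 but a blind-test accuracy below 0.5 together with a small positive MCC. The sketch below uses invented toy labels (not taken from this run) to show how the two blind-test figures could be computed with scikit-learn, and that such a combination is arithmetically possible when the blind set is dominated by one class and the model mostly predicts the other.

```python
from sklearn.metrics import accuracy_score, matthews_corrcoef

# Hypothetical blind set: 10 positives and 2 negatives.
y_true = [1] * 10 + [0] * 2
# Hypothetical predictions: only one positive is recovered (TP=1, FN=9),
# both negatives are predicted correctly (TN=2, FP=0).
y_pred = [1] + [0] * 9 + [0] * 2

acc = accuracy_score(y_true, y_pred)
mcc = matthews_corrcoef(y_true, y_pred)

print("MCC on Blind test:", round(mcc, 2))
print("Accuracy on Blind test:", round(acc, 2))
```

Accuracy counts only the fraction of correct predictions, whereas MCC uses all four confusion-matrix cells, so the two can diverge sharply on imbalanced data.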
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.99160314 0.89481354 0.90568662 0.9475925 0.90248585 0.94201088
0.93975306 0.95513034 0.88768649 0.92357564]
mean value: 0.9290338039398194
key: score_time
value: [0.15050101 0.24627447 0.24356008 0.27248359 0.27095199 0.25157189
0.20301151 0.27629042 0.26423383 0.23688626]
mean value: 0.24157650470733644
key: test_mcc
value: [0.93548387 0.96824584 0.93548387 0.96824584 0.90748521 0.96824584
1. 0.96824584 0.87082935 0.96770777]
mean value: 0.9489973426546622
key: train_mcc
value: [0.96425338 0.96058703 0.96425338 0.96058703 0.96412858 0.97132357
0.95353974 0.96412858 0.96783888 0.96065866]
mean value: 0.9631298857914714
key: test_accuracy
value: [0.96774194 0.98387097 0.96774194 0.98387097 0.9516129 0.98387097
1. 0.98387097 0.93442623 0.98360656]
mean value: 0.9740613432046537
key: train_accuracy
value: [0.98201439 0.98021583 0.98201439 0.98021583 0.98201439 0.98561151
0.97661871 0.98201439 0.98384201 0.98025135]
mean value: 0.9814812781731527
key: test_fscore
value: [0.96774194 0.98360656 0.96774194 0.98360656 0.95384615 0.98412698
1. 0.98360656 0.93548387 0.98412698]
mean value: 0.9743887536166753
key: train_fscore
value: [0.98220641 0.98039216 0.98220641 0.98039216 0.98214286 0.98571429
0.97690941 0.98214286 0.98401421 0.98039216]
mean value: 0.9816512905421962
key: test_precision
value: [0.96774194 1. 0.96774194 1. 0.91176471 0.96875
1. 1. 0.90625 0.96875 ]
mean value: 0.9690998576850095
key: train_precision
value: [0.97183099 0.97173145 0.97183099 0.97173145 0.9751773 0.9787234
0.96491228 0.9751773 0.97535211 0.97173145]
mean value: 0.9728198725682946
key: test_recall
value: [0.96774194 0.96774194 0.96774194 0.96774194 1. 1.
1. 0.96774194 0.96666667 1. ]
mean value: 0.9805376344086022
key: train_recall
value: [0.99280576 0.98920863 0.99280576 0.98920863 0.98920863 0.99280576
0.98920863 0.98920863 0.99283154 0.98920863]
mean value: 0.9906500605966839
key: test_roc_auc
value: [0.96774194 0.98387097 0.96774194 0.98387097 0.9516129 0.98387097
1. 0.98387097 0.93494624 0.98333333]
mean value: 0.9740860215053764
key: train_roc_auc
value: [0.98201439 0.98021583 0.98201439 0.98021583 0.98201439 0.98561151
0.97661871 0.98201439 0.98382584 0.9802674 ]
mean value: 0.9814812665996235
key: test_jcc
value: [0.9375 0.96774194 0.9375 0.96774194 0.91176471 0.96875
1. 0.96774194 0.87878788 0.96875 ]
mean value: 0.9506278391121845
key: train_jcc
value: [0.96503497 0.96153846 0.96503497 0.96153846 0.96491228 0.97183099
0.95486111 0.96491228 0.96853147 0.96153846]
mean value: 0.9639733441646896
MCC on Blind test: 0.1
Accuracy on Blind test: 0.23
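Note: the FutureWarning logged for this model states that max_features='auto' is deprecated and that 'sqrt' keeps the past behaviour for RandomForestClassifier. A minimal sketch of the same estimator with the value set explicitly follows; the variable name rf2 is hypothetical, and the other parameters are copied from the "Model func" line above.

```python
from sklearn.ensemble import RandomForestClassifier

# Same settings as 'Random Forest2' in the log, with the deprecated
# max_features='auto' replaced by the explicit equivalent 'sqrt'.
rf2 = RandomForestClassifier(
    max_features="sqrt",
    min_samples_leaf=5,
    n_estimators=1000,
    n_jobs=10,
    oob_score=True,
    random_state=42,
)
print(rf2.get_params()["max_features"])
```

Per the warning text, dropping the argument altogether should have the same effect, since 'sqrt' is the default for this classifier.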
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01971221 0.00761104 0.00768089 0.00756931 0.00756836 0.00765538
0.00759244 0.00763845 0.00757504 0.00766015]
mean value: 0.008826327323913575
key: score_time
value: [0.01263118 0.00788474 0.00787878 0.00782609 0.00785947 0.00789833
0.00783944 0.00784731 0.00786543 0.00787163]
mean value: 0.008340239524841309
key: test_mcc
value: [0.51639778 0.56761348 0.61290323 0.65372045 0.74348441 0.5809475
0.58834841 0.7130241 0.58264312 0.54086022]
mean value: 0.6099942679846233
key: train_mcc
value: [0.62249953 0.6079176 0.63414469 0.60794907 0.59713776 0.61543051
0.64482423 0.62249953 0.6375268 0.6122178 ]
mean value: 0.620214750789007
key: test_accuracy
value: [0.75806452 0.77419355 0.80645161 0.82258065 0.87096774 0.79032258
0.79032258 0.85483871 0.78688525 0.7704918 ]
mean value: 0.8025118984664199
key: train_accuracy
value: [0.81115108 0.80395683 0.81654676 0.80395683 0.79856115 0.80755396
0.82194245 0.81115108 0.81867145 0.80610413]
mean value: 0.8099595727367837
key: test_fscore
value: [0.75409836 0.74074074 0.80645161 0.80701754 0.875 0.79365079
0.80597015 0.86153846 0.8 0.77419355]
mean value: 0.8018661210989436
key: train_fscore
value: [0.80874317 0.8036036 0.82167832 0.80500894 0.79928315 0.80438757
0.82661996 0.81349911 0.82123894 0.80505415]
mean value: 0.8109116928454192
key: test_precision
value: [0.76666667 0.86956522 0.80645161 0.88461538 0.84848485 0.78125
0.75 0.82352941 0.74285714 0.77419355]
mean value: 0.8047613833070375
key: train_precision
value: [0.81918819 0.80505415 0.79931973 0.80071174 0.79642857 0.81784387
0.80546075 0.80350877 0.81118881 0.80797101]
mean value: 0.8066675601234072
key: test_recall
value: [0.74193548 0.64516129 0.80645161 0.74193548 0.90322581 0.80645161
0.87096774 0.90322581 0.86666667 0.77419355]
mean value: 0.8060215053763441
key: train_recall
value: [0.79856115 0.80215827 0.84532374 0.80935252 0.80215827 0.79136691
0.84892086 0.82374101 0.83154122 0.80215827]
mean value: 0.8155282225832238
key: test_roc_auc
value: [0.75806452 0.77419355 0.80645161 0.82258065 0.87096774 0.79032258
0.79032258 0.85483871 0.78817204 0.77043011]
mean value: 0.8026344086021505
key: train_roc_auc
value: [0.81115108 0.80395683 0.81654676 0.80395683 0.79856115 0.80755396
0.82194245 0.81115108 0.81864831 0.80609706]
mean value: 0.8099565508883215
key: test_jcc
value: [0.60526316 0.58823529 0.67567568 0.67647059 0.77777778 0.65789474
0.675 0.75675676 0.66666667 0.63157895]
mean value: 0.6711319601335082
key: train_jcc
value: [0.67889908 0.67168675 0.69732938 0.67365269 0.66567164 0.67278287
0.70447761 0.68562874 0.6966967 0.67371601]
mean value: 0.6820541480667476
MCC on Blind test: 0.18
Accuracy on Blind test: 0.52
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.21862555 0.04956889 0.04996634 0.05186462 0.05506182 0.06219912
0.06107974 0.06241131 0.05737829 0.05969238]
mean value: 0.07278480529785156
key: score_time
value: [0.01031947 0.00971913 0.00969386 0.00995827 0.01020288 0.00984311
0.0096755 0.00973344 0.0099237 0.00953674]
mean value: 0.009860610961914063
key: test_mcc
value: [0.96824584 0.96824584 0.93548387 0.96824584 0.96824584 0.96824584
0.96824584 0.96824584 0.90215054 1. ]
mean value: 0.9615355264465131
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98387097 0.98387097 0.96774194 0.98387097 0.98387097 0.98387097
0.98387097 0.98387097 0.95081967 1. ]
mean value: 0.9805658381808567
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98360656 0.98360656 0.96774194 0.98360656 0.98412698 0.98412698
0.98360656 0.98360656 0.95081967 1. ]
mean value: 0.9804848362754233
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 0.96774194 1. 0.96875 0.96875
1. 1. 0.93548387 1. ]
mean value: 0.9840725806451613
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96774194 0.96774194 0.96774194 0.96774194 1. 1.
0.96774194 0.96774194 0.96666667 1. ]
mean value: 0.9773118279569892
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98387097 0.98387097 0.96774194 0.98387097 0.98387097 0.98387097
0.98387097 0.98387097 0.95107527 1. ]
mean value: 0.9805913978494624
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96774194 0.96774194 0.9375 0.96774194 0.96875 0.96875
0.96774194 0.96774194 0.90625 1. ]
mean value: 0.9619959677419355
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.06
Accuracy on Blind test: 0.2
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.01578832 0.04168701 0.05872059 0.01797581 0.01809096 0.03955126
0.04262829 0.01832151 0.01880884 0.0180583 ]
mean value: 0.028963088989257812
key: score_time
value: [0.01038313 0.01973009 0.01196766 0.01065159 0.01061916 0.02021313
0.02139711 0.01056767 0.01115489 0.01077628]
mean value: 0.013746070861816406
key: test_mcc
value: [0.93548387 1. 0.93548387 0.87831007 0.87831007 0.96824584
0.93743687 0.96824584 0.83655914 0.93635873]
mean value: 0.9274434285640426
key: train_mcc
value: [0.94283651 0.9393413 0.94305636 0.93563929 0.95353974 0.9393413
0.93914669 0.93214329 0.94994909 0.93925798]
mean value: 0.941425155755879
key: test_accuracy
value: [0.96774194 1. 0.96774194 0.93548387 0.93548387 0.98387097
0.96774194 0.98387097 0.91803279 0.96721311]
mean value: 0.9627181385510312
key: train_accuracy
value: [0.97122302 0.96942446 0.97122302 0.9676259 0.97661871 0.96942446
0.96942446 0.96582734 0.97486535 0.96947935]
mean value: 0.9705136070676672
key: test_fscore
value: [0.96774194 1. 0.96774194 0.93103448 0.93939394 0.98412698
0.96875 0.98360656 0.91803279 0.96875 ]
mean value: 0.9629178621509581
key: train_fscore
value: [0.97163121 0.9699115 0.97173145 0.96808511 0.97690941 0.9699115
0.96980462 0.96637168 0.9751773 0.96980462]
mean value: 0.9709338406138824
key: test_precision
value: [0.96774194 1. 0.96774194 1. 0.88571429 0.96875
0.93939394 1. 0.90322581 0.93939394]
mean value: 0.9571961841921519
key: train_precision
value: [0.95804196 0.95470383 0.95486111 0.95454545 0.96491228 0.95470383
0.95789474 0.95121951 0.96491228 0.95789474]
mean value: 0.9573689736486591
key: test_recall
value: [0.96774194 1. 0.96774194 0.87096774 1. 1.
1. 0.96774194 0.93333333 1. ]
mean value: 0.970752688172043
key: train_recall
value: [0.98561151 0.98561151 0.98920863 0.98201439 0.98920863 0.98561151
0.98201439 0.98201439 0.98566308 0.98201439]
mean value: 0.9848972434955261
key: test_roc_auc
value: [0.96774194 1. 0.96774194 0.93548387 0.93548387 0.98387097
0.96774194 0.98387097 0.91827957 0.96666667]
mean value: 0.9626881720430107
key: train_roc_auc
value: [0.97122302 0.96942446 0.97122302 0.9676259 0.97661871 0.96942446
0.96942446 0.96582734 0.97484593 0.96950182]
mean value: 0.970513911451484
key: test_jcc
value: [0.9375 1. 0.9375 0.87096774 0.88571429 0.96875
0.93939394 0.96774194 0.84848485 0.93939394]
mean value: 0.9295446690406368
key: train_jcc
value: [0.94482759 0.94158076 0.94501718 0.93814433 0.95486111 0.94158076
0.94137931 0.93493151 0.95155709 0.94137931]
mean value: 0.9435258942337567
MCC on Blind test: 0.14
Accuracy on Blind test: 0.35
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01899743 0.00779724 0.00781965 0.00752091 0.00765944 0.00745153
0.00752687 0.00762939 0.00754023 0.00756288]
mean value: 0.008750557899475098
key: score_time
value: [0.008394 0.00820088 0.00782681 0.00797677 0.00793958 0.00783062
0.00781059 0.0078342 0.00790739 0.00793242]
mean value: 0.007965326309204102
key: test_mcc
value: [0.61807005 0.74819006 0.67883359 0.64549722 0.67883359 0.63439154
0.63439154 0.67419986 0.54654832 0.64708149]
mean value: 0.6506037256013296
key: train_mcc
value: [0.66814183 0.65361701 0.66955589 0.67282515 0.64923736 0.67144111
0.67540424 0.6622781 0.67590132 0.66881107]
mean value: 0.6667213081476084
key: test_accuracy
value: [0.80645161 0.87096774 0.83870968 0.82258065 0.83870968 0.80645161
0.80645161 0.82258065 0.7704918 0.81967213]
mean value: 0.8203067160232681
key: train_accuracy
value: [0.83093525 0.82374101 0.83093525 0.83273381 0.82014388 0.83273381
0.83453237 0.82733813 0.83482944 0.83123878]
mean value: 0.8299161747801042
key: test_fscore
value: [0.81818182 0.87878788 0.84375 0.82539683 0.84375 0.82857143
0.82857143 0.84507042 0.78125 0.8358209 ]
mean value: 0.8329150697566978
key: train_fscore
value: [0.84175084 0.83501684 0.84280936 0.84422111 0.83388704 0.84317032
0.84511785 0.83946488 0.84563758 0.84175084]
mean value: 0.8412826664142349
key: test_precision
value: [0.77142857 0.82857143 0.81818182 0.8125 0.81818182 0.74358974
0.74358974 0.75 0.73529412 0.77777778]
mean value: 0.779911501896796
key: train_precision
value: [0.79113924 0.78481013 0.7875 0.78996865 0.77469136 0.79365079
0.7943038 0.784375 0.79495268 0.79113924]
mean value: 0.7886530890164406
key: test_recall
value: [0.87096774 0.93548387 0.87096774 0.83870968 0.87096774 0.93548387
0.93548387 0.96774194 0.83333333 0.90322581]
mean value: 0.896236559139785
key: train_recall
value: [0.89928058 0.89208633 0.90647482 0.90647482 0.9028777 0.89928058
0.9028777 0.9028777 0.90322581 0.89928058]
mean value: 0.9014736597818519
key: test_roc_auc
value: [0.80645161 0.87096774 0.83870968 0.82258065 0.83870968 0.80645161
0.80645161 0.82258065 0.77150538 0.81827957]
mean value: 0.8202688172043011
key: train_roc_auc
value: [0.83093525 0.82374101 0.83093525 0.83273381 0.82014388 0.83273381
0.83453237 0.82733813 0.83470643 0.83136072]
mean value: 0.8299160671462831
key: test_jcc
value: [0.69230769 0.78378378 0.72972973 0.7027027 0.72972973 0.70731707
0.70731707 0.73170732 0.64102564 0.71794872]
mean value: 0.7143569460642631
key: train_jcc
value: [0.72674419 0.71676301 0.7283237 0.73043478 0.71509972 0.72886297
0.73177843 0.72334294 0.73255814 0.72674419]
mean value: 0.7260652053436807
MCC on Blind test: 0.21
Accuracy on Blind test: 0.5
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01070261 0.0129571 0.01364112 0.01318789 0.01269341 0.01534224
0.01468229 0.01412392 0.01440811 0.0143919 ]
mean value: 0.013613057136535645
key: score_time
value: [0.008075 0.01009893 0.00991964 0.01034665 0.01041341 0.01067472
0.0105691 0.01087594 0.01076126 0.01034617]
mean value: 0.01020808219909668
key: test_mcc
value: [0.82199494 0.93743687 0.93548387 0.81325006 0.87831007 0.74161985
0.90748521 0.83914639 0.72318666 0.30374645]
mean value: 0.7901660359762814
key: train_mcc
value: [0.87166214 0.92172241 0.94266562 0.92172241 0.91860435 0.69376766
0.94305636 0.93238486 0.88634645 0.2887174 ]
mean value: 0.8320649673376139
key: test_accuracy
value: [0.90322581 0.96774194 0.96774194 0.90322581 0.93548387 0.85483871
0.9516129 0.91935484 0.85245902 0.59016393]
mean value: 0.8845848757271285
key: train_accuracy
value: [0.93345324 0.96043165 0.97122302 0.96043165 0.95863309 0.82733813
0.97122302 0.96582734 0.94075404 0.57630162]
mean value: 0.9065616806375366
key: test_fscore
value: [0.89285714 0.96875 0.96774194 0.89655172 0.93939394 0.87323944
0.95384615 0.92063492 0.83018868 0.71264368]
mean value: 0.895584761037988
key: train_fscore
value: [0.92979127 0.96126761 0.97153025 0.96126761 0.95971979 0.85185185
0.97173145 0.9664903 0.93761815 0.7020202 ]
mean value: 0.9213288471474509
key: test_precision
value: [1. 0.93939394 0.96774194 0.96296296 0.88571429 0.775
0.91176471 0.90625 0.95652174 0.55357143]
mean value: 0.8858920997139276
key: train_precision
value: [0.98393574 0.94137931 0.96126761 0.94137931 0.93515358 0.74594595
0.95486111 0.94809689 0.992 0.54085603]
mean value: 0.8944875526911704
key: test_recall
value: [0.80645161 1. 0.96774194 0.83870968 1. 1.
1. 0.93548387 0.73333333 1. ]
mean value: 0.9281720430107527
key: train_recall
value: [0.88129496 0.98201439 0.98201439 0.98201439 0.98561151 0.99280576
0.98920863 0.98561151 0.88888889 1. ]
mean value: 0.9669464428457234
key: test_roc_auc
value: [0.90322581 0.96774194 0.96774194 0.90322581 0.93548387 0.85483871
0.9516129 0.91935484 0.85053763 0.58333333]
mean value: 0.8837096774193549
key: train_roc_auc
value: [0.93345324 0.96043165 0.97122302 0.96043165 0.95863309 0.82733813
0.97122302 0.96582734 0.94084732 0.57706093]
mean value: 0.9066469405121065
key: test_jcc
value: [0.80645161 0.93939394 0.9375 0.8125 0.88571429 0.775
0.91176471 0.85294118 0.70967742 0.55357143]
mean value: 0.818451456829066
key: train_jcc
value: [0.86879433 0.92542373 0.94463668 0.92542373 0.92255892 0.74193548
0.94501718 0.93515358 0.88256228 0.54085603]
mean value: 0.8632361942955643
MCC on Blind test: 0.1
Accuracy on Blind test: 0.29
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01686311 0.01279736 0.01273036 0.01324439 0.01300955 0.01324821
0.01373839 0.01282573 0.01237702 0.01400542]
mean value: 0.013483953475952149
key: score_time
value: [0.01079369 0.01044965 0.01073813 0.0103972 0.01031804 0.01030827
0.01035023 0.01034307 0.01035166 0.01030016]
mean value: 0.010435009002685547
key: test_mcc
value: [0.87831007 0.74161985 0.78446454 0.71567809 0.79471941 0.93548387
0.96824584 0.84983659 0.77072165 0.90586325]
mean value: 0.8344943153997917
key: train_mcc
value: [0.92518498 0.76865678 0.81406658 0.92923662 0.90265061 0.89965316
0.92844206 0.89154571 0.92828039 0.93998809]
mean value: 0.8927704971400476
key: test_accuracy
value: [0.93548387 0.85483871 0.88709677 0.83870968 0.88709677 0.96774194
0.98387097 0.91935484 0.8852459 0.95081967]
mean value: 0.9110259122157589
key: train_accuracy
value: [0.96223022 0.87230216 0.89928058 0.96402878 0.94964029 0.94964029
0.96402878 0.9442446 0.96409336 0.96947935]
mean value: 0.9438968394404763
key: test_fscore
value: [0.93103448 0.83018868 0.89552239 0.80769231 0.89855072 0.96774194
0.98360656 0.9122807 0.88135593 0.95384615]
mean value: 0.9061819863058443
key: train_fscore
value: [0.96146789 0.85420945 0.90819672 0.96309963 0.95172414 0.94890511
0.96350365 0.94183865 0.96441281 0.97012302]
mean value: 0.9427481068247102
key: test_precision
value: [1. 1. 0.83333333 1. 0.81578947 0.96774194
1. 1. 0.89655172 0.91176471]
mean value: 0.9425181172521699
key: train_precision
value: [0.98127341 0.99521531 0.83433735 0.98863636 0.91390728 0.96296296
0.97777778 0.98431373 0.95759717 0.94845361]
mean value: 0.9544474964669887
key: test_recall
value: [0.87096774 0.70967742 0.96774194 0.67741935 1. 0.96774194
0.96774194 0.83870968 0.86666667 1. ]
mean value: 0.8866666666666667
key: train_recall
value: [0.94244604 0.74820144 0.99640288 0.93884892 0.99280576 0.9352518
0.94964029 0.9028777 0.97132616 0.99280576]
mean value: 0.937060674041412
key: test_roc_auc
value: [0.93548387 0.85483871 0.88709677 0.83870968 0.88709677 0.96774194
0.98387097 0.91935484 0.88494624 0.95 ]
mean value: 0.9109139784946236
key: train_roc_auc
value: [0.96223022 0.87230216 0.89928058 0.96402878 0.94964029 0.94964029
0.96402878 0.9442446 0.96408035 0.96952116]
mean value: 0.9438997189345298
key: test_jcc
value: [0.87096774 0.70967742 0.81081081 0.67741935 0.81578947 0.9375
0.96774194 0.83870968 0.78787879 0.91176471]
mean value: 0.832825990728842
key: train_jcc
value: [0.92579505 0.74551971 0.83183183 0.92882562 0.90789474 0.90277778
0.92957746 0.89007092 0.93127148 0.94197952]
mean value: 0.8935544122114777
MCC on Blind test: 0.1
Accuracy on Blind test: 0.4
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.10854602 0.09391761 0.09340096 0.09336042 0.09349442 0.0939045
0.09685636 0.09437943 0.09400725 0.09450531]
mean value: 0.09563722610473632
key: score_time
value: [0.01416063 0.01400757 0.01419139 0.0142355 0.01414442 0.01419091
0.01533508 0.01431847 0.01418138 0.0142591 ]
mean value: 0.014302444458007813
key: test_mcc
value: [0.96824584 1. 0.96824584 0.96824584 0.96824584 0.96824584
1. 0.96824584 0.90215054 1. ]
mean value: 0.9711625556945535
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98387097 1. 0.98387097 0.98387097 0.98387097 0.98387097
1. 0.98387097 0.95081967 1. ]
mean value: 0.985404547858276
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98360656 1. 0.98412698 0.98360656 0.98412698 0.98412698
1. 0.98360656 0.95081967 1. ]
mean value: 0.9854020296643248
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 0.96875 1. 0.96875 0.96875
1. 1. 0.93548387 1. ]
mean value: 0.9841733870967742
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96774194 1. 1. 0.96774194 1. 1.
1. 0.96774194 0.96666667 1. ]
mean value: 0.986989247311828
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98387097 1. 0.98387097 0.98387097 0.98387097 0.98387097
1. 0.98387097 0.95107527 1. ]
mean value: 0.9854301075268818
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96774194 1. 0.96875 0.96774194 0.96875 0.96875
1. 0.96774194 0.90625 1. ]
mean value: 0.9715725806451613
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.09
Accuracy on Blind test: 0.21
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.03534484 0.0434587 0.05311441 0.03592443 0.03713274 0.05070066
0.05334473 0.05310988 0.04448533 0.03317356]
mean value: 0.04397892951965332
key: score_time
value: [0.022789 0.0229876 0.02233076 0.01710248 0.01946139 0.03598452
0.02479911 0.02968454 0.01835775 0.03061008]
mean value: 0.024410724639892578
key: test_mcc
value: [0.93743687 0.93743687 0.93548387 0.93743687 0.93548387 0.96824584
0.96824584 0.87831007 0.90215054 0.96774194]
mean value: 0.9367972553494428
key: train_mcc
value: [1. 0.99640932 0.99640932 0.99640932 1. 1.
0.99283145 1. 0.99641572 0.99641572]
mean value: 0.9974890870152905
key: test_accuracy
value: [0.96774194 0.96774194 0.96774194 0.96774194 0.96774194 0.98387097
0.98387097 0.93548387 0.95081967 0.98360656]
mean value: 0.9676361713379165
key: train_accuracy
value: [1. 0.99820144 0.99820144 0.99820144 1. 1.
0.99640288 1. 0.99820467 0.99820467]
mean value: 0.9987416529971714
key: test_fscore
value: [0.96666667 0.96666667 0.96774194 0.96666667 0.96774194 0.98412698
0.98360656 0.93103448 0.95081967 0.98360656]
mean value: 0.9668678124738592
key: train_fscore
value: [1. 0.9981982 0.9981982 0.9981982 1. 1.
0.99638989 1. 0.99821109 0.9981982 ]
mean value: 0.9987393775723891
key: test_precision
value: [1. 1. 0.96774194 1. 0.96774194 0.96875
1. 1. 0.93548387 1. ]
mean value: 0.9839717741935484
key: train_precision
value: [1. 1. 1. 1. 1. 1.
1. 1. 0.99642857 1. ]
mean value: 0.9996428571428572
key: test_recall
value: [0.93548387 0.93548387 0.96774194 0.93548387 0.96774194 1.
0.96774194 0.87096774 0.96666667 0.96774194]
mean value: 0.951505376344086
key: train_recall
value: [1. 0.99640288 0.99640288 0.99640288 1. 1.
0.99280576 1. 1. 0.99640288]
mean value: 0.9978417266187051
key: test_roc_auc
value: [0.96774194 0.96774194 0.96774194 0.96774194 0.96774194 0.98387097
0.98387097 0.93548387 0.95107527 0.98387097]
mean value: 0.9676881720430108
key: train_roc_auc
value: [1. 0.99820144 0.99820144 0.99820144 1. 1.
0.99640288 1. 0.99820144 0.99820144]
mean value: 0.9987410071942446
key: test_jcc
value: [0.93548387 0.93548387 0.9375 0.93548387 0.9375 0.96875
0.96774194 0.87096774 0.90625 0.96774194]
mean value: 0.9362903225806452
key: train_jcc
value: [1. 0.99640288 0.99640288 0.99640288 1. 1.
0.99280576 1. 0.99642857 0.99640288]
mean value: 0.9974845837615622
MCC on Blind test: 0.06
Accuracy on Blind test: 0.21
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.18122554 0.21741056 0.19656324 0.19563127 0.21628571 0.1628089
0.20888186 0.19683623 0.13708878 0.13875246]
mean value: 0.18514845371246338
key: score_time
value: [0.02055907 0.02068186 0.02069783 0.02077031 0.02077198 0.01287293
0.02073574 0.02086687 0.01321125 0.02461028]
mean value: 0.019577813148498536
key: test_mcc
value: [0.67741935 0.74819006 0.74348441 0.69047575 0.80813523 0.87278605
0.81325006 0.81325006 0.54086022 0.74352218]
mean value: 0.7451373368256522
key: train_mcc
value: [0.87415162 0.87059372 0.86758591 0.89596753 0.88157448 0.87455914
0.87086426 0.86758591 0.87459701 0.88883589]
mean value: 0.8766315468831808
key: test_accuracy
value: [0.83870968 0.87096774 0.87096774 0.83870968 0.90322581 0.93548387
0.90322581 0.90322581 0.7704918 0.86885246]
mean value: 0.870386039132734
key: train_accuracy
value: [0.93705036 0.9352518 0.93345324 0.94784173 0.94064748 0.93705036
0.9352518 0.93345324 0.93716338 0.9443447 ]
mean value: 0.9381508078994614
key: test_fscore
value: [0.83870968 0.87878788 0.875 0.82142857 0.9 0.9375
0.90909091 0.90909091 0.76666667 0.87878788]
mean value: 0.8715062491272169
key: train_fscore
value: [0.93738819 0.93571429 0.93474427 0.94849023 0.94138544 0.9380531
0.93617021 0.93474427 0.9380531 0.94474153]
mean value: 0.9389484621579285
key: test_precision
value: [0.83870968 0.82857143 0.84848485 0.92 0.93103448 0.90909091
0.85714286 0.85714286 0.76666667 0.82857143]
mean value: 0.8585415155848971
key: train_precision
value: [0.93238434 0.92907801 0.91695502 0.93684211 0.92982456 0.92334495
0.92307692 0.91695502 0.92657343 0.93639576]
mean value: 0.9271430114193007
key: test_recall
value: [0.83870968 0.93548387 0.90322581 0.74193548 0.87096774 0.96774194
0.96774194 0.96774194 0.76666667 0.93548387]
mean value: 0.8895698924731182
key: train_recall
value: [0.94244604 0.94244604 0.95323741 0.96043165 0.95323741 0.95323741
0.94964029 0.95323741 0.94982079 0.95323741]
mean value: 0.9510971867667156
key: test_roc_auc
value: [0.83870968 0.87096774 0.87096774 0.83870968 0.90322581 0.93548387
0.90322581 0.90322581 0.77043011 0.86774194]
mean value: 0.870268817204301
key: train_roc_auc
value: [0.93705036 0.9352518 0.93345324 0.94784173 0.94064748 0.93705036
0.9352518 0.93345324 0.93714061 0.94436064]
mean value: 0.9381501250612413
key: test_jcc
value: [0.72222222 0.78378378 0.77777778 0.6969697 0.81818182 0.88235294
0.83333333 0.83333333 0.62162162 0.78378378]
mean value: 0.7753360312183841
key: train_jcc
value: [0.88215488 0.87919463 0.87748344 0.90202703 0.88926174 0.88333333
0.88 0.87748344 0.88333333 0.89527027]
mean value: 0.884954210937499
MCC on Blind test: 0.22
Accuracy on Blind test: 0.49
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.25665951 0.24086618 0.24189734 0.23880529 0.24180579 0.24213672
0.24336982 0.24555063 0.24932742 0.25003719]
mean value: 0.2450455904006958
key: score_time
value: [0.00856853 0.0083406 0.00863934 0.00827336 0.00876927 0.00846887
0.00852108 0.00857925 0.00858474 0.00857282]
mean value: 0.008531785011291504
key: test_mcc
value: [0.96824584 0.96824584 0.93548387 1. 1. 0.96824584
1. 0.96824584 0.9344086 1. ]
mean value: 0.9742875819325697
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98387097 0.98387097 0.96774194 1. 1. 0.98387097
1. 0.98387097 0.96721311 1. ]
mean value: 0.9870438921205711
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98360656 0.98360656 0.96774194 1. 1. 0.98412698
1. 0.98360656 0.96666667 1. ]
mean value: 0.986935525840867
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 0.96774194 1. 1. 0.96875
1. 1. 0.96666667 1. ]
mean value: 0.9903158602150538
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96774194 0.96774194 0.96774194 1. 1. 1.
1. 0.96774194 0.96666667 1. ]
mean value: 0.983763440860215
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98387097 0.98387097 0.96774194 1. 1. 0.98387097
1. 0.98387097 0.9672043 1. ]
mean value: 0.9870430107526882
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96774194 0.96774194 0.9375 1. 1. 0.96875
1. 0.96774194 0.93548387 1. ]
mean value: 0.9744959677419355
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.05
Accuracy on Blind test: 0.19
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.01201487 0.01361275 0.01402354 0.01380372 0.01379108 0.02837563
0.01576805 0.01669693 0.02459884 0.01624608]
mean value: 0.01689314842224121
key: score_time
value: [0.0111146 0.01098752 0.01094055 0.01091433 0.01093793 0.01111317
0.01172638 0.01128578 0.01110101 0.01107979]
mean value: 0.011120104789733886
key: test_mcc
value: [0.74193548 0.80813523 0.81325006 0.52297636 0.74819006 0.67419986
0.67883359 0.81325006 0.72516604 0.71375712]
mean value: 0.7239693864680706
key: train_mcc
value: [0.82567165 0.81659431 0.79995316 0.7380124 0.83549358 0.78285538
0.76623167 0.78683637 0.87297353 0.8490525 ]
mean value: 0.8073674571945186
key: test_accuracy
value: [0.87096774 0.90322581 0.90322581 0.75806452 0.87096774 0.82258065
0.83870968 0.90322581 0.85245902 0.85245902]
mean value: 0.8575885774722369
key: train_accuracy
value: [0.9118705 0.90827338 0.89748201 0.85971223 0.91546763 0.88848921
0.88309353 0.89028777 0.93536804 0.92280072]
mean value: 0.9012845020213631
key: test_fscore
value: [0.87096774 0.9 0.89655172 0.73684211 0.87878788 0.84507042
0.83333333 0.89655172 0.86567164 0.86567164]
mean value: 0.8589448213713017
key: train_fscore
value: [0.90875233 0.90876565 0.89142857 0.84210526 0.91965812 0.89491525
0.88245931 0.88291747 0.93771626 0.92598967]
mean value: 0.8994707904383525
key: test_precision
value: [0.87096774 0.93103448 0.96296296 0.80769231 0.82857143 0.75
0.86206897 0.96296296 0.78378378 0.80555556]
mean value: 0.8565600191740348
key: train_precision
value: [0.94208494 0.90391459 0.94736842 0.96296296 0.8762215 0.84615385
0.88727273 0.94650206 0.90635452 0.88778878]
mean value: 0.9106624340187
key: test_recall
value: [0.87096774 0.87096774 0.83870968 0.67741935 0.93548387 0.96774194
0.80645161 0.83870968 0.96666667 0.93548387]
mean value: 0.8708602150537634
key: train_recall
value: [0.87769784 0.91366906 0.84172662 0.74820144 0.9676259 0.94964029
0.87769784 0.82733813 0.97132616 0.9676259 ]
mean value: 0.8942549186457286
key: test_roc_auc
value: [0.87096774 0.90322581 0.90322581 0.75806452 0.87096774 0.82258065
0.83870968 0.90322581 0.85430108 0.85107527]
mean value: 0.8576344086021506
key: train_roc_auc
value: [0.9118705 0.90827338 0.89748201 0.85971223 0.91546763 0.88848921
0.88309353 0.89028777 0.93530337 0.92288105]
mean value: 0.9012860679198577
key: test_jcc
value: [0.77142857 0.81818182 0.8125 0.58333333 0.78378378 0.73170732
0.71428571 0.8125 0.76315789 0.76315789]
mean value: 0.7554036327560076
key: train_jcc
value: [0.83276451 0.83278689 0.80412371 0.72727273 0.85126582 0.80981595
0.78964401 0.79037801 0.88273616 0.86217949]
mean value: 0.8182967266032459
MCC on Blind test: 0.15
Accuracy on Blind test: 0.77
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.01394916 0.01445484 0.01928425 0.01152682 0.01178312 0.01156855
0.01153874 0.0114634 0.01149082 0.01142311]
mean value: 0.012848281860351562
key: score_time
value: [0.01326442 0.01076293 0.01061916 0.01054502 0.01053524 0.01052785
0.01054406 0.01065135 0.01066136 0.01065755]
mean value: 0.010876893997192383
key: test_mcc
value: [0.90369611 1. 0.90369611 0.87831007 0.80813523 0.93743687
0.90369611 1. 0.77072165 0.90586325]
mean value: 0.9011555410657976
key: train_mcc
value: [0.91741458 0.92145965 0.92518498 0.92475364 0.93914669 0.91054923
0.93214329 0.92475364 0.93206857 0.92840473]
mean value: 0.9255878992579923
key: test_accuracy
value: [0.9516129 1. 0.9516129 0.93548387 0.90322581 0.96774194
0.9516129 1. 0.8852459 0.95081967]
mean value: 0.9497355896351137
key: train_accuracy
value: [0.95863309 0.96043165 0.96223022 0.96223022 0.96942446 0.95503597
0.96582734 0.96223022 0.96588869 0.96409336]
mean value: 0.9626025212146261
key: test_fscore
value: [0.95238095 1. 0.95238095 0.93103448 0.90625 0.96875
0.95238095 1. 0.88135593 0.95384615]
mean value: 0.9498379425951021
key: train_fscore
value: [0.95900178 0.96113074 0.96296296 0.96269982 0.96980462 0.95575221
0.96637168 0.96269982 0.96637168 0.96441281]
mean value: 0.9631208137030208
key: test_precision
value: [0.9375 1. 0.9375 1. 0.87878788 0.93939394
0.9375 1. 0.89655172 0.91176471]
mean value: 0.9438998248202102
key: train_precision
value: [0.95053004 0.94444444 0.94463668 0.95087719 0.95789474 0.94076655
0.95121951 0.95087719 0.95454545 0.95422535]
mean value: 0.9500017150163743
key: test_recall
value: [0.96774194 1. 0.96774194 0.87096774 0.93548387 1.
0.96774194 1. 0.86666667 1. ]
mean value: 0.9576344086021505
key: train_recall
value: [0.9676259 0.97841727 0.98201439 0.97482014 0.98201439 0.97122302
0.98201439 0.97482014 0.97849462 0.97482014]
mean value: 0.9766264407828575
key: test_roc_auc
value: [0.9516129 1. 0.9516129 0.93548387 0.90322581 0.96774194
0.9516129 1. 0.88494624 0.95 ]
mean value: 0.9496236559139786
key: train_roc_auc
value: [0.95863309 0.96043165 0.96223022 0.96223022 0.96942446 0.95503597
0.96582734 0.96223022 0.96586602 0.96411258]
mean value: 0.9626021763234573
key: test_jcc
value: [0.90909091 1. 0.90909091 0.87096774 0.82857143 0.93939394
0.90909091 1. 0.78787879 0.91176471]
mean value: 0.906584933093472
key: train_jcc
value: [0.92123288 0.92517007 0.92857143 0.92808219 0.94137931 0.91525424
0.93493151 0.92808219 0.93493151 0.93127148]
mean value: 0.9288906795867435
MCC on Blind test: 0.19
Accuracy on Blind test: 0.44
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_config.py:203: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_config.py:206: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.11430693 0.1973083 0.16238761 0.19749618 0.09576607 0.1363256
0.09693003 0.12462544 0.20535517 0.23405504]
mean value: 0.1564556360244751
key: score_time
value: [0.01904321 0.02068663 0.02037406 0.02086973 0.01105285 0.01114559
0.01939201 0.01616836 0.01520658 0.01612258]
mean value: 0.01700615882873535
key: test_mcc
value: [0.90369611 1. 0.93548387 0.87831007 0.84266484 0.93743687
0.93743687 0.96824584 0.77072165 0.93635873]
mean value: 0.9110354846088805
key: train_mcc
value: [0.92844206 0.93563929 0.9393413 0.93563929 0.94986154 0.9393413
0.93214329 0.92844206 0.94264494 0.93558747]
mean value: 0.9367082543906752
key: test_accuracy
value: [0.9516129 1. 0.96774194 0.93548387 0.91935484 0.96774194
0.96774194 0.98387097 0.8852459 0.96721311]
mean value: 0.9546007403490216
key: train_accuracy
value: [0.96402878 0.9676259 0.96942446 0.9676259 0.97482014 0.96942446
0.96582734 0.96402878 0.97127469 0.96768402]
mean value: 0.9681764462756545
key: test_fscore
value: [0.95238095 1. 0.96774194 0.93103448 0.92307692 0.96875
0.96875 0.98360656 0.88135593 0.96875 ]
mean value: 0.9545446783280807
key: train_fscore
value: [0.96453901 0.96808511 0.9699115 0.96808511 0.97508897 0.9699115
0.96637168 0.96453901 0.97153025 0.96797153]
mean value: 0.9686033664546803
key: test_precision
value: [0.9375 1. 0.96774194 1. 0.88235294 0.93939394
0.93939394 1. 0.89655172 0.93939394]
mean value: 0.950232841898009
key: train_precision
value: [0.95104895 0.95454545 0.95470383 0.95454545 0.96478873 0.95470383
0.95121951 0.95104895 0.96466431 0.95774648]
mean value: 0.9559015511110829
key: test_recall
value: [0.96774194 1. 0.96774194 0.87096774 0.96774194 1.
1. 0.96774194 0.86666667 1. ]
mean value: 0.9608602150537635
key: train_recall
value: [0.97841727 0.98201439 0.98561151 0.98201439 0.98561151 0.98561151
0.98201439 0.97841727 0.97849462 0.97841727]
mean value: 0.9816624120058792
key: test_roc_auc
value: [0.9516129 1. 0.96774194 0.93548387 0.91935484 0.96774194
0.96774194 0.98387097 0.88494624 0.96666667]
mean value: 0.9545161290322581
key: train_roc_auc
value: [0.96402878 0.9676259 0.96942446 0.9676259 0.97482014 0.96942446
0.96582734 0.96402878 0.9712617 0.96770326]
mean value: 0.9681770712462289
key: test_jcc
value: [0.90909091 1. 0.9375 0.87096774 0.85714286 0.93939394
0.93939394 0.96774194 0.78787879 0.93939394]
mean value: 0.9148504049713727
key: train_jcc
value: [0.93150685 0.93814433 0.94158076 0.93814433 0.95138889 0.94158076
0.93493151 0.93150685 0.94463668 0.93793103]
mean value: 0.9391351978873097
MCC on Blind test: 0.15
Accuracy on Blind test: 0.38
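
The key / value / mean value blocks above have the shape of the dictionary returned by scikit-learn's `cross_validate` when train scores are requested. The sketch below shows how output of that shape could be produced for a pipeline like the one printed in this log. It is not the script that generated the log. The synthetic data, the reduced set of columns, the `StratifiedKFold` splitter with shuffling, and the two-metric `scoring` dictionary are all assumptions made for illustration. Only the column names, the `ColumnTransformer` layout, and `RidgeClassifierCV(cv=10)` are taken from the log.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import RidgeClassifierCV
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic stand-in data (assumption): the real features come from ml_data.py.
rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame({
    "ligand_distance": rng.normal(size=n),
    "rsa": rng.normal(size=n),
    "ss_class": rng.choice(["helix", "sheet", "loop"], size=n),
})
y = (X["ligand_distance"] + 0.5 * rng.normal(size=n) > 0).astype(int)

# Two numerical and one categorical column, a small subset of those in the log.
num_cols = ["ligand_distance", "rsa"]
cat_cols = ["ss_class"]

# Same layout as the printed Pipeline: scale numerical columns, one-hot encode
# categorical columns, pass anything else through, then fit the classifier.
pipe = Pipeline(steps=[
    ("prep", ColumnTransformer(
        remainder="passthrough",
        transformers=[
            ("num", MinMaxScaler(), num_cols),
            ("cat", OneHotEncoder(), cat_cols),
        ],
    )),
    ("model", RidgeClassifierCV(cv=10)),
])

# Only two of the logged metrics are reproduced here (assumption).
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
}

# 10 outer folds, matching the 10 values per metric in the log; the splitter
# type and shuffling are assumptions.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

# Print in the same key / value / mean value layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```

Scores from `cross_validate` are computed on folds drawn from the training data only, so they say nothing directly about a separate held-out set such as the blind test reported above.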